
Forums - Sony - PS4 GPU has 128 'Hidden' Stream Processors

I know, I know... people will say these are for redundancy, but...

The PS4's APU has 20 CUs instead of 18 CUs... Sony has chosen, for now, to disable two of them.

It is unlikely that Sony will enable these two CUs in the future, but it would be possible with a simple firmware update.

http://wccftech.com/ps4-apu-apparently-128-hidden-stream-processors-10-power/




These 2 CUs are disabled because of printing mistakes, so they will never be enabled, since those units are non-functional.



fatslob-:O said:
These 2 CUs are disabled because of printing mistakes, so they will never be enabled, since those units are non-functional.


Really, how does that even work?




WagnerPaiva said:

Really, how does that even work?

They increase the percentage of good chips per wafer because they can also use chips with up to 2 bad CUs.

It works like this:

+ 50% of the chips have all CUs good.
+ 25% of the chips have 19 good CUs.
+ 10% of the chips have 18 good CUs.
- 15% of the chips have fewer than 18 good CUs.

So by disabling two CUs they can use up to 85% of the chips in my example... if they used only the fully working chips, they would get only 50% of them.

I made up these numbers, but that is how it works. If production is really good, they could end up with only chips that have 19 or 20 good CUs... so they could enable them in a later patch.
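The arithmetic in the example above can be checked with a few lines of Python. This only sums the made-up distribution from the post; the dictionary and the `usable_share` helper are illustrative names, not anything Sony or AMD uses.

```python
# The post's made-up distribution: share of chips by number of working CUs.
EXAMPLE = {20: 0.50, 19: 0.25, 18: 0.10}  # the remaining 15% have fewer than 18


def usable_share(dist, required):
    """Share of chips that have at least `required` working CUs."""
    return sum(share for good_cus, share in dist.items() if good_cus >= required)


print(f"ship only perfect chips: {usable_share(EXAMPLE, 20):.0%}")    # 50%
print(f"ship chips with >= 18 CUs: {usable_share(EXAMPLE, 18):.0%}")  # 85%
```

Lowering the requirement from 20 to 18 working CUs is what moves the usable share from 50% to 85% in this example.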



fatslob-:O said:
These 2 CUs are disabled because of printing mistakes, so they will never be enabled, since those units are non-functional.

Slight confusion here. When you process wafers into chips, there are two factors to account for:

Every wafer a fab processes has intrinsic flaws. Although it is supposed to be a single silicon crystal with a maximum specified impurity level, the reality is more complex. During manufacturing, you have additional problems like laser fluctuations, too much or not enough etching, cosmic rays, and chemicals not behaving like they should. All in all, there is a certain chance that parts of your wafer are bad. Usually this is summarized in a single number: the defect rate. This number tells you how many defects per 100mm^2 you will encounter on average (of course, we disregard bad or faulty designs here right from the start).

Let's assume that number is 0.2 for the PS4's APU (the number is a closely guarded secret and nobody is ever going to tell you its actual value for any factory, but 0.2 is reasonable without going into details why). The PS4's APU die size is roughly 320mm^2, so per die you expect 0.2 × 320/100 = 0.64 defects, which as a rough estimate means about a 64% chance that there is a fault in your die. This means that roughly two thirds of the dies on your wafers will be defective and only about a third of the chips will work (at best). That is of course unacceptable: if you pay $5000 to process a wafer and you only get 10-15 working chips, you can figure out the problem yourself.
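The 64% figure treats the expected number of defects per die as if it were a probability. A common textbook refinement, swapped in here and not taken from the post, assumes defects land randomly (a Poisson model), so the chance that a die is defect-free is exp(-D0 × A). A minimal sketch, assuming the post's 0.2 defects per 100mm^2 and 320mm^2 die:

```python
import math

D0 = 0.2      # assumed defect density from the post (defects per 100 mm^2)
AREA = 320.0  # approximate PS4 APU die size from the post (mm^2)


def expected_defects(d0_per_100mm2, area_mm2):
    """Average number of defects landing on one die."""
    return d0_per_100mm2 * area_mm2 / 100.0


def poisson_yield(d0_per_100mm2, area_mm2):
    """Fraction of dies with zero defects, assuming Poisson-distributed defects."""
    return math.exp(-expected_defects(d0_per_100mm2, area_mm2))


lam = expected_defects(D0, AREA)
print(f"expected defects per die: {lam:.2f}")        # 0.64
print(f"linear estimate of faulty dies: {lam:.0%}")  # 64%
print(f"Poisson estimate of faulty dies: {1 - poisson_yield(D0, AREA):.0%}")  # 47%
```

Under this model the linear estimate overstates the loss somewhat, but roughly half of all dies would still be lost, so the argument for spares stands.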

The solution engineers found is simple (though it can be very complex to implement). Engineers put more stuff into the die than is really necessary. Then if one part turns out to be bad, you simply replace it with a surplus part. You could put in more cores, more cache, more CUs, more drivers and replace anything that is bad with a corresponding spare; in the end you would get 100% of your chips working. However, if you add too much reserve stuff to your die, it gets way too big and way too complex to manage, so while you have 100% yield, you only get half as many chips per wafer. So you don't put spares of everything into your die.
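The trade-off described above (spares raise yield but make every die bigger) can be sketched numerically. This is a toy model, assuming the same Poisson defect model as before and independent defects; the unit area, the non-redundant area and the helper names are hypothetical values chosen only for illustration. The figure of merit is working dies per mm^2 of wafer, ignoring edge losses.

```python
import math

# All numbers below are hypothetical, chosen only to illustrate the trade-off.
DEFECTS_PER_MM2 = 0.2 / 100.0  # 0.2 defects per 100 mm^2, as assumed in the post
UNIT_AREA = 5.0                # area of one replicated unit, e.g. a CU (mm^2)
OTHER_AREA = 220.0             # area of everything that has no spares (mm^2)
REQUIRED = 18                  # units the product must ship with


def defect_free(area_mm2):
    """Poisson probability that a region of the given area has no defect."""
    return math.exp(-DEFECTS_PER_MM2 * area_mm2)


def die_yield(spares):
    """Probability that the non-redundant area is clean and at least
    REQUIRED of the REQUIRED + spares units are clean."""
    n = REQUIRED + spares
    p = defect_free(UNIT_AREA)
    block_ok = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
                   for k in range(REQUIRED, n + 1))
    return defect_free(OTHER_AREA) * block_ok


def good_dies_per_mm2(spares):
    """Working dies per mm^2 of wafer: yield divided by die area."""
    die_area = OTHER_AREA + (REQUIRED + spares) * UNIT_AREA
    return die_yield(spares) / die_area


for spares in range(5):
    print(f"{spares} spares: yield {die_yield(spares):.1%}, "
          f"good dies per 1000 mm^2 {1000 * good_dies_per_mm2(spares):.3f}")
```

With these made-up numbers the first spare helps a lot, after which extra spares mostly just add area; different areas or defect densities move the sweet spot.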

Let's look at the PS4's die and where we could add redundancy. The obvious choice is the 18 CUs in the GPU. This is the largest block on the die, taking roughly 33% of the entire die, so it is the most likely place for a random fault to land. It is also very easy to add two spare CUs because it is mostly a cut-and-paste operation. With this simple addition, we have already saved at least 33% of all bad chips. The next obvious place to add spares is the "chessboard areas" next to the Jaguar cores. These are the second-level caches, and memory is rather easy to add spare parts to. Unfortunately, at this point we are already running out of reasonable places to add spares. Adding spare Jaguar cores is not a realistic option, and the memory controllers are rather large with no room for a spare (I have no idea at all how redundant GDDR5 controllers could be designed). There may be individual cache areas in various parts of the chip that have spares built in. All in all, probably 60% of the die area is "saved by spares".
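To put rough numbers on the "33%" and "60%" figures, here is an idealized sketch. It assumes the same Poisson model and the post's 0.2 defects per 100mm^2 on a 320mm^2 die, and it assumes that every defect landing in a spare-protected region can be repaired, so only the unprotected area has to be defect-free. In reality a second defect in the same region, or a defect in shared logic, would still kill the die, so treat these as optimistic bounds.

```python
import math

D0_PER_100MM2 = 0.2   # assumed defect density from the post
DIE_AREA_MM2 = 320.0  # approximate die size from the post


def yield_with_coverage(covered_fraction):
    """Idealized yield when `covered_fraction` of the die area is protected by
    spares: only the remaining area must be defect-free (Poisson model)."""
    lam = D0_PER_100MM2 * DIE_AREA_MM2 / 100.0  # expected defects per die
    return math.exp(-lam * (1.0 - covered_fraction))


for covered in (0.0, 0.33, 0.60):
    print(f"{covered:.0%} of area covered by spares -> "
          f"{yield_with_coverage(covered):.0%} usable dies")
```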

One point should be made clear: if you have bad parts or unused reserve parts in your chip, you must make sure that everything that is bad or unused is electrically disabled. Any transistor in a chip that is "free to do whatever it wants" will kill the chip sooner or later. Hence if the PS4 APU promises 18 CUs, the 2 surplus CUs (whether they are working spares or defective units that have been swapped out) MUST be disabled at the end of the manufacturing line. How that is done (permanently or in an unlockable way) is up to the designer.
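A hypothetical sketch of that final harvesting step: given pass/fail test results for the 20 physical CUs, pick 18 working ones to enable and leave every other CU, working or not, disabled. The function name, the boolean list and the bitmask representation are illustrative assumptions; nothing here reflects how Sony or AMD actually program their fuses.

```python
TOTAL_CUS = 20    # physical CUs on the die, per the thread
ENABLED_CUS = 18  # CUs the product promises


def enable_mask(cu_passed):
    """Return a bitmask with exactly ENABLED_CUS bits set, using only CUs that
    passed testing (bit i = CU i enabled). Every CU not in the mask, including
    working spares, stays disabled. Returns None if the die must be rejected."""
    if len(cu_passed) != TOTAL_CUS:
        raise ValueError("expected one test result per CU")
    good = [i for i, ok in enumerate(cu_passed) if ok]
    if len(good) < ENABLED_CUS:
        return None  # fewer than 18 working CUs: the die cannot be salvaged
    mask = 0
    for i in good[:ENABLED_CUS]:
        mask |= 1 << i
    return mask


results = [True] * TOTAL_CUS
results[3] = False  # pretend CU 3 failed testing
print(f"{enable_mask(results):020b}")
```

Whether such a mask is burned into one-time fuses or stored somewhere rewritable is exactly the "permanently or unlockable" choice mentioned above.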



Kyuu said:
Cell on PS3 also had one of its SPUs disabled, yet they never enabled it.

It's true, they were afraid that the power of the Cell could destroy the fabric of our dimension.




Wow, I feel so dumb. I don't understand anything that is said in this thread. hahahaha



This is the norm. The 360 GPU did the same thing to improve yields.



Better not use them; the console could melt from the huge amount of heat generated. PS4s are already turning themselves off because of heat.



drkohler said:

One point should be made clear: if you have bad parts or unused reserve parts in your chip, you must make sure that everything that is bad or unused is electrically disabled. Any transistor in a chip that is "free to do whatever it wants" will kill the chip sooner or later. Hence if the PS4 APU promises 18 CUs, the 2 surplus CUs (whether they are working spares or defective units that have been swapped out) MUST be disabled at the end of the manufacturing line. How that is done (permanently or in an unlockable way) is up to the designer.

Agreed, but I do think the 2 CUs were there to improve yields and offset the defect rate.