
Forums - Sony - PS4 GPU has 128 'Hidden' Stream Processors

ethomaz said:

WagnerPaiva said:

Really, how does that even work?

They increase the percentage of good chips per wafer because they can also use chips with up to 2 bad CUs.

It is like this:

+ 50% of the chips have all 20 CUs good.
+ 25% of the chips have 19 CUs good.
+ 10% of the chips have 18 CUs good.
- 15% of the chips have fewer than 18 CUs good.

So by disabling two CUs they can use up to 85% of the chips in my example... if they only used the 100% good chips, they would use only 50% of the chips.

I made up my numbers, but it works like that. If production is really good they could end up with only chips that have 19 or 20 good CUs... so they could enable them in a later patch.
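
To make the arithmetic in this example concrete, here is a minimal Python sketch. The percentages are the made-up ones from the post, and the names `good_cu_share` and `usable_share` are hypothetical, chosen only for illustration.

```python
# Made-up distribution from the post: number of good CUs on a 20-CU die
# mapped to the share of all chips that come out that way.
good_cu_share = {20: 0.50, 19: 0.25, 18: 0.10}
share_below_18 = 0.15  # chips with fewer than 18 good CUs are always scrapped


def usable_share(min_good_cus):
    """Share of chips that can ship if only `min_good_cus` CUs must work."""
    return sum(share for cus, share in good_cu_share.items() if cus >= min_good_cus)


all_20 = usable_share(20)   # every CU must work
with_18 = usable_share(18)  # ship with 2 CUs disabled

print(f"require 20 good CUs: {all_20:.0%} of chips usable")
print(f"require 18 good CUs: {with_18:.0%} of chips usable")
```

With these numbers, requiring only 18 working CUs lifts the usable share from half of the chips to 85% of them.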


Crazy stuff. Hey, a store in Atibaia-SP is offering me a PS4 for R$ 2799 with a free game, they have in stock. What do you think?



My grammar errors are justified by the fact that I am a brazilian living in Brazil. I am also very stupid.

Kyuu said:
Anfebious said:
Kyuu said:
Cell on PS3 also had one of its SPUs disabled, yet they never enabled it.

It's true, they were afraid that the power of the Cell could destroy the fabric of our dimension.


It's true though. It was disabled to increase production yield.

The world wasn't ready, just like the world isn't ready for the Ouya.



"I've Underestimated the Horse Power from Mario Kart 8, I'll Never Doubt the WiiU's Engine Again"

drkohler said:

Slight confusion here. When you process wafers into chips, there are two factors to account for:

Every wafer a fab processes has intrinsic flaws. Although it is supposed to be a single silicon crystal with a maximum specified impurity, the reality is more complex. During manufacturing, you have additional problems like laser fluctuations, too much/not enough etching, cosmic rays, chemicals not behaving like they should. All in all, you have a certain chance that parts of your wafer are bad. Usually this is all summed up in a single number: the defect rate. This number tells you how many defects per 100mm^2 you will encounter on average (of course we disregard bad/faulty design here right from the start).

Let's assume that number is 0.2 for the PS4's apu (the number is totally secret and nobody is ever going to tell you its actual value for any factory, but 0.2 is reasonable without going into details why). The PS4's apu die size is roughly 320mm^2, so per die you expect 0.2 * 320/100 = 0.64 defects on average, or roughly a 64% chance that there is a fault in your die. This means that roughly two thirds of the dies on your wafers will produce nothing and only a third of the chips will work (at best). That is of course unacceptable: if you pay $5000 to have a wafer processed and you only get 10-15 working chips, you can figure out the problem yourself.
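
As a rough check of the estimate above, here is a small Python sketch. The 0.2 defects per 100mm^2 and the 320mm^2 die size are the post's assumed figures, not real fab data. The Poisson model is an extra assumption that is not in the post; it treats 0.64 as the average number of defects per die rather than as a probability, so it gives a somewhat lower chance of a faulty die.

```python
import math

defects_per_100mm2 = 0.2  # assumed value from the post, not a real fab figure
die_area_mm2 = 320.0      # approximate PS4 apu die size from the post

# Average number of defects that land on a single die.
expected_defects = defects_per_100mm2 * die_area_mm2 / 100.0

# Poisson model (an assumption): probability that a die has zero defects.
p_clean = math.exp(-expected_defects)
p_faulty = 1.0 - p_clean

print(f"expected defects per die: {expected_defects:.2f}")
print(f"P(at least one defect):   {p_faulty:.1%}")
```

Either way, the conclusion of the post stands: without some form of redundancy, a large share of the dies on every wafer would be scrap.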

The solution found by engineers is simple (and can be very complex to implement at the same time). Engineers put more stuff into the die than what is really necessary. Then if one thing turns out to be bad, you simply replace it with a spare. You can't put spares of everything into your die, though. You could put in more cores, more cache, more CUs, more drivers, and replace anything that is bad with the corresponding spare. In the end you would get 100% of your chips working. However, if you add too much reserve stuff into your die, it gets way too big and way too complex to manage, so while you have 100% yield, you only get half the chips per wafer.

Let's look at the PS4's die and where we could add redundancy. The obvious choice is the 18 CUs in the gpu. This is the largest block in the die, taking up roughly 33% of the entire die, so it is the most likely place for a random fault to be located. It is also very easy to add two spare CUs because it is mostly a cut-and-paste operation. With this simple increase, we just saved at least 33% of all bad chips. The next obvious place to add spares is those "chessboard areas" at the Jaguar cores. These are second-level caches, and memory is rather easy to add as spare parts. Unfortunately, at this point we are already running out of reasonable places to add spares. Adding spare Jaguar cores is not a realistic option, and the memory controllers are rather large, so there is no room for a spare (I have no idea at all how redundant gddr5 controllers can be designed). There may be individual cache areas in various parts that have "spares" built in. All in all, probably 60% of the die area is "saved by spares".
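
To get a feel for what "60% of the die area saved by spares" could mean for yield, here is a hedged Python sketch. It reuses the post's assumed 0.64 defects per die and adds two assumptions of its own: defects follow a Poisson distribution, and every defect that lands in a protected area can be repaired by a spare. The second assumption is optimistic, so the result is an upper bound, not a real yield figure.

```python
import math

expected_defects = 0.64    # per die, from the post's 0.2 per 100mm^2 and 320mm^2
protected_fraction = 0.60  # die area "saved by spares", the post's rough estimate

# Without spares, a die is only good if it has no defect anywhere.
yield_no_spares = math.exp(-expected_defects)

# With spares, only defects in the unprotected 40% of the die kill the chip
# (optimistic: assumes a spare is always available for a protected-area defect).
unprotected_defects = expected_defects * (1.0 - protected_fraction)
yield_with_spares = math.exp(-unprotected_defects)

print(f"yield without spares: {yield_no_spares:.1%}")
print(f"yield with spares:    {yield_with_spares:.1%} (upper bound)")
```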

One point should be made clear: If you have bad parts or unused reserve parts in your chip, you must make sure that everything that is bad or unused is electronically disabled. Any transistor in a chip that is "free to do whatever it wants" will kill the chip sooner rather than later. Hence if the PS4 apu promises 18 CUs, the surplus 2 CUs (whether they are working or are replacement units for 1 or 2 defective regular CUs) MUST be disabled at the end of the manufacturing line. How that is done (permanently or unlockable) is up to the designer.


Pretty much right on the money.

It's essentially for redundancy at the manufacturing level.
The Xbox One will more than likely have the same.
PC GPUs and CPUs have the same.
However, in the PC space they can still "die harvest" and sell off versions that don't have all the Cores/CUs working at a lower price. No such luxury in the console space, unfortunately.




www.youtube.com/@Pemalite

For some strange reason I can't fathom why console manufacturers don't just put out a more powerful version of the same console. We could have a more powerful version of a PS4 with double the CPU clocks and double the functional units for the GPU, and in the end it would play games at a higher framerate/resolution. The same goes for the X1. All of this can be done by just manufacturing a bigger chip, even though most consumers probably won't agree with the $700 price tag.



WagnerPaiva said:

Crazy stuff. Hey, a store in Atibaia-SP is offering me a PS4 for R$ 2799 with a free game, they have in stock. What do you think?

In my opinion that's expensive... Importing from Amazon with all taxes comes to less than 2000... so 2800 is not a fair price, but it is better than 4000 (dammit Sony Brasil).





Is it guaranteed that there will be no extra fees from Amazon? Anyway, it is sold out there... I will wait then. Let me know if you have any news on this subject.



My grammar errors are justified by the fact that I am a brazilian living in Brazil. I am also very stupid.

That's a perfectly normal thing. The Xbox One also has 128 hidden stream processors. It's simply for yield purposes, so they can also use chips with fewer working CUs. There will never be a firmware update enabling the additional CUs, because millions and millions of PS4s will have just 18 or 19 working CUs and not 20. Sony cannot activate the CUs, so it's pointless to speculate about it, just as the additional SPE can never be activated because a lot of PS3s do not have a working additional SPE.

The Xbox One also has 14 CUs and not 12, yet they will never activate the other CUs. PS4s are not graphics cards, where with a lot of luck you could activate the working CUs with a firmware update if your chip has them, because every PS4 has to have the same power in order for games to run identically.

Edit: Seems my post was pointless, as other posters mentioned it before me. I should have read the second page of this thread too before posting...



fatslob-:O said:

We could have a more powerful version of a PS4 with double the CPU clocks and double the functional units for the GPU, and in the end it would play games at a higher framerate/resolution. The same goes for the X1. All of this can be done by just manufacturing a bigger chip, even though most consumers probably won't agree with the $700 price tag.

Consider one of the holy laws of electronics: P = I^2*R. This says that the power you dissipate in an element with resistance R goes as I squared. So if you double the clock rate, you must get rid of four times the heat (since you essentially have to double the current). In reality things are much worse because the higher the clock rate, the more leakage currents (electrons that flow to places where you don't want them to go) increase, particularly when you shrink the process node. At some point the current goes through the roof and you instantly burn your cpu. The Jaguar cores can easily go to 2GHz (producing roughly 60% more heat), but doubling to 3.2GHz is off limits. If you tried to increase the speed of the gpu part in the die, you'd run into thermal troubles much, much faster than with the cpu, because there are so many tiny processors in those CUs that you will soon be unable to get rid of the heat fast enough. Even a "puny" 20% speed increase in the gpu part stresses the design to the limit or beyond it.
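
The scaling argument above can be sketched in a few lines of Python. This uses only the post's simplified model (heat goes as I^2, with current assumed to scale linearly with clock) and ignores leakage, which the post says makes things worse in practice. The function name `relative_heat` is hypothetical, and the 1.6GHz stock clock is inferred from the post's "doubling to 3.2GHz".

```python
def relative_heat(clock_ratio):
    """Heat relative to stock under the simplified P = I^2 * R model,
    assuming current scales linearly with clock rate (an approximation)."""
    return clock_ratio ** 2


stock_ghz = 1.6  # inferred from the post: doubling the clock gives 3.2GHz

for target_ghz in (2.0, 3.2):
    ratio = target_ghz / stock_ghz
    print(f"cpu at {target_ghz}GHz: {relative_heat(ratio):.2f}x the heat")

# The "puny" 20% gpu speed increase mentioned in the post.
print(f"gpu at +20%: {relative_heat(1.2):.2f}x the heat")
```

Under this model, going from 1.6GHz to 2GHz costs a bit over one and a half times the heat, which matches the "roughly 60% more" in the post, while doubling the clock costs four times the heat.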

Doubling the functional units obviously doubles the die space those units require. The PS4 apu is approx. 320mm^2 in size. Doubling the gpu would increase the size to probably around 450mm^2. Unfortunately, now you have the problem that you can't really feed all the gpu units anymore because you just can't get enough data into them. So you are more or less forced to use a wider bus, which costs you even more space for another 1-2 gddr5 controllers. These again take up so much space on your die that you rapidly pass a die size of 500mm^2, which is generally considered the limit of economic viability (making such huge chips is also an engineering nightmare). Anything over 500mm^2 is a giant chip; NVidia has those made, and they cost a fortune to manufacture.

In the example for Sony, and as a mental exercise, I'd go the following route: Get into bed with Samsung and cooperate on their 20nm process technology. Make 1GB (8Gbit) gddr5 chips. Adapt the Jaguar successor (already existing in labs) to Samsung design rules. Increase the CU count to 24 (and add some more TMUs). Add another 64bit gddr5 controller, getting ram up to 10GB with 10 chips. Increase speeds depending on what the 20nm process allows: probably 10%, with 20% as the best hope. This would give you a PS4.5 with about 250% cpu power and about 150% gpu power in a die not much bigger than now, not much hotter than now, within about 3 years.
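
The "about 150% gpu power" figure can be sanity-checked with a quick Python sketch. It assumes gpu throughput scales linearly with both CU count and clock, which is a simplification and not something the post states. The cpu figure depends on the unnamed successor cores and is not modelled here.

```python
current_cus = 18
proposed_cus = 24

# Clock gains the post considers plausible on a 20nm process.
gpu_ratios = {}
for clock_gain in (0.10, 0.20):
    # Assumption: throughput scales linearly with CU count and clock rate.
    gpu_ratios[clock_gain] = (proposed_cus / current_cus) * (1.0 + clock_gain)
    print(f"+{clock_gain:.0%} clock: {gpu_ratios[clock_gain]:.0%} of current gpu power")
```

Under that assumption the proposal lands between roughly 147% and 160% of the current gpu, in line with the post's estimate.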

All this can be done; there are no technical obstacles preventing that path. Will they do it? Very unlikely, but one can always dream. And in three years, other design philosophies may be "more hip", or console gaming may be dying fast.



Or drop Jaguar completely and jump on board with Jaguar's replacement, aka Puma. :)

But, that's all hindsight, we all have to put up with the current hardware until their replacements come along.




www.youtube.com/@Pemalite

They are redundant! In case of faulty yields, 2 CUs are left redundant!