
Forums - Nintendo Discussion - Wii U GPU Die Image! Chipworks is AWESOME!

Aielyn said:
OK, perhaps someone can explain something to me.

The Wii U is speculated to have 320 stream processors, right? That's based on 40 stream processors per block, 8 blocks. And other possible values are either 256 stream processors (32 per block) or 160 stream processors (20 per block). And assuming each one gets one FLOP per cycle, you get the entire set producing up to 352 GFLOPS of processing power, right?

The 360 had just 48 stream processors, with a clock speed just a little slower than that of the Wii U's GPU, but got 240 GFLOPS from them. It did this by having each stream processor capable of up to 10 FLOPs per cycle.

So my question is this: why is it automatically assumed that the Wii U's stream processors are only capable of one FLOP per cycle, given this? It's a serious question, not rhetorical - I'm trying to understand why this isn't under consideration; is it lack of knowledge of GPUs on my part, a detail that I'm not aware of, or is it a possible oversight by the people analysing the system?

The other detail that, to me, goes with this question, is why, given the current speculation about the GPU, do we keep hearing about how the Wii U gets an amazing amount of graphical capability for its power draw? It has been repeatedly suggested or implied that the Wii U's efficiency is remarkably high. How does this mesh with the speculated details? What impact would having stream processors similar to those in the 360 (that is, stream processors that have a net power of multiple FLOPs per cycle) have on the power draw, relative to having more stream processors?


It's because the system was designed by Nintendo. So anti-Nintendo fanboys (yes, there are a lot of them) automatically assume that the system is impotent! I was just reading the DF and B3D websites and they were so full of inaccuracies (though there is some good info on them too) it was sickening. Though to be honest, I should have expected that from those two websites. DF has been butthurt since the Tegra chip debacle on the 3DS, and B3D isn't what it used to be.



Aielyn said:
OK, perhaps someone can explain something to me. [...]


Xenos is a 3 SIMDs x 16 shader cores x 5 ALUs architecture running at 500MHz. Each ALU can do 1 add + 1 multiply simultaneously, so 240 x 2 x 500MHz = 240 GFLOPS.

If the Wii U indeed has 320 shaders, then the math is: 320 x 2 x 550MHz = 352 GFLOPS
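The arithmetic above can be checked with a quick back-of-the-envelope sketch (the shader counts and clocks are the speculated figures from this thread, not confirmed hardware specs):

```python
# Theoretical peak single-precision throughput: units x flops-per-clock x clock.
# All figures are the speculated ones from this thread, not confirmed specs.

def peak_gflops(alus, flops_per_clock, clock_mhz):
    """units * flops/clock * clock (MHz) -> GFLOPS."""
    return alus * flops_per_clock * clock_mhz / 1000.0

# Xenos: 3 SIMDs x 16 shader cores x 5 ALUs = 240 ALUs at 500 MHz,
# each doing 1 add + 1 multiply per clock (2 flops/clock).
xenos = peak_gflops(3 * 16 * 5, 2, 500)

# Wii U (speculated): 320 stream processors at 550 MHz, 2 flops/clock.
wii_u = peak_gflops(320, 2, 550)

print(xenos)  # 240.0
print(wii_u)  # 352.0
```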



Aielyn said:
OK, perhaps someone can explain something to me. [...]

Actually, it is assumed they do 2 floating point operations (FLOPs) per clock (single precision), because all stream processors based on the VLIW5 (or VLIW4 or GCN) architecture do so - you can read that up on amd.com (320 SPs x 2 flops/clock x 550MHz = 352 GFLOPS).

The shader cores in Xenos each contain 5 ALUs, which are capable of 2 flops/clock each and hence are similar to the SPs in newer designs, so you get a total of 48 x 5 = 240 ALUs x 2 flops/clock x 500MHz = a theoretical processing power of 240 GFLOPS.

As for the power consumption, I can only imagine the newer designs of the ALUs/SPs themselves are simpler and less prone to power leakage while maintaining the same work rate, so the chip uses less power overall, and the newer organization into big blocks might get rid of bottlenecks in certain situations.



Aielyn said:
OK, perhaps someone can explain something to me. [...]


Xenos doesn't have stream processors; every AMD GPU since Xenos uses stream processors in various configurations.



@TheVoxelman on twitter

Check out my hype threads: Cyberpunk, and The Witcher 3!

Lafiel said:
actually it is assumed they do 2 floating point operations(FLOPS) per clock (single precision) [...]

I've chosen to respond to your answer, rather than HoloDust's, because it contained the most information.

Let me ask you this - how does a "stream processor" differ from an "ALU"? That is, I can understand that both have 2 flops/clock, but if that's the case, why aren't they called the same thing? What does one do that the other doesn't?



Aielyn said:
Lafiel said:
actually it is assumed they do 2 floating point operations(FLOPS) per clock (single precision) [...]

Let me ask you this - how does a "stream processor" differ from an "ALU"? That is, I can understand that both have 2 flops/clock, but if that's the case, why aren't they called the same thing? What does one do that the other doesn't?


In this case, they are the same - there most likely isn't actually any difference between them. Xenos' shaders are based on the R600 architecture; for the Wii U, it is believed to be based on R700, but Markan said its registers are R6xx, so it might also be based on R600. Even if it's R700 (or even Evergreen), they are all VLIW5, which means 4 "simple" + 1 "special" ALU per group (let's call them that, to not overcomplicate things). VLIW5 was pretty inefficient, mostly using only 3-4 out of 5 ALUs, so they later dropped to VLIW4, and finally to GCN in the 77xx and up cards.
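The VLIW5 utilization point can be illustrated with a rough sketch - the 3-4 out of 5 slot occupancy is the figure from the post above, and the resulting "effective" numbers are purely illustrative, not measured:

```python
# Rough illustration of why VLIW5 is considered inefficient: peak throughput
# assumes all 5 ALU slots in a group are filled every clock, but the compiler
# typically only fills 3-4 of them (occupancy figure from the post above).

def effective_gflops(sp_count, clock_mhz, slots_filled, slots_total=5,
                     flops_per_clock=2):
    """Scale peak throughput by the fraction of VLIW slots actually used."""
    peak = sp_count * flops_per_clock * clock_mhz / 1000.0
    return peak * slots_filled / slots_total

# Speculated Wii U figures: 320 SPs at 550 MHz -> 352 GFLOPS peak.
peak = effective_gflops(320, 550, 5)       # all 5 slots filled
typical = effective_gflops(320, 550, 3.5)  # ~3-4 of 5 slots filled

print(peak)     # 352.0
print(typical)  # ~246 - the gap is why VLIW4/GCN dropped the 5th slot
```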



HoloDust said:
In this case, they are the same, there most likely isn't actually any difference between them - Xenos' shaders are based on R600 architecture [...]

Xenos is more of a hybrid of R500 and R600, overall closer to R520 but with unified shaders. 



@TheVoxelman on twitter

Check out my hype threads: Cyberpunk, and The Witcher 3!

zarx said:
HoloDust said:
[...] Xenos' shaders are based on R600 architecture [...]

Xenos is more of a hybrid of R500 and R600, overall closer to R520 but with unified shaders.

Indeed, but since we're talking about shaders here, and its shaders are pretty much the same as in R600...



HoloDust said:
[...] VLIW5 was pretty inefficient, using mostly 3-4 out of 5 ALUs, so later they dropped to VLIW4, and finally to GCN in 77xx and up cards.

So... if they found that VLIW5 was inefficient, and VLIW4 increases efficiency, why is it then assumed that VLIW5 is being used in the Wii U? And what impact, if any, would that change have on the interpretation of the Wii U GPU?

Keep in mind, I'm not arguing for VLIW4 - I'm just trying to understand the reasoning behind each of the claims being made, because at this time, I'm getting somewhat confused by terminology and the details of GPU design.



Aielyn said:
So... if they found that VLIW5 was inefficient, and VLIW4 increases efficiency, why is it then assumed that VLIW5 is being used in the Wii U? [...]

The design step from VLIW5 to VLIW4 isn't that old at all - it probably happened around the same time the Wii U GPU design was started, so anything is possible.

I must admit that the GPU die picture is irritating. Usually I start from the data/address bus to figure out what is probably in a chip. With the Wii U GPU, I can't even FIND the buses. It somehow reminds me of the first processor dies, which were "random structures"...