green_sky said: I scrolled the entire thread looking for a BlueFalcon post. He hasn't posted yet. Someone make sense out of this.
I'll try my best. Here is my initial wild stab.
Starting with RV770 (HD4870) as that's the complete R700 GPU.
The green squares are the Texture Block Units. If you count down vertically, you get 10 of those. Each green square has 4 TMUs, so the total # of TMUs in RV770 is 10 Texture Block Units x 4 TMUs per block = 40 TMUs.
On the right-hand side is the blue rectangle I highlighted. Those are the SIMDs (or shader cores). In RV770, it looks like there are 4 complete squares inside 1 blue rectangle. If you count down vertically again, the total # of SIMD blocks is 10. Each SIMD block is made up of 16 VLIW-5 shader units, giving us 16 x 5 = 80 shaders per SIMD block, for a total of 80 shaders per SIMD x 10 SIMDs = 800 shaders in RV770.
Notice that since 1 full-length SIMD block has 4 distinct sections, we end up with 80 shaders / 4 = 20 shaders per sub-block in 1 SIMD array.
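For anyone who wants to check the RV770 arithmetic, here is a quick Python sketch. The names are made up for illustration, and the counts are just the ones read off the die shot above, so treat it as a sanity check on the counting rather than an official spec.

```python
# Counts read off the annotated RV770 die shot (assumptions from the image).
TEXTURE_BLOCKS = 10        # green squares, counted vertically
TMUS_PER_BLOCK = 4
SIMDS = 10                 # blue rectangles, counted vertically
VLIW5_UNITS_PER_SIMD = 16
ALUS_PER_VLIW5 = 5         # VLIW-5 means 5 shader ALUs per unit
SUB_BLOCKS_PER_SIMD = 4    # distinct sections inside one SIMD


def rv770_counts():
    """Return the derived RV770 totals as a dict."""
    shaders_per_simd = VLIW5_UNITS_PER_SIMD * ALUS_PER_VLIW5
    return {
        "tmus": TEXTURE_BLOCKS * TMUS_PER_BLOCK,
        "shaders_per_simd": shaders_per_simd,
        "shaders_per_sub_block": shaders_per_simd // SUB_BLOCKS_PER_SIMD,
        "total_shaders": shaders_per_simd * SIMDS,
    }


print(rv770_counts())
```

The 20 shaders per sub-block figure is the one that matters for the Wii U estimate, since it is reused there.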
Here are the building blocks for RV770 in simpler terms:
You can see that there are 10 SIMDs, each with 16 VLIW-5 shader units:
OK now the Wii U:
Now notice how each SIMD array has 3 full blocks plus 2 "half-blocks" above and below the 3 full ones. If those are somehow active, you end up with 3 + 1/2 + 1/2 = 4 sub-blocks per SIMD, or 4 x 20 = 80 shaders per SIMD. 8 total SIMDs x 80 shaders would give 640 shaders (exactly the same as RV740).
The missing piece is the ROPs. We know the Wii U has a 64-bit memory interface because it has 4x 256MB chips, each on a 16-bit bus. Therefore, the GPU has half the memory blocks of RV740. RV740 had 32 TMUs and 16 ROPs with a 128-bit bus. This matches up perfectly with HALF of RV740: 16 TMUs / 8 ROPs / 64-bit bus.
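Here is the same counting for the Wii U die as a rough Python sketch, covering both the case where the "half-blocks" are active and the case where they are not. The function name and constants are hypothetical, and the block counts are only what can be read off the die shot.

```python
SHADERS_PER_SUB_BLOCK = 20   # carried over from RV770: 80 shaders / 4 sub-blocks
WIIU_SIMDS = 8               # SIMD arrays counted on the Wii U die shot
FULL_BLOCKS = 3              # full sub-blocks per SIMD
HALF_BLOCKS = 2              # "half-blocks" above and below the full ones


def wiiu_shaders(half_blocks_active):
    """Estimate the total shader count for the Wii U GPU."""
    # Work in units of half a sub-block so everything stays an integer.
    halves = FULL_BLOCKS * 2
    if half_blocks_active:
        halves += HALF_BLOCKS
    shaders_per_simd = halves * SHADERS_PER_SUB_BLOCK // 2
    return WIIU_SIMDS * shaders_per_simd


print(wiiu_shaders(half_blocks_active=True))
print(wiiu_shaders(half_blocks_active=False))
```

The two cases bracket the shader count: the half-blocks being active gives the RV740-like total, and the half-blocks being dead silicon gives the lower figure.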
http://www.gpureview.com/show_cards.php?card1=564&card2=612
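The "half of RV740" reasoning can be sketched in Python as well. This assumes, as the guess above does, that TMUs and ROPs scale down in step with the memory bus width; that is an assumption of this estimate, not a confirmed property of the chip. All names are illustrative.

```python
# RV740 figures as quoted above.
RV740 = {"tmus": 32, "rops": 16, "bus_bits": 128}

# Wii U memory setup as described above: 4 chips, 16-bit bus each.
WIIU_CHIPS = 4
BITS_PER_CHIP = 16


def wiiu_bus_bits():
    """Total memory interface width from the per-chip bus width."""
    return WIIU_CHIPS * BITS_PER_CHIP


def scale_to_bus(config, target_bus_bits):
    """Scale every unit count by the ratio of bus widths (assumed linear)."""
    return {
        name: value * target_bus_bits // config["bus_bits"]
        for name, value in config.items()
    }


print(wiiu_bus_bits())
print(scale_to_bus(RV740, wiiu_bus_bits()))
```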
At 40nm, the RV740 crams 826 million transistors into a 137 mm² die. (http://techreport.com/review/16820/amd-radeon-hd-4770-graphics-processor)
This could explain why Nintendo sacrificed half the TMUs and ROPs of RV740 (and coincidentally explains the halving of the memory bus to 64-bit from RV740's full 128-bit).
I can't help but think that since Nintendo is striving to maximize every mm² here, the 2 "half-blocks" in the SIMD arrays are enabled, so that is my prediction. So probably the 8 SIMDs x 80 shaders per SIMD = 640 SPs in total, exactly like in RV740, are retained.
http://techreport.com/review/16820/amd-radeon-hd-4770-graphics-processor
My pure guess estimates:
RV740-derived 40nm GPU @ 550 MHz
480 - 640 SPs
16 TMUs (Half of RV740)
8 ROPs (Half of RV740's = 2 Render Back Ends instead of 4 on RV740)
64-bit memory bus (Half of RV740's 128-bit)