
A more realistic scenario is from Beyond3D:

With all the peeping at die shots (which has been tremendous fun) I think we might have gotten tunnel vision and lost sight of the "big picture". The question of "320 vs 160" shaders is still unanswered, and stepping back should help us answer it.

The current popular hypothesis is that Latte is a 16:320:8 part @ 550 MHz. Fortunately, we can see how such a part runs games on the PC. You know, the PC, that inefficient beast that's held back by Windows, thick APIs, DirectX draw-call bottlenecks that break the back of even fast CPUs, and all that stuff. Here is an HD 5550, a VLIW5 GPU with a 16:320:8 configuration running at 550 MHz:

http://www.techpowerup.com/reviews/H...HD_5550/7.html

And it blows past the 360 without any problems. It's not even close. And that's despite being on the PC!

Now let's scale things back a bit. This is the Llano A8-3500M w/ Radeon 6620G - a 20:400:8 configuration GPU, but it runs @ 444 MHz, meaning it has exactly the same GFLOPS and TMU throughput as the HD 5550, only it's got about 20% lower triangle setup and fillrate *and* it's crippled by a 128-bit DDR3-1333 memory pool *and* it's linked to a slower CPU than in the above benchmark (so more likely to suffer from Windows/DX bottlenecks). No super fast pool of eDRAM for this poor boy!

http://www.anandtech.com/show/4444/a...pu-a8-3500m/11
http://www.anandtech.com/show/4444/a...pu-a8-3500m/12
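If you want to sanity-check those paper specs, here's a quick back-of-envelope sketch (Python just for the arithmetic). The 2 FLOPs/SP/clock and 1 triangle/clock figures are the usual assumptions for AMD's VLIW5 parts, not numbers pulled from either review:

```python
# Back-of-envelope theoretical throughput for the two PC parts
# (and, by extension, the hypothesised 16:320:8 @ 550 MHz Latte).
# Assumes 2 FLOPs per SP per clock (FMA) and 1 triangle per clock setup,
# the standard figures for AMD's VLIW5 parts.

def gpu_paper_specs(name, sps, tmus, rops, mhz):
    ghz = mhz / 1000.0
    return {
        "name": name,
        "gflops": sps * 2 * ghz,   # peak ALU throughput
        "gtexels": tmus * ghz,     # peak texture rate
        "gpixels": rops * ghz,     # peak pixel fillrate
        "mtris": mhz,              # 1 tri/clock -> Mtris/s
    }

hd5550 = gpu_paper_specs("HD 5550 (16:320:8)", 320, 16, 8, 550)
hd6620g = gpu_paper_specs("HD 6620G (20:400:8)", 400, 20, 8, 444)

for g in (hd5550, hd6620g):
    print("{name}: {gflops:.0f} GFLOPS, {gtexels:.1f} GT/s, "
          "{gpixels:.2f} GP/s, {mtris} Mtris/s".format(**g))

# HD 5550 (16:320:8):  352 GFLOPS, 8.8 GT/s, 4.40 GP/s, 550 Mtris/s
# HD 6620G (20:400:8): 355 GFLOPS, 8.9 GT/s, 3.55 GP/s, 444 Mtris/s
# -> essentially identical ALU and texture rates; the 6620G only gives up
#    ~19% on fillrate and triangle setup (444/550), as stated above.
```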

And it *still* comfortably exceeds the 360 in the performance it delivers. Now let's look again at the Wii U. Does it blow past the 360? Does it even comfortably exceed the 360? No, it:

keeps
losing
marginally
to
the
Xbox
360

... and that's despite it *not* being born into the performance wheelchair that is the Windows PC ecosystem. Even if the Wii U can crawl past the 360 - marginally - in a game like Trine 2, it's still far below what we'd expect from an HD 5550 or even the slower and bandwidth-crippled 6620G. So why is this?

It appears that there are two options. Either Latte is horrendously crippled by something (API? memory? documentation? "drivers"?) to the point that even an equivalent or lesser PC part can bounce its ass around the field, or ... it's not actually a 16:320:8 part.

TL;DR version:
Latte seems to be either:
1) a horrendously crippled part compared to equivalent (or lower) PC GPUs, or
2) actually a rather efficient 160 shader part

Aaaaaaand I'll go with the latte(r) as the most likely option. Face it dawgs, the word on the street just don't jive with the scenes on the screens


I agree that there is probably something missing. The 320 number doesn't seem to match up with anything. The layout of the SIMD looks like it's the same as for 20 ALUs, with the same number of cache blocks. The only thing explaining 320 SPs is the supposed 40nm process and the block being slightly too big, and even that doesn't explain it fully.

The SIMD blocks are 60% the size of Llano's and only about 30% larger than Bobcat's 20-SP blocks. Even on 40nm, it's pretty absurd that the density would have increased so much. We also don't have conclusive evidence it is 40nm; the only thing that pins 40nm right now seems to be the eDRAM size, which is a really rough estimate from what I can tell.
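To put rough numbers on that, here's a back-of-envelope sketch using only the ~30% ratio above and the 20-SP Bobcat block as the yardstick. The 40-vs-20 SPs-per-block split for the two hypotheses is my own assumption (i.e. treating the die as having eight SIMD-like blocks), not a measurement:

```python
# Very rough sanity check on the "320 vs 160" reading of the SIMD blocks.
# Assumptions: a Latte SIMD block is ~30% larger than a Bobcat-class block
# holding 20 SPs, both parts are taken to be on 40 nm, and Latte has eight
# such blocks (so 40 SPs/block for 320 total, 20/block for 160 total).

LATTE_BLOCK = 1.30          # Latte block area, measured in "Bobcat 20-SP blocks"
BOBCAT_PER_SP = 1.0 / 20    # Bobcat per-SP area in the same unit

for hypothesis, sps_per_block in (("320 SPs total (40 per block)", 40),
                                  ("160 SPs total (20 per block)", 20)):
    latte_per_sp = LATTE_BLOCK / sps_per_block
    print("{}: per-SP area ~ {:.2f}x Bobcat's on the same node"
          .format(hypothesis, latte_per_sp / BOBCAT_PER_SP))

# 320 SPs total (40 per block): per-SP area ~ 0.65x Bobcat's
# 160 SPs total (20 per block): per-SP area ~ 1.30x Bobcat's
# -> 40 per block would need the ALU logic packed ~35% denser than Bobcat's
#    on the same claimed node; 20 per block comes out looser than Bobcat's,
#    which fits the "larger process / not packed very dense" readings below.
```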

There are too many unconfirmed things. I don't even know how everyone jumped onto the 320 SPs ship so fast. So far, the similarity of the SIMD blocks to Bobcat's should point at 20 shaders per block on a larger manufacturing process. That's what you'd get if you only looked at the SIMD blocks.

I find it much more likely that they found a way to pack eDRAM slightly denser than that they somehow packed the ALU logic smaller and cut away half the cache blocks. Or maybe the whole chip is 40nm but the logic isn't packed very densely because it wasn't originally designed for that process and fab. This is all much more likely from my point of view than magically having 320 SPs in so little space.
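And on why the eDRAM-based process estimate is so soft: a quick parametric sketch. The cell sizes and array efficiencies below are purely illustrative placeholders, not figures for any real process, and the 32 MB figure is just the commonly quoted eDRAM amount:

```python
# Illustration of why "eDRAM area => 40 nm" is a soft inference: the implied
# macro area for 32 MB of eDRAM swings a lot with the assumed bit-cell size
# and array efficiency. All values below are hypothetical placeholders.

BITS = 32 * 1024 * 1024 * 8                    # 32 MB of eDRAM

for cell_um2 in (0.05, 0.07, 0.09, 0.11):      # hypothetical bit-cell sizes
    for efficiency in (0.5, 0.7):              # fraction of macro that is cells
        area_mm2 = BITS * cell_um2 / efficiency / 1e6
        print("cell {:.2f} um^2, {:.0%} array efficiency -> ~{:.0f} mm^2"
              .format(cell_um2, efficiency, area_mm2))

# The implied macro area ranges from under 20 mm^2 to nearly 60 mm^2 across
# these assumptions alone, which is why "slightly denser eDRAM" can plausibly
# absorb the size discrepancy without pinning down a particular node.
```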