
Shin'en: If you can't make great looking games on Wii U, it's not the hardware's fault

forethought14 said:
fatslob-:O said:

BTW the eDRAM is located on the GPU, not the CPU; the Wii U features an MCM design, not an APU. Oh, and if you wonder how the processor is able to keep up, a lot of games don't use the CPU for rendering, they use the GPU. The workloads usually assigned to the CPU are AI (which is small), keeping track of game elements that have nothing to do with rendering, and other things that aren't floating point heavy. 

The 3MB of L2 cache in Espresso is IBM's own eDRAM, completely different from the eDRAM on the GPU die; they're manufactured by two different companies. We've known that the cache is eDRAM since it was announced. IBM announced it themselves! 

http://www-03.ibm.com/press/us/en/pressrelease/34683.wss

IBM said:

IBM's unique embedded DRAM, for example, is capable of feeding the multi-core processor large chunks of data to make for a smooth entertainment experience

As for the rest, the matmul SIMD test says otherwise. Not wanting to start that conversation again, but it performs very well against several other, more modern designs while only using 64-bit SIMD (paired singles, in this case, isn't "real" SIMD). That is a pure SIMD test, with no help from outside sources. If a CPU had much better SIMD and a good enough architecture to go along with it, then that CPU should finish the test faster than the others. Of course, if you use it incorrectly, it will lose a lot of performance. Many developers have never even developed for this architecture before, as noted by several devs who say they're still getting used to it. 
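To make that concrete, here's a rough sketch (just an illustration, not the actual test) of a 4-element dot product, one cell of a matmul, done the paired-singles way: two 32-bit floats per operation, which is all 2-wide SIMD buys you per instruction. A 128-bit SIMD unit would do all four lanes in one go.

    #include <stdio.h>

    /* A "paired single" is just two 32-bit floats handled as one unit. */
    typedef struct { float a, b; } ps_t;

    /* 4-element dot product, done two lanes at a time. */
    static float dot4_paired(const float row[4], const float col[4])
    {
        ps_t lo = { row[0] * col[0], row[1] * col[1] };   /* lanes 0 and 1 */
        ps_t hi = { row[2] * col[2], row[3] * col[3] };   /* lanes 2 and 3 */
        return (lo.a + lo.b) + (hi.a + hi.b);             /* horizontal sum */
    }

    int main(void)
    {
        const float row[4] = { 1, 2, 3, 4 };
        const float col[4] = { 5, 6, 7, 8 };
        printf("dot = %f\n", dot4_paired(row, col));      /* prints 70.000000 */
        return 0;
    }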

You're kidding me, right? It's almost the same as IBM's Broadway. Sure, there are developers like 4A Games who have never touched it before, but that's Nintendo's fault for not getting a lot of 3rd parties on board earlier.



DJEVOLVE said:
So make a great looking game.


...they already have!



fatslob-:O said:
forethought14 said:
snowdog said:

Great post, but thought I'd add two important things. Firstly, Espresso has a ridiculously short pipeline (4 stages!) while Jaguar has 17 stages, so over 4 times more stages to go through before an instruction is completed. Secondly, Espresso also has access to the 32MB of eDRAM in addition to its 3MB of CPU cache.

The second point above is HUGE.

A short pipeline is a limiting factor for several things. For starters, it doesn't lend itself to heavy floating point instruction sets, as we've seen: Espresso shares the same 32 x 2 paired singles instructions that Broadway and Gekko had. There are advantages and disadvantages to a short pipeline. But considering how this processor can run modern software decently, I think it punches well above its weight. If I were to code for a CPU in 2013 that used 32x2 SIMD, I would definitely have some doubts about it. Its architecture is very well designed for what it was made to do, and has aged very well. 
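To put rough numbers on that trade-off (textbook formula, made-up branch rates, nothing measured on real hardware): the cycles lost to a mispredicted branch scale roughly with pipeline depth, which is the upside of a 4-stage design even though it caps clock speed and heavy FP pipelines.

    #include <stdio.h>

    /* Effective CPI = base CPI + branch_freq * mispredict_rate * flush_penalty,
       where the flush penalty is roughly the pipeline depth in cycles.
       The rates below are illustrative guesses, not measurements. */
    static double effective_cpi(double base_cpi, double pipeline_depth)
    {
        const double branch_freq     = 0.20;  /* ~1 in 5 instructions is a branch */
        const double mispredict_rate = 0.10;  /* 10% of branches mispredicted */
        return base_cpi + branch_freq * mispredict_rate * pipeline_depth;
    }

    int main(void)
    {
        /* 1.08 vs 1.34 cycles per instruction; the deep pipeline pays more per miss */
        printf("4-stage pipeline : %.2f CPI\n", effective_cpi(1.0, 4.0));
        printf("17-stage pipeline: %.2f CPI\n", effective_cpi(1.0, 17.0));
        return 0;
    }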

I guess the odd thing about Espresso is having so much cache on one core; I don't really understand the purpose of doing that. Based on the die shots, there isn't anything different about Core 01 compared to the other two, aside from having access to 4x more cache. One plus, though, is having eDRAM as cache. 

BTW the eDRAM is located on the GPU not the CPU, 


From the interview in the OP: 

"On Wii U the eDRAM is available to the GPU and CPU."



curl-6 said:
fatslob-:O said:

BTW the eDRAM is located on the GPU not the CPU, 


From the interview in the OP: 

"On Wii U the eDRAM is available to the GPU and CPU."

Yeah, I heard from forethought14's last post, but it only has 3MB of eDRAM.



ninjablade said:

For all the talk about Wii U graphics looking better than current gen, all you have to do is post direct-feed gameplay pics of the best looking PS3/360 games vs the best looking Wii U games; it's easy to see that current gen games still look much better than the best of Wii U.

That's an absurd comparison.

The best looking PS3 and 360 games are the result of 7-8 years of experience with the hardware, massive budgets, and the full commitment of some of the most graphically skilled teams in the world working from the ground up.

There isn't a single Wii U game to date that's built from the ground up by a skilled team, with plentiful resources and the intent of pushing the system. Not one.

PS3/360 are maxed out, but nothing on Wii U pushes it even close to its limit.

You might as well compare Conker's Bad Fur Day on the original Xbox to Kameo on the 360 and say the 360's not a big leap over the Xbox.



fatslob-:O said:
Pemalite said:
fatslob-:O said:

Just so you know, IPC isn't the whole story of performance. Oh, and you appear to be right about Jaguar (got to check things out more often on my part). I'm pretty sure floating point performance is the standard benchmark; there are others, like integer performance, but that's not as important as the former.


IPC does matter; that's Instructions Per Clock. It's certainly more important than flops, and it's why AMD hasn't *really* been able to compete with Intel at the high end, despite Intel having a core disadvantage.
The problem with multi-threading a game is that you have a primary thread and all your secondary threads have dependencies on it. That's why, on a multi-core system running a game, you will generally always have a single thread that is more demanding than the others, and that's why a high IPC at any given clock is important. (A way of fixing it isn't exactly higher IPC; higher clocks will also do the trick, because they resolve those dependencies a lot better than higher IPC does.)
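Bare-bones sketch of that pattern (plain pthreads, purely illustrative, nothing from a real engine): the secondary threads get their inputs from the primary thread each frame, and the frame can't be submitted until they're done, so the primary thread ends up being the hot one.

    #include <pthread.h>
    #include <stdio.h>

    /* Stand-ins for jobs the primary thread hands out each frame. */
    static void *ai_job(void *arg)    { (void)arg; return NULL; }  /* pretend AI    */
    static void *audio_job(void *arg) { (void)arg; return NULL; }  /* pretend audio */

    int main(void)
    {
        for (int frame = 0; frame < 3; ++frame) {
            pthread_t ai, audio;

            /* secondary threads depend on the primary thread for their inputs... */
            pthread_create(&ai, NULL, ai_job, NULL);
            pthread_create(&audio, NULL, audio_job, NULL);

            /* ...while the primary thread does the serial work only it can do:
               input -> game logic -> building the command list for the GPU */
            printf("frame %d: primary thread runs game logic\n", frame);

            /* and the frame can't go out until those dependencies resolve */
            pthread_join(ai, NULL);
            pthread_join(audio, NULL);
            printf("frame %d: submit to GPU\n", frame);
        }
        return 0;
    }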

Floating point as a gauge of performance really is pointless for games; note I said games.
If you were building an application that only used floating point, then it would be an important measurement.

Game engines use a lot of different types of math to achieve a certain result; the problem is that neither you nor I will ever know what type of math is being used at any one instant in a game's scene. (You do realize why we went away from integer math in the first place, right? I'll give you the answer: using integers to calculate transformations and lighting was too slow, and that problem became especially prevalent in the 3D graphics era. A lot of processors in those days didn't feature an FPU, so to render in 3D precisely they had to use software floating point math.)
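For anyone wondering what "integer math" for this even looks like, here's a tiny 16.16 fixed-point multiply next to the float version (just an illustration of the old fixed-point approach, not code from any engine).

    #include <stdint.h>
    #include <stdio.h>

    /* 16.16 fixed point: 16 integer bits, 16 fractional bits, stored in an int. */
    typedef int32_t fix16;
    #define TO_FIX(x) ((fix16)((x) * 65536.0))

    static fix16 fix_mul(fix16 a, fix16 b)
    {
        return (fix16)(((int64_t)a * b) >> 16);   /* widen so the product fits */
    }

    int main(void)
    {
        /* scale a vertex coordinate by 1.5, once in fixed point, once in float */
        fix16 x = TO_FIX(3.25), s = TO_FIX(1.5);
        printf("fixed-point: %f\n", fix_mul(x, s) / 65536.0);   /* 4.875000 */
        printf("float      : %f\n", 3.25f * 1.5f);              /* 4.875000 */
        return 0;
    }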
Take the Cell for example: it's a single-precision, iterative-refinement floating point monster; my CPU isn't even as fast in real-world tests when it comes to that. However, the Cell's integer and double-precision floating point performance is orders of magnitude slower than my 3930K's, or even Jaguar's, yet overall, in a real-world gaming situation, both CPUs would pummel the Cell.

@Bold

There are other factors in performance, such as clock speed, too. If both processors can output the same number of instructions in the same amount of time, then I would prefer the processor with the higher clock, because not only does it match the higher-IPC processor, it also has the higher clock to chew through sequential workloads. Remember how a single core clocked at 3 GHz will beat a dual core that does 1.5 GHz? Well, the same situation applies here, because certain programs can't leverage a higher IPC and instead benefit from a higher clock, provided each line of code is dependent on the last one. (I did not say that IPC doesn't matter, but you have to account for other things about the processor too.)
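Quick back-of-the-envelope version of that (made-up numbers, equal IPC assumed on both chips): a fully serial dependency chain can only ever occupy one core, so only that one core's clock x IPC matters.

    #include <stdio.h>

    int main(void)
    {
        const double instructions = 3.0e9;  /* one long, fully serial dependency chain */
        const double ipc          = 1.0;    /* assume equal IPC on both chips */

        /* a serial chain can only ever run on one core, so core count is irrelevant */
        double t_single = instructions / (ipc * 3.0e9);   /* 1 core  @ 3.0 GHz */
        double t_dual   = instructions / (ipc * 1.5e9);   /* 2 cores @ 1.5 GHz, one usable */

        printf("single core @ 3.0 GHz: %.2f s\n", t_single);   /* 1.00 s */
        printf("dual core   @ 1.5 GHz: %.2f s\n", t_dual);     /* 2.00 s */
        return 0;
    }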

Yet these games use a lot of floating point math from what I'm seeing. (Even if games were to use integer math, it'd be minuscule a lot of the time, and you know this.)

The Cell had a lot more in common with what you'd call a vector processor. The actual processor was the PPE; as for the SPEs, they were a bunch of SIMD units, nothing more, plus their instruction sets were pretty narrow.

Nope, integer math is still used heavily.
The industry moved transform and lighting from the CPU to the GPU because, well, it's faster (I was a PC gamer when they first made that transition with the GeForce 256. I AM GETTING OLD!) and it's a highly parallel task. Initially it was done via a fixed-function hardware block, until later it was performed on the pixel shader pipelines (it's one of the reasons the GeForce FX faltered against the Radeon 9700 series: wasted transistors on the TnL hardware, whereas ATI had it all done in the pixel pipelines).
In fact, if you look at the evolution of PC hardware, over time the CPU has been tasked with less and less processing as work is offloaded to the graphics hardware, which can take advantage of highly parallel tasks.

Back then, CPUs did have a floating point unit; however, processors like the AMD K6 and Cyrix M3 had a very poor floating point unit compared to the Pentium 1, 2 and 3 and their Celeron counterparts. It wasn't until AMD introduced the K7, which was a massive departure from the K6 series, that they actually caught up.
Heck, even AMD's FX has its floating point unit shared between 2 processing cores.
AMD's eventual goal for its Fusion initiative is to remove the floating point unit entirely from its CPUs and instead move it onto the graphics processor, which is better suited to that type of processing.

As for clockspeed, I agree it is important, but only to an extent, and only if all things are equal architecturally. I would, for instance, take a single-core Haswell processor over a single-core NetBurst processor even if the NetBurst processor had twice the frequency. Why? Because Haswell has a much higher IPC, so it can do more work than a NetBurst processor even at a lower clockspeed. But this is all old news; the gigahertz race between AMD and Intel eventually showed that clockspeed isn't a primary measurement of performance across two different architectures.

The problem with the consoles is that they all have a low clockspeed and a low-IPC CPU, so all-round, they're crap. :P
However, where this generation is different compared to prior ones is that the GPU can offload some of the processing anyway.




www.youtube.com/@Pemalite

fatslob-:O said:
curl-6 said:
fatslob-:O said:

BTW the eDRAM is located on the GPU not the CPU, 


From the interview in the OP: 

"On Wii U the eDRAM is available to the GPU and CPU."

Yeah, I heard from forethought14's last post, but it only has 3MB of eDRAM.

He made a mistake; it has 3MB of L2 cache. The eDRAM bank is 32MB, and both the CPU and GPU can access it.



curl-6 said:
ninjablade said:

For all the talk about Wii U graphics looking better than current gen, all you have to do is post direct-feed gameplay pics of the best looking PS3/360 games vs the best looking Wii U games; it's easy to see that current gen games still look much better than the best of Wii U.

That's an absurd comparison.

The best looking PS3 and 360 games are the result of 7-8 years of experience with the hardware, massive budgets, and the full commitment of some of the most graphically skilled teams in the world working from the ground up.

There isn't a single Wii U game to date that's built from the ground up by a skilled team, with plentiful resources and the intent of pushing the system. Not one.

PS3/360 are maxed out, but nothing on Wii U pushes it even close to its limit.

You might as well compare Conker's Bad Fur Day on the original Xbox to Kameo on the 360 and say the 360's not a big leap over the Xbox.

This winter we'll be getting our first good comparisons; Watch Dogs, for example.



curl-6 said:
fatslob-:O said:
curl-6 said:
fatslob-:O said:

BTW the eDRAM is located on the GPU not the CPU, 


From the interview in the OP: 

"On Wii U the eDRAM is available to the GPU and CPU."

Yeah, I heard from forethought14's last post, but it only has 3MB of eDRAM.

He made a mistake; it has 3MB of L2 cache. The eDRAM bank is 32MB, and both the CPU and GPU can access it.

But the GPU can access it better because it's integrated onto the die.



fatslob-:O said:
curl-6 said:
fatslob-:O said:

Yeah, I heard from forethought14's last post, but it only has 3MB of eDRAM.

He made a mistake; it has 3MB of L2 cache. The eDRAM bank is 32MB, and both the CPU and GPU can access it.

But the GPU can access it better because it's integrated onto the die.

It's still useful for CPU tasks though. In an older interview, Shin'en talked about how they used the eDRAM as "fast scratch memory for some CPU intense work."

http://playeressence.com/wii-u-edram-explained-by-shinen/
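To picture what "scratch memory" means there (purely illustrative; I don't have Wii U dev tools, so where the buffer actually lives is an assumption and plain malloc stands in for the platform allocator): the idea is to keep a small, hot working buffer in the fast 32MB pool instead of bouncing off main RAM.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical scratch-buffer pattern: do the CPU-heavy pass inside a small
       buffer that fits in fast memory. On Wii U that buffer would presumably be
       placed in the 32MB eDRAM by whatever allocator the SDK provides; that part
       is an assumption, so plain malloc stands in for it here. */
    enum { SCRATCH_BYTES = 256 * 1024 };           /* well under 32MB */

    static void cpu_heavy_pass(float *scratch, size_t count)
    {
        for (size_t i = 0; i < count; ++i)         /* hot loop on a small working set */
            scratch[i] = scratch[i] * 0.5f + 1.0f;
    }

    int main(void)
    {
        size_t count   = SCRATCH_BYTES / sizeof(float);
        float *scratch = malloc(SCRATCH_BYTES);
        if (!scratch) return 1;

        memset(scratch, 0, SCRATCH_BYTES);         /* stage the working set */
        cpu_heavy_pass(scratch, count);            /* traffic stays in the fast pool */
        printf("first element: %f\n", scratch[0]); /* 1.000000 */

        free(scratch);
        return 0;
    }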