
Shin'en: If you can't make great looking games on Wii U, it's not the hardware's fault

Tagging for all the BS that I'm going to read and the laughs I'm going to have :)



Not-so-proud owner of every current-gen system. 

Next-gen is upon us folks!

And some cool and inspiring quotes

“Always forgive your enemies; nothing annoys them so much.” 
― Oscar Wilde
“Be who you are and say what you feel, because those who mind don't matter, and those who matter don't mind.” 
― Bernard M. Baruch
snowdog said:
For anyone doubting the dog's-bollocks talent of Shin'en, I have just two words: Jett Rocket.

And that was amazingly squeezed into a 40MB compressed file too.


To be fair, the PICA200 is a series, i.e. 2006, 2008, 2010 and 2012 revisions, and sorry if I missed any.

 

The PICA200 2K8's vertex performance is 40.7 million polygons/s @ 100 MHz (max clock frequency 600 MHz).

The PICA200 2006's vertex performance is 15.3 million polygons/s @ 200 MHz (max clock frequency 400 MHz), with 4 vertex pipelines, if that means anything.

The PICA200 2010 has a max clock frequency of 1 GHz and won the "Micro GPU of the Year Award" in 2010.

The 2008 model drains far less power than the 2K6 model.

So even the PICA200 2008 does 162.8 million polygons/s in vertex performance when clocked @ 400 MHz.
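(For the curious: that 162.8 million figure is just the 40.7 million @ 100 MHz number scaled linearly to 400 MHz. A quick sketch of the arithmetic in C, assuming throughput scales perfectly with clock, which real chips only approximate:)

#include <stdio.h>

/* Vertex throughput scaled linearly with clock speed. This is an
 * idealisation -- memory bandwidth usually gets in the way first. */
int main(void) {
    double polys_at_base    = 40.7e6;  /* PICA200 2K8 figure from the post */
    double base_clock_mhz   = 100.0;
    double target_clock_mhz = 400.0;

    double scaled = polys_at_base * (target_clock_mhz / base_clock_mhz);
    printf("Estimated vertex rate @ %.0f MHz: %.1f million polys/s\n",
           target_clock_mhz, scaled / 1e6);  /* prints 162.8 */
    return 0;
}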

Depending on whether they used the 2K8 or 2K10 model, we could be looking at somewhere from 60 to 100 shader cores.

Fitting it all into 40 MB is mighty impressive, but Nintendo has been working on making its games use less data since the GameCube. They reduced the Pokémon data for the original Pokédex 3D (I had to move all the extra data off and redownload it), and the Pushmo update actually shrank how much space the game takes up on your system (a 1st party game).

So it might be Nintendo we have to praise for fitting it all in there. Nintendo already updates 3rd parties each month on the latest guidance for the amount of 3D depth, so I'm sure they do something similar for 3rd party devs when it comes to how much data a game should take up, at least on the eShop.



bobgamer said:
Tagging for all the BS that I'm going to read and the laughs I'm going to have :)


You and I think alike.



I was talking about Jett Rocket on WiiWare, mate.



Pemalite said:
fatslob-:O said:

Just so you know, IPC isn't the whole story of performance. Oh, and you appear to be right about Jaguar (I've got to check things out more often on my part). I'm pretty sure floating point performance is the standard; there are others, like integer performance, but that is not as important as the former.


IPC does matter; that's Instructions Per Clock. It's certainly more important than FLOPS, and it's why AMD hasn't *really* been able to compete with Intel at the high end, despite Intel having a core-count disadvantage.
The problem with multi-threading a game is that you have a primary thread, and all your secondary threads have dependencies on it. That's why, on a multi-core system running a game, you will generally always have a single thread that is more demanding than the others, and why a high IPC at any given clock is important. (A way of fixing it isn't exactly higher IPC; higher clocks will do the trick, because they resolve dependencies a lot better than higher IPC does.)

Floating point as a gauge of performance really is pointless for games; note I said games.
If you were building an application that only used floating point, then it would be an important measurement.

Game engines use a lot of different types of math to achieve a certain result; the problem is that neither you nor I will ever know what type of math is being used at any one instant in a game's scene. (You do realise why we moved away from integer math in the first place, right? Here, I'll give you the answer on my behalf: using integers to calculate transformations and lighting was too slow initially, and that became especially apparent in the 3D graphics era. A lot of processors in those days didn't feature an FPU, so to render 3D precisely they had to use software floating point math.)
Take the Cell for example: it's a single-precision, iterative-refinement floating point monster; my CPU isn't even as fast in real-world tests when it comes to that. However, the Cell's integer and double-precision floating point performance is orders of magnitude slower than my 3930K's, or even Jaguar's. Yet overall, in a real-world gaming situation, both CPUs would pummel the Cell.
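(To make the integer-math history above concrete: before FPUs were common, 3D transforms were done in fixed-point integer math. A minimal sketch of the idea; the 16.16 format here is chosen purely for illustration:)

#include <stdio.h>
#include <stdint.h>

/* 16.16 fixed point: 16 integer bits, 16 fractional bits. */
typedef int32_t fix16;

#define FIX_ONE (1 << 16)

static fix16 fix_from_float(float f) { return (fix16)(f * FIX_ONE); }
static float fix_to_float(fix16 x)   { return (float)x / FIX_ONE; }

/* The multiply needs a 64-bit intermediate and a shift back down;
 * this bookkeeping is exactly the cost that hardware FPUs removed. */
static fix16 fix_mul(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * b) >> 16);
}

int main(void) {
    /* Scale a vertex coordinate by 1.5, using integer ops only. */
    fix16 x = fix_from_float(2.25f);
    fix16 s = fix_from_float(1.5f);
    printf("2.25 * 1.5 = %f\n", fix_to_float(fix_mul(x, s)));  /* 3.375 */
    return 0;
}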

@Bold

There are other factors in performance, such as clock speed, too. If both processors can output the same number of instructions in the same amount of time, then I would prefer the processor with the higher clock, because not only does it match the higher-IPC processor, it also has the higher clock to overcome sequential workloads. Remember how a single core clocked at 3 GHz will beat a dual core at 1.5 GHz? The same situation applies here, because certain programs can't leverage a higher IPC when each line of code depends on the last one; a higher clock helps there instead. (I did not say that IPC doesn't matter, but you have to account for other things about the processor too.) The sketch below works through that 3 GHz vs 1.5 GHz comparison.
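(The arithmetic behind that paragraph, sketched in C; all the figures are invented for illustration. Throughput is IPC x clock x cores, but a fully serial, dependent instruction stream only ever sees one core's worth of it:)

#include <stdio.h>

int main(void) {
    double ipc = 2.0;  /* assume identical IPC on both chips */

    double single_core = ipc * 3.0e9 * 1;  /* 1 core  @ 3.0 GHz */
    double dual_peak   = ipc * 1.5e9 * 2;  /* 2 cores @ 1.5 GHz */
    double dual_serial = ipc * 1.5e9 * 1;  /* serial code uses 1 core */

    printf("Single 3 GHz core        : %.1e instr/s\n", single_core);
    printf("Dual 1.5 GHz cores, peak : %.1e instr/s (same on paper)\n", dual_peak);
    printf("Dual, fully serial code  : %.1e instr/s (half)\n", dual_serial);
    return 0;
}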

Yet these games use a lot of floating point math, from what I'm seeing. (Even if games were to use integer math, it'd be minuscule a lot of the time, and you know this.)

The Cell had a lot more in common with what you'd call a vector processor. The actual processor was the PPE; as for the SPEs, they were just a bunch of SIMD units, nothing more, plus their instruction sets were pretty narrow.



snowdog said:

Great post, but I thought I'd add two important things. Firstly, Espresso has a ridiculously short pipeline (4 stages!) while Jaguar has 17 stages, so over 4 times as many stages to go through before an instruction completes; and secondly, Espresso also has access to 32MB of eDRAM in addition to the 3MB of CPU cache.

The second point above is HUGE.

A short pipeline is a limiting factor for several things. For starters, it doesn't allow such heavyweight floating point instruction sets, as we've seen: Espresso shares the same 2x32-bit paired-singles instructions that Broadway and Gekko had. There are advantages and disadvantages to short pipelines. But considering how decently this processor runs modern software, I think it punches well above its weight. If I were to code for a CPU in 2013 that used 2x32-bit SIMD, I would definitely have some doubts about it. Its architecture is very well designed for what it was made to do, and it has aged very well.

I guess the oddball thing about Espresso is having so much cache on one core; I don't really understand their purpose in doing that. Based on the die shots, there isn't anything different about Core 01 compared to the other two, aside from having access to 4x more cache. One plus for the cache, though, is having eDRAM as cache.
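(One concrete way to see why 4 stages vs 17 stages matters: a mispredicted branch flushes roughly a pipeline's worth of work, so the penalty grows with depth. A back-of-envelope sketch in C; the IPC, branch density and mispredict rate are made-up illustrative numbers, not measurements of Espresso or Jaguar:)

#include <stdio.h>

/* Effective IPC under branch mispredicts: each mispredicted branch
 * stalls for roughly pipeline_depth cycles. */
static double effective_ipc(double base_ipc, int pipeline_depth,
                            double branches_per_instr, double mispredict_rate) {
    double stalls_per_instr =
        branches_per_instr * mispredict_rate * pipeline_depth;
    /* cycles per instruction = 1/base_ipc + stalls, then invert back */
    return 1.0 / (1.0 / base_ipc + stalls_per_instr);
}

int main(void) {
    printf("4-stage pipeline : %.2f effective IPC\n",
           effective_ipc(2.0, 4, 0.2, 0.05));   /* ~1.85 */
    printf("17-stage pipeline: %.2f effective IPC\n",
           effective_ipc(2.0, 17, 0.2, 0.05));  /* ~1.49 */
    return 0;
}

In practice, deeper pipelines compensate with better branch predictors and higher clocks, which is why Jaguar isn't simply 4x slower here.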



snowdog said:
I was talking about Jett Rocket on WiiWare mate.


LOL, my bad.

I guess it's safe to say the sequel (3DS exclusive) will definitely take up more memory.



forethought14 said:
snowdog said:

Great post, but I thought I'd add two important things. Firstly, Espresso has a ridiculously short pipeline (4 stages!) while Jaguar has 17 stages, so over 4 times as many stages to go through before an instruction completes; and secondly, Espresso also has access to 32MB of eDRAM in addition to the 3MB of CPU cache.

The second point above is HUGE.

A short pipeline is a limiting factor for several things. For starters, it doesn't allow such heavyweight floating point instruction sets, as we've seen: Espresso shares the same 2x32-bit paired-singles instructions that Broadway and Gekko had. There are advantages and disadvantages to short pipelines. But considering how decently this processor runs modern software, I think it punches well above its weight. If I were to code for a CPU in 2013 that used 2x32-bit SIMD, I would definitely have some doubts about it. Its architecture is very well designed for what it was made to do, and it has aged very well.

I guess the oddball thing about Espresso is having so much cache on one core; I don't really understand their purpose in doing that. Based on the die shots, there isn't anything different about Core 01 compared to the other two, aside from having access to 4x more cache. One plus for the cache, though, is having eDRAM as cache.

BTW, the eDRAM is located on the GPU, not the CPU; the Wii U features an MCM design, not an APU one. Oh, and a lot of games don't use the CPU for rendering, they use the GPU, if you're wondering how the processor is able to keep up. The only workloads usually assigned to the CPU are AI (which is small), keeping track of game elements that have nothing to do with rendering, and other things that aren't floating point heavy.



fatslob-:O said:

BTW, the eDRAM is located on the GPU, not the CPU; the Wii U features an MCM design, not an APU one. Oh, and a lot of games don't use the CPU for rendering, they use the GPU, if you're wondering how the processor is able to keep up. The only workloads usually assigned to the CPU are AI (which is small), keeping track of game elements that have nothing to do with rendering, and other things that aren't floating point heavy.

The 3MB L2 cache in Espresso is IBM's own eDRAM, completely different from the eDRAM on the GPU die; they're manufactured by two different companies. We've known that the cache is eDRAM since it was announced. IBM announced it themselves!

http://www-03.ibm.com/press/us/en/pressrelease/34683.wss

IBM said:

IBM's unique embedded DRAM, for example, is capable of feeding the multi-core processor large chunks of data to make for a smooth entertainment experience

As for the rest, the matmul SIMD test says otherwise. Again, not wanting to start that conversation again, but it performs very well against several more modern designs, while only using 64-bit SIMD (paired singles, in this case, isn't "real" SIMD). That is a pure SIMD test, with no help from outside sources. If a CPU had much better SIMD and a good enough architecture to go along with it, then that CPU should finish the test faster than the others. Of course, if you use it incorrectly, it will lose a lot of performance. Many developers have never even developed for this architecture before, as noted by several devs who clearly say that they're still getting used to it.
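(For anyone unfamiliar with paired singles: they pack two 32-bit floats into one 64-bit register and operate on both lanes at once. Below is a rough plain-C sketch of a 2-wide 4x4 matmul in that style; the ps2 struct and ps_madd helper just mimic the idea, they are not real PowerPC paired-single intrinsics:)

#include <stdio.h>

/* Stand-in for a paired-single register: two packed floats. */
typedef struct { float lo, hi; } ps2;

/* a*b + c on both lanes; each call maps to one ps_madd-style op. */
static ps2 ps_madd(ps2 a, ps2 b, ps2 c) {
    return (ps2){ a.lo * b.lo + c.lo, a.hi * b.hi + c.hi };
}

/* 4x4 matrix multiply, two output columns per accumulator. */
static void matmul4(float A[4][4], float B[4][4], float C[4][4]) {
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j += 2) {
            ps2 acc = {0.0f, 0.0f};
            for (int k = 0; k < 4; k++) {
                ps2 a = { A[i][k], A[i][k] };     /* splat one scalar */
                ps2 b = { B[k][j], B[k][j + 1] }; /* two adjacent columns */
                acc = ps_madd(a, b, acc);
            }
            C[i][j]     = acc.lo;
            C[i][j + 1] = acc.hi;
        }
}

int main(void) {
    float I[4][4] = {{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}};
    float M[4][4] = {{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}};
    float C[4][4];
    matmul4(I, M, C);                     /* identity * M = M */
    printf("C[1][2] = %g (expect 7)\n", C[1][2]);
    return 0;
}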



So make a great looking game.