
Clearing up a major misconception about PowerPC

Mnementh said:
walsufnir said:


No, not really. Modern x86 is compatible but works completely differently (µ-ops, different pipelines, branch prediction, totally different instructions,...).

Yes, that is true. But you could say something similar about Nintendo's CPU. We know it is no vanilla PowerPC: at least in the GameCube, and then again in the Wii, they added additional instructions. Probably the same is true for the Wii U's CPU. We don't know much beyond that, but it is more than possible that they reworked the pipelines and branch prediction too.


Yes, the GC PPC wasn't a vanilla PPC to begin with, but it wasn't really a custom chip like Cell or Xenon. And reading https://fail0verflow.com/blog/2013/espresso.html tells me that not that much new stuff was introduced into the individual cores themselves.

And marcan also found out even more about the cores: https://en.wikipedia.org/wiki/Wii_U_CPU

Broadway-based core architecture[14]

Three cores at 1.243125 GHz

Symmetric multiprocessing with MESI/MERSI support[15]

Each core can output up to 4 instructions per clock using superscalar parallelism.

32-bit integer unit

64-bit floating-point (or 2× 32-bit SIMD, often found under the denomination "paired singles")

A total of 3 MB of Level 2 cache in an unusual configuration.[16]

Core 0: 512 KB, core 1: 2 MB, core 2: 512 KB

4-6 stage pipeline

6 Execution Units per core (18 EUs total)

Die size: 4.74 mm × 5.85 mm = 27.73 mm²

___________

Especially the bolded: 

Hector Martin

@marcan42

@DFaker no, it's just a 750. PPC750 can issue 3/cycle and retire 2/cycle. @dampflokfreund yes, three Broadways and more cache.
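Taking marcan's figures at face value, the retire width caps sustained throughput: on average a core can't finish more than 2 instructions per cycle (IPC), which is lower than the 4 per clock in the list above. A back-of-the-envelope upper bound, assuming the 1.243125 GHz clock from that list and perfect conditions (no cache misses, stalls or mispredicted branches, so real code lands well below it):

```latex
\begin{aligned}
\text{IPC}_{\text{sustained}} &\le \min(\text{issue}, \text{retire}) = \min(3, 2) = 2 \\
2 \times 1.243125\ \text{GHz} &= 2.48625 \times 10^{9}\ \text{instructions/s per core} \\
3 \times 2.48625 \times 10^{9} &\approx 7.46 \times 10^{9}\ \text{instructions/s for all three cores}
\end{aligned}
```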




Putting aside core count and clock speed for a moment, what kind of performance boost would likely result from a PPC750 CPU going from 256 KB of L2 cache (Broadway) to 3 MB (Espresso), a 12-fold increase?



curl-6 said:
Putting aside core count and clock speed for a moment, what kind of performance boost would likely result from a PPC750 CPU going from 256 KB of L2 cache (Broadway) to 3 MB (Espresso), a 12-fold increase?


That depends on what the CPU is doing. What's interesting about the Wii U is that one core has much more L2 cache than the other two: 2 MB, 512 KB, 512 KB.


Cache can speed some things up massively and others not at all. So you cannot say it gives you X% more speed or anything like that.



curl-6 said:
Putting aside core count and clock speed for a moment, what kind of performance boost would likely result from a PPC750 CPU going from 256 KB of L2 cache (Broadway) to 3 MB (Espresso), a 12-fold increase?


A general answer is that more L2$ means less time waiting on main memory in certain cases, but if your data is too big to fit, it won't matter much. Increasing cache size can of course make a CPU perform better, but this is not generally the case. Especially if you consider it's shared among 3 cores.
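To illustrate the "if your data doesn't fit, it won't matter" point, here is a minimal sketch in Python: a fully associative LRU cache model that sweeps a working set over and over and reports the hit rate. Real L2 caches (Espresso's included) are set-associative, and the line counts below are hypothetical; only the 12x ratio between the two cache sizes mirrors Broadway vs. Espresso.

```python
from collections import OrderedDict


def lru_hit_rate(cache_lines: int, working_set: int, passes: int = 10) -> float:
    """Hit rate of a fully associative LRU cache when a program sweeps
    addresses 0..working_set-1 repeatedly (sizes in cache lines)."""
    cache = OrderedDict()  # keys = cached line addresses, least recent first
    hits = 0
    accesses = 0
    for _ in range(passes):
        for addr in range(working_set):
            accesses += 1
            if addr in cache:
                hits += 1
                cache.move_to_end(addr)  # mark as most recently used
            else:
                cache[addr] = None
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses


# Hypothetical sizes in lines; only the 12x ratio reflects the real chips.
for cache_lines in (256, 3072):
    for working_set in (200, 1000, 4000):
        rate = lru_hit_rate(cache_lines, working_set)
        print(f"cache={cache_lines:5d} working_set={working_set:5d} hit_rate={rate:.2f}")
```

With these parameters, a 200-line working set hits 90% of the time in both caches (only the first, cold pass misses), a 1000-line set jumps from 0% to 90% with the 12x bigger cache, and a 4000-line set misses every time in both. Under LRU, a cyclic sweep even one line larger than the cache misses on every access, which is why "fits or doesn't fit" matters more than the raw size multiplier.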



walsufnir said:
curl-6 said:
Putting aside core count and clock speed for a moment, what kind of performance boost would likely result from a PPC750 CPU going from 256 KB of L2 cache (Broadway) to 3 MB (Espresso), a 12-fold increase?


A general answer is that more L2$ means less time waiting on main memory in certain cases, but if your data is too big to fit, it won't matter much. Increasing cache size can of course make a CPU perform better, but this is not generally the case. Especially if you consider it's shared among 3 cores.

Even split between 3 cores, it's still 8 times as much as Broadway for the main core, and twice as much for the two secondaries.
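The ratios check out, taking Broadway's 256 KB L2 as the baseline and Espresso's per-core sizes from the Wikipedia list (all sizes in KB):

```latex
\begin{aligned}
\frac{512 + 2048 + 512}{256} &= \frac{3072}{256} = 12 \quad \text{(total)} \\
\frac{2048}{256} &= 8 \quad \text{(core 1)}, \qquad \frac{512}{256} = 2 \quad \text{(cores 0 and 2)}
\end{aligned}
```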

So what kind of workloads have small data sets that benefit most from L2 increases?



curl-6 said:
walsufnir said:


A general answer is that more L2$ means less time waiting on main memory in certain cases, but if your data is too big to fit, it won't matter much. Increasing cache size can of course make a CPU perform better, but this is not generally the case. Especially if you consider it's shared among 3 cores.

Even split between 3 cores, it's still 8 times as much as Broadway for the main core, and twice as much for the two secondaries.

So what kind of workloads have small data sets that benefit most from L2 increases?


You can't count it like that, because you don't know which core is computing what. A core can invalidate data in the cache that you want to read, and one core's data can be overwritten with data from another core. Algorithms (cache-coherence protocols) have been introduced to handle this, but I don't know which one is used in the Wii U.

Take a look at this: https://en.wikipedia.org/wiki/Cache_coherence 
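The Wikipedia list quoted earlier says Espresso supports MESI/MERSI. As a simplified illustration of what "a core invalidates the data you want to read" means, here is a sketch in Python of plain MESI for a single cache line held in several cores' private caches. It is not Espresso's actual implementation: it ignores MERSI's extra state, bus timing and the data itself, and only tracks the per-core states.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # only copy, dirty (differs from memory)
    EXCLUSIVE = "E"  # only copy, clean
    SHARED = "S"     # clean copy, other caches may hold it too
    INVALID = "I"    # not usable; must be re-fetched


class Line:
    """MESI state of one cache line in each core's private cache."""

    def __init__(self, cores: int) -> None:
        self.state = [State.INVALID] * cores

    def read(self, core: int) -> None:
        if self.state[core] != State.INVALID:
            return  # read hit: no state change
        holders = [c for c in range(len(self.state))
                   if c != core and self.state[c] != State.INVALID]
        if holders:
            # Another cache has the line: all copies become SHARED
            # (a MODIFIED holder would write its data back first).
            for c in holders:
                self.state[c] = State.SHARED
            self.state[core] = State.SHARED
        else:
            self.state[core] = State.EXCLUSIVE

    def write(self, core: int) -> None:
        # A write needs sole ownership: every other copy is invalidated.
        for c in range(len(self.state)):
            if c != core:
                self.state[c] = State.INVALID
        self.state[core] = State.MODIFIED

    def states(self) -> str:
        return "".join(s.value for s in self.state)


line = Line(3)
line.read(0)    # core 0 alone has it: E I I
line.read(1)    # now shared: S S I
line.write(2)   # core 2 writes, invalidating cores 0 and 1: I I M
line.read(0)    # core 0 misses and must re-fetch: S I S
print(line.states())
```

The last two steps are the situation described above: core 0 had the data cached, core 2 wrote to it, and core 0's next read becomes a miss even though the cache was big enough.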

In gaming contexts, I'd guess small, local (sub)routines. Anything graphics-related should be way too big, and since this is CPU cache, it's more about game logic than fancy graphics stuff.

 

Edit: An interesting part about the first Celeron:

https://en.wikipedia.org/wiki/Celeron#Covington

Intel went cheap and didn't even put an L2$ in this processor :)



Yeah, the original Celeron went with no L2 cache, but the Celeron A, with 128 KB of on-die L2 running at full speed (versus the Pentium II's 512 KB at half speed), could in fact match the P2 in most tasks if you overclocked the FSB. Those were actually beasts, especially for their price. I had a 433 MHz one back then, overclocked to 600 MHz.

That said, having more cache is always better, though you just can't say that eight times the cache will give you, say, a 20% performance increase.

Having more cache can greatly improve some tasks, while others likely won't be affected at all.
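One way to see why a cache-size multiplier doesn't map to a fixed speedup is the textbook average memory access time (AMAT) model. This is a generic model, not a measurement of Espresso: $t_{\text{hit}}$ is the cache hit time, $m$ the miss rate and $t_{\text{miss}}$ the penalty for going to main memory. It also assumes the bigger cache doesn't make hits slower, which in practice it often does somewhat.

```latex
\begin{aligned}
\text{AMAT} &= t_{\text{hit}} + m \cdot t_{\text{miss}} \\
\Delta\text{AMAT} &= (m_{\text{small}} - m_{\text{big}}) \cdot t_{\text{miss}}
\end{aligned}
```

The gain comes only through the drop in miss rate, and that drop depends on the workload: data that already fit in the small cache, or that doesn't fit in either, gives $m_{\text{small}} \approx m_{\text{big}}$ and almost no benefit.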

As for Espresso, it likely isn't a 970. It's not a stock PPC750 either. There were already changes made for Gekko, like some SIMD, FPU...

What might be interesting: Espresso uses 1T eDRAM cache, while Broadway and Gekko used 6T SRAM, which should need much more space per bit.

Now, a triple-core Broadway at 45 nm should be about 12 mm², though it would have less cache.
That leaves plenty of free space for a bigger cache, and an eDRAM cache needs less space than SRAM would. So if I were to guess, Espresso has some slight improvements aside from the cache, likely more SIMD registers and an improved FPU. But that is really just a guess.
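The estimate can be turned into a rough area budget, using the die size from the Wikipedia list and the ~12 mm² figure for a 45 nm triple-core Broadway (a guess from this post, not a measured number). It ignores I/O, interconnect and anything else on the die:

```latex
\begin{aligned}
4.74\ \text{mm} \times 5.85\ \text{mm} &= 27.729\ \text{mm}^2 \approx 27.73\ \text{mm}^2 \\
27.73\ \text{mm}^2 - 12\ \text{mm}^2 &\approx 15.7\ \text{mm}^2 \quad (\approx 57\%\ \text{of the die})
\end{aligned}
```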



captain carot said:

As for Espresso, it likely isn't a 970. It's not a stock PPC750 either. There were already changes made for Gekko, like some SIMD, FPU...

I go with marcan here, as he is quite a knowledgeable guy when it comes to Wii/Wii U hardware, and many of the things we know about Espresso come from him and his findings. In general it's of course not a stock 750 but derived from it, and if he says it's a 750, I guess we can likely believe him in this context.

 

Edit: Concerning the ISA I found this: http://www.radgametools.com/bnkhist.htm

"Added Wii-U support for Bink 2 - play 30 Hz 1080p or 60 Hz 720p video! We didn't think this would be possible - the little non-SIMD CPU that could!" I didn't know it doesn't feature SIMD...



Random guy on the internet, half a year before the PS4 and X1 were announced: "At the time the XBox 360 was created, the PowerPC was the fastest and most reasonable CPU to use. This is no longer the case, where ARM has beat it out. I predict that ARM will be the CPU of choice for the new round of gaming consoles that should be out in the next year or two." Sep 25 '12 http://electronics.stackexchange.com/questions/42065/reasons-for-popularity-of-powerpc-for-embedded-designs



WolfpackN64 said:

The PowerPC architecture saw the light of day in 1992, with its current form, Power ISA v2.07, released in 2013. For comparison, x86 started in 1978, with its most recent major extension, x86-64 (AMD64), arriving in 2003.

 


The dates you give are mostly right; however, contrary to what's implied here, x86 has not stagnated at all since 2003. Back then, AMD's Athlon line was the x86 CPU with the best power efficiency, while Intel was chasing extremely high clock speeds with its P4 architecture. Intel eventually came out with the Core architecture, a derivative of the PIII, and since then things have improved a lot in terms of TDP...

Since then, Apple has stopped using PowerPC CPUs in its desktop computers, and IBM/Freescale stopped focusing on the performance segment for these CPUs (I think they are still used in many set-top boxes and other devices that need low power consumption and modest performance). So for a console, there is no development in the direction Nintendo or anyone else would need the architecture to go. That was not the case just a few years ago, when PPC meant cutting edge.

That's not to say the CPUs in the PS4 and XB1 are cutting edge, but they show where the x86 architecture is now: you get a relatively interesting CPU on the same chip as a good GPU, with reasonable power consumption, at a pretty reasonable price considering what's in the PS4 at least... PPC doesn't have that kind of R&D push, which greatly limits the choices in how the architecture is developed (its vendors target specific niche markets).