Pemalite said:
eyeofcore said:
SNIP

I don't even know where to begin, but your understanding of cache doesn't match reality.

For starters, more cache isn't slower; it never has been.

Let's take the Pentium 3 Katmai, for instance: it had 512KB of L2 cache, but that cache ran at 1/2 to 1/4 of the processor's speed, which was a massive performance bottleneck.

Now take the Pentium 3 Coppermine, an evolutionary improvement over the Katmai core. Intel brought the L2 cache on-die; it was half the size but ran at the same speed as the processor, which brought massive gains in performance, not because of the size reduction but because of the speed. (Doubling or quadrupling cache speed has massive advantages.)
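To put rough numbers on why that on-die, full-speed L2 mattered so much, here's a quick average-memory-access-time sketch. The latencies and miss rates are just assumed example values, not measured Pentium 3 figures:

```c
/* Illustrative AMAT sketch: effect of halving L2 latency.
 * All latencies and miss rates are assumed example values,
 * not measured Pentium 3 numbers. */
#include <stdio.h>

static double amat(double l1_hit, double l1_miss_rate,
                   double l2_hit, double l2_miss_rate,
                   double mem_latency)
{
    /* AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 miss rate * memory latency) */
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency);
}

int main(void)
{
    /* Off-die L2 at half the core clock (Katmai-style, assumed ~20 cycles)
     * vs. on-die L2 at full clock (Coppermine-style, assumed ~10 cycles). */
    double slow_l2 = amat(3.0, 0.10, 20.0, 0.10, 150.0);
    double fast_l2 = amat(3.0, 0.10, 10.0, 0.10, 150.0);

    printf("AMAT with half-speed L2: %.2f cycles\n", slow_l2);
    printf("AMAT with full-speed L2: %.2f cycles\n", fast_l2);
    return 0;
}
```

Even with identical miss rates, the slower L2 adds cycles to every single L1 miss, and at hundreds of millions of loads per second that adds up fast.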

Now, if you were to take the Athlon 64 3000+ with 512KB of L2 cache against the 3200+ with 1024KB of L2 cache, both with identical clock speeds and everything else, guess which one had the advantage? The 3000+ never managed to beat the 3200+ under any circumstance.

My CPU has 12MB of total L3 cache, 1.5MB of L2 cache and 64KB of L1 cache.
Guess what? The largest cache is also the slowest and the smallest is the fastest. This is by design: the CPU uses its predictors to fetch the data it will need ahead of time and stores it in the L3 cache, then the L2 cache, and then the L1 cache, depending on how far along the processing chain it is, to hide the bandwidth and latency hit of travelling down to system memory.
The more cache there is, the more the CPU can keep close by to prevent a cache miss forcing it to travel all the way down to system memory to grab the data that needs to be processed; when that happens, a massive number of CPU cycles sit idle and go to waste.
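If you want to see what a trip all the way down to system memory actually costs, here's a rough sketch you can compile yourself: it chases pointers through a buffer far larger than any cache, once in sequential order (prefetch-friendly) and once in a shuffled order where nearly every load misses. The buffer size and step count are arbitrary assumptions, and exact timings will vary by machine:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N ((size_t)1 << 24)   /* 16M nodes, 128 MiB per array on 64-bit */

/* Walk 'steps' links; each load depends on the previous one, so the
 * latency of every access is fully exposed. */
static size_t chase(const size_t *next, size_t steps, double *seconds)
{
    clock_t start = clock();
    size_t i = 0;
    for (size_t s = 0; s < steps; s++)
        i = next[i];
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return i;   /* returned so the compiler cannot drop the loop */
}

int main(void)
{
    size_t *seq = malloc(N * sizeof *seq);
    size_t *rnd = malloc(N * sizeof *rnd);
    if (!seq || !rnd) return 1;

    /* sequential ring: the hardware prefetcher hides most of the latency */
    for (size_t i = 0; i < N; i++)
        seq[i] = (i + 1) % N;

    /* single random cycle (Sattolo's algorithm): nearly every load misses */
    for (size_t i = 0; i < N; i++)
        rnd[i] = i;
    srand(42);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = rnd[i]; rnd[i] = rnd[j]; rnd[j] = t;
    }

    double t_seq, t_rnd;
    size_t sink = chase(seq, N, &t_seq);
    sink ^= chase(rnd, N, &t_rnd);

    printf("sequential chase: %.2f s\n", t_seq);
    printf("shuffled chase:   %.2f s  (sink=%zu)\n", t_rnd, sink);

    free(seq);
    free(rnd);
    return 0;
}
```

The shuffled walk is the "cache miss on every access" case; the gap between the two numbers is the penalty the cache hierarchy is there to hide.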

Soon we will have L4 caches too. (In fact, Intel already has one on a couple of CPUs.)

The other advantage of having a cache hierarchy is cost, in terms of transistor counts and die size: L1 cache is stupidly expensive, L2 cache is less so but still expensive, and L3 is pretty darn cheap in the grand scheme of things. In fact, a massive portion of a CPU die is actually cache.

Using the Wii U's CPU as an example, though, is a pretty poor choice: the Wii U's CPU is old and slow, it's designed to be fabricated cheaply, and it has a below-average branch predictor, amongst other things.
But considering that some Intel CPUs have 140MB+ of "cache memory" in the form of eDRAM plus L1, L2 and L3 caches, and considering those chips would pretty much dominate the Wii U's paltry CPU at the same clock... well, you get the idea.

The reason for a shared L2 cache is coherency, which brings its own advantages; however, the general consensus between Intel and AMD is for the L3 cache to be shared across all cores, whilst the L2 feeds 1-2 cores/threads.

Seriously though, Intel and AMD spend billions on R&D; they know more than either of us when it comes to cache, and they both have the same ideas on what's what. Nintendo, however, isn't in the CPU-building game, and IBM is essentially relegated to last-century stuff.

If you want, I could go into other parts like uops, registers and such.

In the end though, it's better to have as much data as you can next to the CPU rather than forcing it to go out to system memory; that's the fundamental reason cache exists in the first place. More is always better, as it's faster and lower latency, with better associativity, than RAM.


Okay... Thanks for the lesson...

I have a question involving the Wii U's CPU. I don't really agree that it is slow; can you compare it to the Xbox 360 and PlayStation 3 CPUs?

I know that the Wii U's CPU is based on the PowerPC 750CL, so it is an old design, yet I would not underestimate it that easily. It has a 4-stage pipeline, which is really short and should have little to no "bubbles", compared to the atrocious Xbox 360/PlayStation 3 CPUs with their 32 to 40 stage pipelines, which are also in-order, versus the out-of-order execution the GameCube/Wii/Wii U CPU has, even though it is kind of limited, as I read on some forums.
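Here's the kind of back-of-the-envelope math I mean, just to show how pipeline depth magnifies branch misprediction cost; the branch frequency and misprediction rate are assumed example numbers, not measured console figures:

```c
/* Back-of-the-envelope sketch: how pipeline depth magnifies the cost
 * of a mispredicted branch. Branch frequency and misprediction rates
 * are assumed example values, not measured console figures. */
#include <stdio.h>

static double wasted_cycles_per_1000(double branch_frac,
                                     double mispredict_rate,
                                     int refill_stages)
{
    /* every mispredicted branch throws away roughly one pipeline refill */
    return 1000.0 * branch_frac * mispredict_rate * refill_stages;
}

int main(void)
{
    double branches = 0.20;   /* assume ~20% of instructions are branches */
    double miss     = 0.10;   /* assume a 10% misprediction rate */

    printf("~4-stage pipeline  (750CL-like): %.0f wasted cycles per 1000 instructions\n",
           wasted_cycles_per_1000(branches, miss, 4));
    printf("~30-stage pipeline (Xenon/Cell PPE-like): %.0f wasted cycles per 1000 instructions\n",
           wasted_cycles_per_1000(branches, miss, 30));
    return 0;
}
```

The longer pipeline pays roughly its full depth in wasted cycles on every misprediction, which is why a short pipeline with out-of-order execution can punch above its clock speed.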

Is there a difference between the Xbox 360/PlayStation 3 L2 cache and the Wii U's L2 cache, which is eDRAM? Also, what about its configuration of 512KB for Core 0, 2MB for Core 1 and 512KB for Core 2, compared to the unified L2 cache of 1MB in the Xbox 360 and 768KB in the PlayStation 3?

I asked Marcan42 on Twitter if the Wii U's CPU can use the Wii U's GPU eDRAM pool as an L3 cache, and he said it is system RAM and that Espresso can use it for whatever it wants, so I assume it can use it, maybe directly, or can it only issue commands?

I read this article and it seems that the Wii U's CPU (Espresso) can directly access and use the eDRAM, but maybe I am wrong:

http://hdwarriors.com/general-impression-of-wii-u-edram-explained-by-shinen/

The Wii U has a DSP, while the Xbox 360/PlayStation 3 don't, so audio is done on one of their CPU cores, right? So only two cores are really for the game while a third one acts as a DSP, and the OS also partially runs on one of those cores. Compare that to the Wii U, which is rumored to have 2 ARM cores used as "background" cores, plus another ARM core for backward compatibility with the Wii that could also be used.

I know that the Xbox 360 had a bottleneck involving RAM: it was GDDR3 with 22.8GB/s, yet the FSB (or whatever that kind of link is called) could only push 10.8GB/s, and the PlayStation 3 also had some sort of bottleneck. Meanwhile the Wii U does not have any kind of bottleneck and uses DDR3-1600, so it has 12.8GB/s like most computers nowadays; it also has 1GB for games, so it has almost 3 times more memory to temporarily store game assets/data. DDR3 has much lower latency than GDDR3, so it is great for the OS and games, right?
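Just to show where that 12.8GB/s figure comes from, assuming the commonly mentioned 64-bit DDR3-1600 setup, peak bandwidth is simply transfers per second times bus width in bytes:

```c
/* Peak theoretical bandwidth = transfers per second * bus width in bytes.
 * The 64-bit DDR3-1600 configuration is the commonly cited Wii U setup;
 * treat it as an assumption here. */
#include <stdio.h>

int main(void)
{
    double transfers_per_sec = 1600e6;     /* DDR3-1600 = 1600 MT/s */
    double bus_bytes         = 64.0 / 8.0; /* 64-bit bus = 8 bytes per transfer */

    double gb_per_sec = transfers_per_sec * bus_bytes / 1e9;
    printf("Peak DDR3-1600 bandwidth on a 64-bit bus: %.1f GB/s\n", gb_per_sec);
    return 0;
}
```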