eyeofcore said:
SNIP

I don't even know where to begin, but your understanding of cache doesn't match reality.

For starters, more cache isn't slower; it never has been.

Let's take the Pentium 3 Katmai, for instance: it had 512 KB of off-die L2 cache, but that cache ran at only half to a quarter of the processor's speed, which was a massive performance bottleneck.

Now take the Pentium 3 Coppermine, an evolutionary improvement over the Katmai core. Intel brought the L2 cache on-die; it was half the size but ran at the full speed of the processor, which brought massive gains in performance, not because of the size reduction but because of the speed. (Doubling or quadrupling cache speed has massive advantages.)

Now pit the Athlon 64 3000+ with 512 KB of L2 cache against the 3200+ with 1024 KB of L2 cache, both with identical clock speeds and everything else the same, and guess which one had the advantage? The 3000+ never managed to beat the 3200+ under any circumstance.

My CPU has 12 MB of L3 cache, 1.5 MB of L2 cache and 64 KB of L1 cache.
Guess what? The largest cache is also the slowest and the smallest is the fastest. That's by design: the CPU uses its predictors to fetch the data it needs ahead of time and stage it in the L3 cache, then the L2, then the L1, depending on how far along the pipeline it is, all to hide the bandwidth and latency hit of travelling down to system memory.
The more cache there is, the more data the CPU can keep close by to avoid a cache miss that forces it to go all the way out to system memory for the data it needs to process; when that happens, a massive number of CPU cycles go to waste sitting idle.
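To make that concrete, here's a rough little C sketch of my own (nothing official) that chases pointers through a randomly shuffled buffer of increasing size. The buffer sizes are just assumptions picked to land inside a 64 KB L1, a 1.5 MB L2 and a 12 MB L3 like the hierarchy above, and then well out into RAM; on a real machine the average time per access should jump each time the working set outgrows a cache level, because past that point every hop is a miss that has to go further down the hierarchy.

[code]
/* build with: gcc -O2 chase.c -o chase */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* tiny xorshift PRNG so we don't depend on RAND_MAX being large */
static unsigned long long rng = 88172645463325252ULL;
static unsigned long long xorshift(void)
{
    rng ^= rng << 13;
    rng ^= rng >> 7;
    rng ^= rng << 17;
    return rng;
}

int main(void)
{
    /* working-set sizes: well inside L1, inside L2, inside L3, out in RAM
     * (chosen for the 64 KB / 1.5 MB / 12 MB hierarchy described above) */
    size_t sizes[] = { 32 << 10, 1 << 20, 8 << 20, 64 << 20 };

    for (int s = 0; s < 4; s++) {
        size_t n = sizes[s] / sizeof(size_t);
        size_t *buf = malloc(n * sizeof(size_t));

        /* Sattolo's algorithm: build one big random cycle so the hardware
         * prefetchers (the "predictors") can't guess the next address */
        for (size_t i = 0; i < n; i++)
            buf[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)(xorshift() % i);
            size_t t = buf[i]; buf[i] = buf[j]; buf[j] = t;
        }

        /* chase the cycle and time the average hop */
        size_t idx = 0;
        const size_t hops = 50000000;
        clock_t start = clock();
        for (size_t i = 0; i < hops; i++)
            idx = buf[idx];
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        printf("working set %6zu KB: %5.1f ns per access (sink=%zu)\n",
               sizes[s] >> 10, secs * 1e9 / (double)hops, idx);
        free(buf);
    }
    return 0;
}
[/code]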

Soon we will have L4 caches too. (In fact, Intel already has one on a couple of CPUs.)

The other advantage of having a cache hierarchy is cost, in terms of transistor count and die size: L1 cache is stupidly expensive, L2 is less so but still expensive, and L3 is pretty darn cheap in the grand scheme of things; in fact, a massive portion of a CPU die is actually cache.

Using the Wii U's CPU as an example, though, is a pretty poor choice: it's old and slow, it's designed to be fabricated cheaply, and it has a below-average branch predictor, amongst other things.
But considering that some Intel CPUs have 140 MB+ of "cache memory" across eDRAM and the L1, L2 and L3 caches, and considering those chips would pretty much dominate the Wii U's paltry CPU at the same clock, well. You get the idea.

The reason for a shared L2 cache is coherency, which brings its own advantages; however, the general consensus between Intel and AMD is for the L3 cache to be shared across all cores, whilst each L2 feeds 1-2 cores/threads.
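If you want to see coherency bite in practice, here's another quick sketch of mine (again nothing official; it assumes a POSIX system with 64-byte cache lines, built with gcc -O2 -pthread). Two threads hammer counters that sit on the same cache line, then the same thing with the counters padded onto separate lines. The shared-line version forces the line to ping-pong between the cores' private caches, so it's usually noticeably slower even though the threads never touch each other's data.

[code]
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL   /* 100 million increments per thread */

/* two counters packed onto one 64-byte cache line (false sharing)... */
static _Alignas(64) struct { volatile unsigned long a, b; } same_line;

/* ...and two counters padded out to separate cache lines */
static struct { _Alignas(64) volatile unsigned long v; } separate[2];

static void *bump(void *p)
{
    volatile unsigned long *c = p;
    for (unsigned long i = 0; i < ITERS; i++)
        (*c)++;
    return NULL;
}

static double run(volatile unsigned long *c0, volatile unsigned long *c1)
{
    pthread_t t0, t1;
    struct timespec s, e;

    clock_gettime(CLOCK_MONOTONIC, &s);
    pthread_create(&t0, NULL, bump, (void *)c0);
    pthread_create(&t1, NULL, bump, (void *)c1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    clock_gettime(CLOCK_MONOTONIC, &e);

    return (e.tv_sec - s.tv_sec) + (e.tv_nsec - s.tv_nsec) / 1e9;
}

int main(void)
{
    printf("same cache line:      %.2f s\n", run(&same_line.a, &same_line.b));
    printf("separate cache lines: %.2f s\n", run(&separate[0].v, &separate[1].v));
    return 0;
}
[/code]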

Seriously though, Intel and AMD spend billions on R&D; they know more than either of us when it comes to cache, and they both have the same ideas on what's what. Nintendo, however, isn't in the CPU-building game, and IBM is essentially relegated to last-century stuff.

If you want, I could go into other parts like µops, registers and such.

In the end though, it's better to have as much data as you can sitting next to the CPU rather than forcing the CPU to go out to system memory; that's the fundamental reason cache exists in the first place. More is always better, since cache is faster and lower latency than RAM.
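One last sketch of mine to drive that home (it just assumes the usual row-major layout of C arrays): summing the same 64 MB matrix twice, once walking along rows so every cache line pulled in gets fully used and the prefetchers stay ahead, and once walking down columns so nearly every access misses. Same data, same amount of work, very different time.

[code]
/* build with: gcc -O2 matsum.c -o matsum */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 4096

int main(void)
{
    /* 4096 x 4096 ints = 64 MB, far bigger than any of the caches above */
    int *m = malloc((size_t)N * N * sizeof(int));
    for (size_t i = 0; i < (size_t)N * N; i++)
        m[i] = 1;

    long long sum = 0;

    clock_t t0 = clock();
    for (int r = 0; r < N; r++)          /* row-major: consecutive addresses */
        for (int c = 0; c < N; c++)
            sum += m[(size_t)r * N + c];
    double row_secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    for (int c = 0; c < N; c++)          /* column-major: 16 KB jumps */
        for (int r = 0; r < N; r++)
            sum += m[(size_t)r * N + c];
    double col_secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("row-major %.3fs, column-major %.3fs (sum=%lld)\n",
           row_secs, col_secs, sum);
    free(m);
    return 0;
}
[/code]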



--::{PC Gaming Master Race}::--