alephnull said:

Yes, but the primary reason is unintuitive. It comes from the fact that the 360 shares its memory bank with the video card. To explain, I need to go into some background though (sorry if you already know this).

It is vastly more efficient (though technically not required for the Cell) for the programmer/compiler to explicitly manage (some aspects of) DMA calls on an SPE, because it is composed of two different core-like things with a "division of labor" which execute in parallel. The SPU gets to slosh around in its 256KB playground (the local store, or LS) while the MFC either very quickly borrows/shares from/with the other SPEs (all the SPEs can read each other's LS with almost no overhead) or grabs things from main memory via its DMAC. The address space of the LS is accessed via real physical addresses, so no translation of virtual addresses is required (there are actually 2 levels of address virtualization on the 360!), and you don't need to cache those translations with a TLB for anything the SPU does.
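To make the "explicitly managed DMA" idea a bit more concrete, here is a toy Python sketch of the usual double-buffering pattern: request the next chunk before working on the current one. It is only a simulation under my own assumptions. `dma_get`, `sum_with_double_buffering`, and the 4-bytes-per-element figure are made-up names and numbers for illustration, not the Cell SDK's API, and nothing actually runs in parallel here.

```python
LOCAL_STORE_BYTES = 256 * 1024  # local store size mentioned in the post
BYTES_PER_ELEMENT = 4           # assumption for this sketch only


def dma_get(main_memory, start, length):
    """Hypothetical stand-in for an MFC transfer: copy a slice of main memory."""
    return main_memory[start:start + length]


def sum_with_double_buffering(main_memory, chunk_len):
    """Sum main_memory chunk by chunk, always requesting the next chunk
    before processing the current one. Returns (total, chunks_processed)."""
    # Both buffers have to fit in the local store at the same time.
    assert 2 * chunk_len * BYTES_PER_ELEMENT <= LOCAL_STORE_BYTES

    total = 0
    chunks_processed = 0
    offset = 0
    pending = dma_get(main_memory, offset, chunk_len)  # prime the first buffer
    while pending:
        current = pending
        offset += chunk_len
        # On real hardware this request would overlap with the compute below.
        pending = dma_get(main_memory, offset, chunk_len)
        total += sum(current)
        chunks_processed += 1
    return total, chunks_processed
```

For example, `sum_with_double_buffering(list(range(10)), 4)` walks the data in three chunks. The point is only the shape of the loop: the code decides what lands in the local store and when, instead of a cache deciding on a miss.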

On a cache-coherent system, the equivalent of these DMAC calls would happen when a normal load by a core (call it core A) has a cache miss. Since there was a cache miss, that cache has to go out and find the data at a higher level. But what happens if another core (call it core B) already has that address in its L1 cache (which is always written through to L2 on the 360) and has been messing with it?

You need a way to keep B informed of any changes, usually by having B's cache snoop (intercept) all reads to that physical address in main memory and L2 cache, and broadcast to A's cache (and every other core's cache) to back off while it updates the changes. The process of setting this up is a bit involved, and while it is being set up no one can access main memory, to avoid two caches simultaneously requesting a line and both thinking they own it.
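A toy model may help picture the snooping idea. The Python sketch below uses a simple write-invalidate scheme with write-through caches: a write broadcasts on a shared bus so every other cache drops its stale copy. This is a simplification I am assuming for illustration, not the 360's actual protocol, and the `Bus` and `Cache` classes are hypothetical names.

```python
class Bus:
    """Shared bus that every cache snoops (toy model)."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> value
        self.caches = []

    def broadcast_invalidate(self, sender, addr):
        # Every cache except the writer drops its copy of the line.
        for cache in self.caches:
            if cache is not sender:
                cache.lines.pop(addr, None)


class Cache:
    """Write-through cache that stays coherent by invalidation."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # address -> cached value
        self.misses = 0
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            # Cache miss: go out to the next level for the data.
            self.misses += 1
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Tell everyone else to back off before updating.
        self.bus.broadcast_invalidate(self, addr)
        self.lines[addr] = value
        self.bus.memory[addr] = value  # write-through to the next level


memory = {0x10: 1}
bus = Bus(memory)
core_a, core_b = Cache(bus), Cache(bus)

first = core_a.read(0x10)   # miss, A now caches the line
core_b.write(0x10, 7)       # B's write invalidates A's copy
second = core_a.read(0x10)  # miss again, A sees B's value
```

Every extra participant on the bus (more cores, or a GPU with its own caches) is one more party that has to see these broadcasts, which is where the overhead comes from.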

So what does this have to do with the video card?

Well, on the 360 the video card has the ability to read and write directly to main memory and L2 cache! So the 360 has to maintain coherency between all the L1 caches, the L2 cache, main memory, and the caches on the video card itself via the FSB. The video card's tendency to clobber mass quantities of data doesn't help either.

 

Thanks for explaining it in such detail. I cannot respond because I'm completely out of my depth, but thanks!

Btw, I wanted to ask you this earlier: if you were designing the next Xbox, what kind of CPU would you use? Would you go for the Cell model, something similar to the Xbox 360 CPU, or something more akin to your x86 line of desktop CPUs?



Tease.