Entroper said:
I see what you're saying. But in the absence of similar tests done on the Xenon, it's not very conclusive as far as drawing a comparison. It's no surprise that MS hasn't published a similar test; the Xenon is not marketed for high performance parallel computing applications. The Xenon cores do have "their own memory" to work with, it's just called "L2 cache" rather than "local store", and its use is transparent rather than explicit. I realize it has less bandwidth and that the GPU can lock it, but it's not as if all three cores and the GPU will be in contention for the bus on each read. You can optimize for cache utilization on a symmetric multi-core system just like you can on an asymmetric one; the difference is that you aren't explicitly starting DMA transfers to the cache in the symmetric system. Project 2 in that same class I told you about earlier was optimizing the matrix multiply for a consumer-level CPU, using exactly this technique.
The Cell has cache memory as well. However, each local store is fully dedicated to a single SPE, and although this memory is as fast as cache, it is used very differently: the program manages it explicitly, which lets the SPEs operate independently of one another (each on one hardware thread and any number of software threads).
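
To make the cache-utilization point from the quote concrete, here is a rough sketch of a blocked (tiled) matrix multiply in C. It is not the code from that class project, just the general technique as I understand it. The function name `matmul_blocked` and the tile size `BLOCK` are my own placeholders, and I'm assuming square, row-major matrices of doubles.

```c
#include <stddef.h>

/* Hypothetical tile edge: pick it so that three BLOCK x BLOCK tiles of
 * doubles fit comfortably in the cache level you are targeting. */
#define BLOCK 32

/* Computes C = A * B for n x n row-major matrices, one tile at a time,
 * so each tile of A and B is reused while it is still in cache. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    /* Start from a zeroed result, since the tiles accumulate into C. */
    for (size_t i = 0; i < n * n; ++i)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = (ii + BLOCK < n) ? ii + BLOCK : n;
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = (kk + BLOCK < n) ? kk + BLOCK : n;
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = (jj + BLOCK < n) ? jj + BLOCK : n;

                /* Multiply one tile of A by one tile of B. */
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

On a cache-based core the hardware pulls each tile in transparently as the inner loops touch it. On an SPE the loop structure would look much the same, but you would have to DMA each tile into the local store yourself before working on it, which is the difference in usage I mean.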







