Squilliam said:
I've heard of it as something that developers would like to have 'been different' but at the same time does it stand out as a big issue overall? From the surface it looks to me like a fairly well balanced system.
The architecture is extremely well balanced for a cache-coherent system (read: most architectures you are familiar with). Three cores is generally considered the sweet spot in the literature, i.e. the point beyond which the performance lost to bus and cache contention starts to outweigh any gain from the theoretical increase in FLOPS. Not to mention the fact that each core has 2 sets of 128 SIMD registers compared to the Cell PPE's 32. It's not as simple to get decent performance out of as so many seem to believe, though.
The problem is that you have 6 potential threads all competing for main memory access via a single DMA controller and bludgeoning the same 1MB of L2 cache, which runs at 1/2 clock speed. And since the whole point is to have a simple unified address space to make life easier for developers, you have to do address translations, and take it from me, TLB misses are frequently one of the main performance bottlenecks and yet are probably one of the most subtle.
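To show why TLB misses are so easy to overlook, here is a rough sketch of a toy TLB model. Everything in it is an assumption made for illustration: the 4 KiB page size, the 64-entry fully associative LRU TLB, and the helper names are hypothetical and do not describe the real hardware. It walks the same 1 MiB array twice, once row by row and once column by column, and counts the misses for each.

```python
from collections import OrderedDict

PAGE_SIZE = 4096     # assumed page size in bytes (illustrative only)
TLB_ENTRIES = 64     # assumed TLB capacity, not a figure for any real chip
ROWS, COLS = 256, 1024   # 256 rows of 1024 four-byte elements: one page per row
ELEM_SIZE = 4

def count_tlb_misses(addresses, entries=TLB_ENTRIES, page_size=PAGE_SIZE):
    """Count misses in a toy fully associative TLB with LRU replacement."""
    tlb = OrderedDict()
    misses = 0
    for addr in addresses:
        page = addr // page_size
        if page in tlb:
            tlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1
            tlb[page] = True
            if len(tlb) > entries:
                tlb.popitem(last=False)  # evict the least recently used page
    return misses

def row_major_addresses():
    """Byte offsets for walking the array row by row."""
    return ((r * COLS + c) * ELEM_SIZE for r in range(ROWS) for c in range(COLS))

def column_major_addresses():
    """Byte offsets for walking the same array column by column."""
    return ((r * COLS + c) * ELEM_SIZE for c in range(COLS) for r in range(ROWS))

row_misses = count_tlb_misses(row_major_addresses())
col_misses = count_tlb_misses(column_major_addresses())
print("row-major misses:   ", row_misses)
print("column-major misses:", col_misses)
```

Both loops do exactly the same number of loads over exactly the same data. In this toy model the row-by-row walk misses once per page, while the column-by-column walk misses on every single access, because it cycles through more pages than the TLB can hold. Nothing in the source code looks wrong in the second case, which is what makes this kind of bottleneck subtle.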







