@andir
Ok, I will get very specific on the differences between the Cell and what is currently a standard shared-memory symmetric multi-core CPU. Please keep in mind these are not pros and cons, just differences.
There are two different classes of processors with different instruction sets and functionality: the PPE and the SPEs. The PPE is a standard general-purpose processor, very similar to the one in the Xbox.
The SPEs are a different breed, designed more like a GPU than a CPU. The key differences from a software design point of view are:
- An SPE cannot directly access anything beyond its 256k of local memory without DMA directed by the PPE. This 256k has to hold both code and data. One caveat: the SPEs can access each other's 256k at high speed, so when working on a shared data set they can reach up to 1.75MB between them, which they must share.
- The SPEs cannot direct what code is run on them; they must be scheduled by the PPE.
- The SPEs can coordinate amongst themselves very quickly and efficiently, so a set of them can be given a complex task to execute on shared data in parallel or on different data in series.
- SPEs do not have direct access to external events, e.g. user input, network traffic, disk IO, etc.
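To make the memory numbers above concrete, here is a small back-of-the-envelope sketch in Python. It is only arithmetic, not Cell code. The SPE count of 7 is my assumption, taken from the fact that 1.75MB divided by 256k is 7, and the function names and the 64k code size in the usage lines are made up for illustration.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local memory, from the post
ASSUMED_SPE_COUNT = 7           # assumption: implied by the 1.75MB figure


def pooled_local_store(spe_count=ASSUMED_SPE_COUNT, per_spe=LOCAL_STORE_BYTES):
    """Total local memory across all SPEs, in bytes."""
    return spe_count * per_spe


def data_budget(code_bytes, per_spe=LOCAL_STORE_BYTES):
    """Bytes left for data on one SPE once its code is loaded."""
    if code_bytes > per_spe:
        raise ValueError("code alone does not fit in local memory")
    return per_spe - code_bytes


if __name__ == "__main__":
    print(pooled_local_store() / (1024 * 1024))  # 1.75 (MB across 7 SPEs)
    print(data_budget(64 * 1024))                # 196608 bytes left for data
```

The second number is the one that hurts in practice: every byte of code you load onto an SPE comes straight out of the space available for data.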
At a high level this means you need to design your software up front as two separate systems. One runs on the PPE as the bootstrap program; it can hold game functionality but also has to act as the coordinator for the SPE-based system: assigning jobs to SPEs, monitoring them, managing memory needs, managing disk and network IO, etc. Functionality targeted at the SPEs must be designed as many small independent programs so that each can work within the memory limits and coordinate with code running on the PPE to request additional data and report job status. This means you need to be able to break the tasks down into chunks small enough to fit into 256k, or be willing to take the hit of the manual memory management required to swap data in and out. This is fun stuff like only moving memory on 128-byte boundaries and keeping track of what is code and what is data in the 256k on each SPE.
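As a rough illustration of the "break it into chunks" step, here is a Python sketch of the planning a PPE-side coordinator would have to do. It is a model only, not Cell SDK code. `plan_chunks` and its parameters are hypothetical names, and it ignores stack space and any buffering scheme you would need on real hardware.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local memory, from the post
DMA_ALIGN = 128                 # transfer boundary mentioned in the post


def plan_chunks(total_bytes, code_bytes,
                local_store=LOCAL_STORE_BYTES, align=DMA_ALIGN):
    """Split a data set into (offset, length) jobs that fit beside the code.

    Every offset is a multiple of `align`. Only the final chunk may have a
    length that is not a multiple of `align`; a real system would pad it.
    """
    budget = local_store - code_bytes
    chunk = (budget // align) * align  # round down to an aligned size
    if chunk <= 0:
        raise ValueError("no room left for data after loading the code")
    chunks = []
    offset = 0
    while offset < total_bytes:
        length = min(chunk, total_bytes - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


if __name__ == "__main__":
    # Hypothetical job: 1,000,000 bytes of data, 100,000 bytes of SPE code.
    jobs = plan_chunks(1_000_000, 100_000)
    print(len(jobs))   # 7
    print(jobs[0])     # (0, 162048)
    print(jobs[-1])    # (972288, 27712)
```

Even this toy version shows the bookkeeping involved: the chunk size depends on how big the code is, so changing the SPE program changes how the data has to be cut up.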
All of this is in contrast to the “standard” CPU of today, where each core is identical and has access to all system resources, including RAM, disk, network and user input. You can design your system at a high level to simply have one thread per core and hard-code what each thread does. You do not need to break your tasks down into small chunks, and you do not need to worry about coordinating what code runs where and when, or how that code accesses memory. You do need to use the standard semaphore/mutex model to protect shared resources, but there is no need to coordinate your cores at the level the SPEs require.
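For comparison, the "one thread per role plus a mutex" model looks roughly like this. Again a Python sketch just to show the shape of it: the role names and `run_fixed_threads` are made up, and Python threads stand in for real per-core threads.

```python
import threading


def run_fixed_threads(roles, iterations=1000):
    """Start one thread per hard-coded role; all update one shared counter."""
    state = {"updates": 0}   # shared resource, visible to every thread
    lock = threading.Lock()  # the standard mutex protecting it
    finished = []

    def worker(role):
        for _ in range(iterations):
            with lock:       # only one thread touches the counter at a time
                state["updates"] += 1
        with lock:
            finished.append(role)

    threads = [threading.Thread(target=worker, args=(role,)) for role in roles]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["updates"], sorted(finished)


if __name__ == "__main__":
    total, done = run_fixed_threads(["ai", "audio", "physics"])
    print(total)  # 3000
    print(done)   # ['ai', 'audio', 'physics']
```

Notice what is missing: no chunk planning, no code/data budget, no transfer alignment. Every thread just reads and writes the same memory, and the lock is the only coordination.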
In addition to the increased software design and development time required for the more complex Cell model, there is also additional test and bug-fix time.
Many of the difficulties can be reduced by good middleware, but the fundamental design difference of having two interconnected systems will remain.
Now, which is “better” is a matter of deciding what is important. If speed of development and a minimal barrier to entry are your top concerns, then the simpler model is “better”. If your concern is maximum performance per transistor (hardware $), then the more efficient hardware of the Cell is “better”.
Sorry for the long post but I cut it down as much as I could.