Tyrannical said:
The SPEs each have 256K of memory, which is not much. And each SPE can only touch its own local memory. So data has to be read from main memory, sent to the SPE, worked on by the SPE, and sent back to main memory. The Cell is best suited for processing continuous and predictable (no branches) data streams, such as video compression or Folding@home. The Cell's useful throughput falls apart for just about everything else because of branching, as the SPEs sit idle most of the time, starved for data. The Cell would be better suited to a TiVo/DVR. I think Cell tests showed it decompressing 12 HD TV streams at the same time.
When competent developers use branch hints and branch elimination, the resulting code performs better than code that relies on a hardware branch predictor. The same approach produces better-performing code on PC, the 360 and other platforms as well. It does, however, require more effort from developers, because the work has to be done manually.
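To make that a bit more concrete, here is a rough sketch in plain C of the two techniques. It is not SPE code: the function names are made up for illustration, and the hint macro assumes GCC or Clang, where `__builtin_expect` is available. A good compiler may already turn the simple branchy version into branch-free code on its own, so treat this as an illustration of the idea rather than a guaranteed speedup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch hint: assumes GCC or Clang; falls back to a no-op elsewhere. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Branchy version: the hardware has to guess which way the compare goes. */
static int32_t max_branchy(int32_t a, int32_t b) {
    if (a > b) {
        return a;
    }
    return b;
}

/* Branch elimination: turn the comparison into an all-ones or all-zeros
 * mask and blend the two inputs with it. No jump is needed. */
static int32_t max_branchless(int32_t a, int32_t b) {
    uint32_t mask = (uint32_t)0 - (uint32_t)(a > b); /* 0xFFFFFFFF if a > b, else 0 */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}

/* Branch hint: the branch stays, but the compiler is told that negative
 * samples are rare, so it can lay out the common path as the fall-through. */
static int64_t sum_nonnegative(const int32_t *data, size_t n) {
    assert(data != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(data[i] < 0)) {
            continue; /* rare case: skip the bad sample */
        }
        total += data[i];
    }
    return total;
}
```

The mask trick is the same kind of select that vector units do across a whole register at once, which is why it pays off on more than one platform.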
Technically, the Cell is a perfect fit for gaming as well as all kinds of other multimedia uses. With enough effort it is actually powerful at every kind of task, including scientific workloads and procedural synthesis.
256K is both a lot of memory and very little. Writing 200K of executable code in assembly is like writing a book, so for executables it's a huge amount of memory to work with. For data processing it's tiny, but there the approach is entirely different: the Cell functions as a stream processor, where the data is cut into small pieces and processed one piece at a time. The enormous bandwidth is the crucial factor here.
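A minimal sketch of that streaming pattern, again in plain C rather than real SPE code. The `memcpy` calls stand in for the DMA transfers between main memory and the local store, and the buffer size, the names and the "multiply by a gain" workload are all assumptions picked for illustration. Two buffers are used so that, on real hardware with asynchronous transfers, the next piece could be arriving while the current one is being processed; here everything simply runs in sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOCAL_STORE_BYTES (256 * 1024) /* size of one SPE local store */
#define CHUNK_FLOATS 4096              /* hypothetical chunk size: 16 KB per buffer */

/* Stand-in for the local store: two buffers for double buffering. */
static float local_buf[2][CHUNK_FLOATS];

/* Copy up to one chunk from "main memory" into a local buffer.
 * memcpy is a placeholder for a DMA get. Returns the element count. */
static size_t fetch_chunk(float *dst, const float *src, size_t offset, size_t total) {
    size_t n = total - offset;
    if (n > CHUNK_FLOATS) {
        n = CHUNK_FLOATS;
    }
    memcpy(dst, src + offset, n * sizeof(float));
    return n;
}

/* Scale a large array in place, one small piece at a time. */
static void scale_stream(float *main_mem, size_t total, float gain) {
    /* Both buffers must fit in the local store, with room left for code. */
    assert(sizeof(local_buf) < LOCAL_STORE_BYTES);

    size_t offset = 0;
    int cur = 0;
    size_t n_cur = (total > 0) ? fetch_chunk(local_buf[cur], main_mem, 0, total) : 0;

    while (n_cur > 0) {
        size_t next_off = offset + n_cur;
        int nxt = cur ^ 1;

        /* Start on the next piece before working on the current one.
         * With real asynchronous DMA this would overlap with the compute. */
        size_t n_next = (next_off < total)
            ? fetch_chunk(local_buf[nxt], main_mem, next_off, total)
            : 0;

        /* Work on the piece that is already local. */
        for (size_t i = 0; i < n_cur; i++) {
            local_buf[cur][i] *= gain;
        }

        /* Send the result back (placeholder for a DMA put). */
        memcpy(main_mem + offset, local_buf[cur], n_cur * sizeof(float));

        offset = next_off;
        cur = nxt;
        n_cur = n_next;
    }
}
```

The working set never grows beyond the two small buffers no matter how large the input is, which is why the transfer rate matters far more than the size of the local memory.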