NJ5 said:
That's not true. There are some specific ways in which the SPUs are inherently slower than the PPU, for example double-precision arithmetic. Each SPU is capable of executing two DP instructions every seven cycles. With Fused-Multiply-Add, an SPU can achieve a peak 1.83GFLOPS at 3.2GHz. With eight SPUs and fully pipelined DP floating-point support in the PPE's VMX, the Cell BE is capable of a peak 21.03GFLOPS DP floating-point performance.
Do the math and you can see the PPU is 3.5 times faster than the SPU at that. Source: http://www-128.ibm.com/developerworks/power/library/pa-cellperf/
There may be a few operations where a 3.2 GHz SPU is slower than a 1.6 GHz PPU thread, but for general-purpose programming the difference is negligible. Even branches tend to be faster on the SPUs, and branching is practically the definition of a "general-purpose" programming expense.
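For the record, the quoted DP numbers do work out to roughly 3.5x. A quick sketch of that arithmetic, assuming (as the quote implies) that the 21.03 GFLOPS chip peak is simply eight SPUs plus the PPU, with everything not attributed to the SPUs belonging to the PPU:

```python
# Peak double-precision figures as quoted from the IBM developerWorks article (GFLOPS at 3.2 GHz).
SPU_DP_PEAK = 1.83    # a single SPU
CELL_DP_PEAK = 21.03  # whole chip: eight SPUs plus the PPU
NUM_SPUS = 8


def ppu_dp_peak(cell_peak: float, spu_peak: float, num_spus: int) -> float:
    """Assumption: the chip peak is exactly the SPUs' sum plus the PPU's share."""
    return cell_peak - num_spus * spu_peak


ppu = ppu_dp_peak(CELL_DP_PEAK, SPU_DP_PEAK, NUM_SPUS)
ratio = ppu / SPU_DP_PEAK
print(f"PPU DP peak ~ {ppu:.2f} GFLOPS, {ratio:.2f}x one SPU")
# prints "PPU DP peak ~ 6.39 GFLOPS, 3.49x one SPU"
```

So for DP specifically the PPU does come out ahead by about 3.5x, which is one of the "few operations" mentioned above.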