NightAntilli said:
Any instruction/calculation regarding loading, integer, branch, store etc.
So, an SPE can execute 2 instructions simultaneously, but only if one is an "even" instruction and one is an "odd" instruction (assuming the instructions are aligned properly).
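To make the alignment caveat concrete, here's a rough toy model of the issue rule. It assumes instructions are fetched in aligned pairs and that a pair only dual-issues when the first slot holds an even-pipeline instruction and the second an odd-pipeline one. It ignores dependencies and stalls entirely, and `issue_cycles` is just a name I made up for the sketch.

```python
def issue_cycles(pipes):
    """Count issue cycles for a list of 'even'/'odd' pipeline tags.

    Toy model: index 0 is assumed to sit on an aligned pair boundary.
    A pair dual-issues only if it starts on an aligned slot and is
    ordered (even, odd). Dependencies and stalls are ignored.
    """
    cycles = 0
    i = 0
    while i < len(pipes):
        aligned = (i % 2 == 0)
        has_partner = (i + 1 < len(pipes))
        if aligned and has_partner and pipes[i] == "even" and pipes[i + 1] == "odd":
            i += 2  # both instructions issue in the same cycle
        else:
            i += 1  # single issue
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(issue_cycles(["even", "odd", "even", "odd"]))  # well-ordered pairs
    print(issue_cycles(["odd", "even", "odd", "even"]))  # same mix, wrong order
```

Same instruction mix, but in this model the badly ordered stream takes twice as many issue cycles, which is why the alignment matters.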
SP FP operations on an SPE have a 6 cycle latency on the even pipeline, and integer floating point (such as fma) operations have a 7 cycle latency on the even pipeline. Simple fixed point operations (such as addition and subtraction) have a 2 cycle latency on the even pipeline. Compare operations have a 2 cycle latency (even) and branch operations have a 4 cycle latency (odd). Load/store operations have a 6 cycle latency (odd).
All of these instructions are fully pipeline-able without inducing any stalls as long as you double buffer loads.
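A quick back-of-the-envelope sketch of what "fully pipeline-able" buys you, using the latencies quoted above. The assumption is one instruction issued per cycle with no stalls; the dictionary keys and function names are mine, purely for illustration.

```python
# Latencies in cycles, taken from the figures quoted above.
LATENCY = {
    "sp_fp": 6,
    "fixed_point": 2,
    "compare": 2,
    "branch": 4,
    "load_store": 6,
}


def pipelined_cycles(op, count):
    """Cycles for `count` independent ops, one issued per cycle."""
    if count == 0:
        return 0
    # The first result appears after the full latency; every later
    # result follows one cycle behind the previous one.
    return LATENCY[op] + count - 1


def dependent_cycles(op, count):
    """Cycles when each op has to wait for the previous op's result."""
    return LATENCY[op] * count


if __name__ == "__main__":
    print(pipelined_cycles("sp_fp", 100))
    print(dependent_cycles("sp_fp", 100))
```

With 100 independent SP FP ops this simple model gives 105 cycles pipelined versus 600 for a fully dependent chain, so the latency only hurts if your code can't keep independent work in flight.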
When people say the SPEs are optimized for SPFP they are talking more about the fact that they're terrible at double precision. The primary innovation of the SPEs is their ability to hide memory access latency. This is the whole point of the crazy architecture and everyone seems to miss it.
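For anyone wondering how the latency hiding works in practice, here's a rough timing model of double buffering: fetch the next chunk while computing on the current one. The chunk counts and cycle costs are made-up numbers, and the function names are hypothetical; it's only meant to show the shape of the win, not real SPE timings.

```python
def single_buffered(chunks, transfer, compute):
    """Total time when every chunk is fetched, then processed, in turn."""
    return chunks * (transfer + compute)


def double_buffered(chunks, transfer, compute):
    """Total time when the next fetch overlaps the current computation.

    Only the first fetch and the last computation are fully exposed;
    each step in between costs whichever of the two is slower.
    """
    if chunks == 0:
        return 0
    return transfer + (chunks - 1) * max(transfer, compute) + compute


if __name__ == "__main__":
    # Hypothetical costs: 100 cycles to fetch a chunk, 120 to process it.
    print(single_buffered(10, 100, 120))
    print(double_buffered(10, 100, 120))
```

In this model, as long as compute per chunk is at least as long as the transfer, the memory access time all but disappears from the total apart from the very first fetch.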







