Kynes said:
http://en.wikipedia.org/wiki/Xenos_%28graphics_chip%29 "On the chip, the shader units are organized in three SIMD groups with 16 processors per group, for a total of 48 processors. Each of these processors is composed of a 5-wide vector unit (total 5 FP32 ALUs) that can serially execute up to two instructions per cycle (a multiply and an addition). Thus each of the 48 processors can perform 10 floating-point ops per cycle. All processors in a SIMD group execute the same instruction, so in total up to three instruction threads can be simultaneously under execution." ATI used 5-way VLIW units, labelled XYZWT. X, Y, Z and W handle operations on the 4 color values, so they can be simpler units; T handles transcendental instructions such as sin(x) or arctan(x), which are "harder" to execute, so it is a beefier unit. They kept this design up to the 6xxx series and the newer APUs, where they changed to 4-way (four simpler XYZW units, with all four used together to calculate the transcendental operations), and then to "scalar" units on GCN (7xxx series). ATI said they used to get an average occupancy of approximately 3.8 on the 5-way units, so most of the time one of the units was idle; that's why they changed to the 4-way units. Then they changed to the scalar units because that type of architecture is easier to program for GPGPU.
Thanks for the clarification. The specs on the same page list 4 ALUs per shader processor, which does not add up to the 240 GFLOPS also stated there. It does, however, give 96 billion shader operations per second (3 x 16 x 4 ALUs x 0.5 GHz). For the E6760 I suppose it would be 288 (480 x 0.6 GHz), so 3x as much shader performance.
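For anyone who wants to check the arithmetic, here is a rough Python sketch of the numbers in this exchange. The helper name `billions_per_second` is made up for illustration, and the unit counts and clocks are just the ones quoted above (48 processors with 5 ALUs doing a multiply and an add per cycle at 0.5 GHz, versus the 3 x 16 x 4 count, versus 480 units at 0.6 GHz for the E6760). It only multiplies the figures together and says nothing about real-world performance.

```python
def billions_per_second(units, ops_per_unit_per_cycle, clock_mhz):
    """Peak operations per second, in billions (hypothetical helper).

    units * ops per unit per cycle * clock in MHz gives millions of ops
    per second; dividing by 1000 converts that to billions.
    """
    return units * ops_per_unit_per_cycle * clock_mhz / 1000


# Quoted Wikipedia figure: 48 processors x 5 ALUs, 2 ops per ALU per cycle
# (a multiply and an addition), 500 MHz clock.
xenos_gflops = billions_per_second(48 * 5, 2, 500)

# Spec-sheet style count from the reply: 3 SIMD groups x 16 processors
# x 4 ALUs, one operation each per cycle, 500 MHz clock.
xenos_shader_ops = billions_per_second(3 * 16 * 4, 1, 500)

# E6760 estimate from the reply: 480 units, one operation each, 600 MHz.
e6760_shader_ops = billions_per_second(480, 1, 600)

# Ratio of the two shader-operation counts.
ratio = e6760_shader_ops / xenos_shader_ops

print(xenos_gflops, xenos_shader_ops, e6760_shader_ops, ratio)
```

So the 240 GFLOPS figure comes out only when all 5 ALUs are counted with 2 ops each per cycle, while the 96 billion figure counts 4 ALUs with one operation each.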







