I suspect all this 4d nonsense they are talking about is just homogeneous coordinates, which allow any affine transformation (rotation, translation, dilation, reflection, etc.) in R^3 (3d) to be represented as a 4x4 matrix acting on "4d" points.
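To make that concrete, here is a minimal sketch in plain Python of how a rotation and a translation collapse into a single 4x4 matrix. The helper names (`translation`, `rotation_z`, `matmul`, `apply`) are mine for illustration, not from any particular graphics API.

```python
import math

def translation(tx, ty, tz):
    # Translation is not linear in R^3, but it is linear on (x, y, z, 1).
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def rotation_z(theta):
    # Rotation by theta radians about the z axis, padded to 4x4.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0],
            [s,  c, 0, 0],
            [0,  0, 1, 0],
            [0,  0, 0, 1]]

def matmul(a, b):
    # Plain 4x4 matrix product; a @ b applies b first, then a.
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(m, point):
    # Lift the 3d point to homogeneous form, transform, then drop back to 3d.
    x, y, z = point
    h = [x, y, z, 1.0]
    out = [sum(m[i][k] * h[k] for k in range(4)) for i in range(4)]
    w = out[3]  # stays 1 for affine transforms
    return tuple(c / w for c in out[:3])

# Rotate 90 degrees about z, then translate by (1, 2, 3), as ONE matrix.
m = matmul(translation(1, 2, 3), rotation_z(math.pi / 2))
p = apply(m, (1.0, 0.0, 0.0))
```

The fourth coordinate is just bookkeeping: it stays 1 for points, which is what lets the last column of the matrix carry the translation.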
@Deneidez the Cell has some interesting architectural properties when it comes to DMA by the SPEs. If you are familiar with past HPC architectures, the best analogy I can think of is vector chaining in Seymour Cray's machines. Some people call the SPEs vector cores (I don't, because they don't support chaining), but the real innovation is that they support something akin to "DMA chaining" (it's actually way more complicated). Each SPE is only quasi-cache coherent: its load/store instructions cannot access addresses outside its own local store (its L1-like private memory), so each SPE has a separate DMA controller which maintains a large list of current and future transfers. This is what makes it fast as hell under the right circumstances and hard to actually program for (a separate compiler is used for SPE code). If you want a more thorough explanation I can give you one, but I'm going to stop here as I can already feel people's eyes glazing over.







