
Forums - Gaming Discussion - games use just 10% of ps3's power + CELL benchmarks

staticneuron said: You're a little off on what the SPEs can do. It is a misconception that the PPE has to schedule tasks for the SPEs; instructions can be sent to the SPEs as a whole without talking to the PPE.
The SPEs do not need individual instructions sent to them, if that's what you're trying to say; they are given tasks, each consisting of a set of instructions. Typically they will be sent a small piece of code, perhaps a loop, that can be separated out and run quickly by the SPU. The offload has to be fast enough not to bog down the CPU's primary core (the PPU). If the CPU needs the result to continue, and the primary core could finish the instructions in less time than it takes to push the code to the SPU and get the result back, offloading is a net loss in throughput.

The SPUs are basically small CPUs with their own RAM, and the primary core is indeed where they receive their instructions. That's the only place where it can happen, as that's where the context exists. The compilers being discussed (the Octopiler in particular) attempt to automate the process of separating the code out, although how successful they will be remains to be seen. The success of the Cell largely rides on the compiler; without it, programming for the chip will require a lot of work to get good performance. IBM's going to ensure that it works well, although initial iterations may not be perfect. Again, I highly recommend reading this link, as it covers the process quite well: http://domino.research.ibm.com/comm/research_projects.nsf/pages/cellcompiler.simd.html

The Gamecube and PS2 each had a general-purpose chip as their primary CPU. The PowerPC and MIPS series processors are both general purpose, although the consoles had additional chips, such as the GPU, to offload some of the work. The PS2's graphics engine was a bit more complicated to program for, whereas the ATI graphics chip in the Gamecube, for example, should be relatively familiar to most graphics programmers. Ditto for the Xbox. That's likely where the complaints you heard came from, although even this can, to a certain extent, be abstracted away without too much of a performance hit.
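
The offload trade-off in this post (push code to an SPU only when the round trip costs less than running it on the primary core) can be written as a rough cost model. This is a minimal sketch, assuming the timings are known in advance and the primary core waits for the result; the function name and all numbers are hypothetical, not measurements from Cell hardware.

```python
def offload_is_worthwhile(ppu_time_us, spu_time_us, transfer_time_us):
    """Return True if handing the job to an SPU beats running it on the primary core.

    Assumes the primary core blocks until the result is back, so the
    offloaded cost is the round-trip transfer time plus the SPU compute time.
    All parameters are hypothetical timings in microseconds.
    """
    offloaded_cost = transfer_time_us + spu_time_us
    return offloaded_cost < ppu_time_us


# Illustrative numbers only (not measurements).
small_job = offload_is_worthwhile(ppu_time_us=5, spu_time_us=2, transfer_time_us=10)
large_job = offload_is_worthwhile(ppu_time_us=500, spu_time_us=100, transfer_time_us=10)
print(small_job, large_job)  # False True
```

In this toy model the small job loses because the transfer alone costs more than doing the work in place, which is the net loss in throughput described above.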




I am sorry I didn't see this. I have been busy. What my comment was referring to was this.

baka said: The main program accepts the SPU's results, which might be a physics calculation or the like. It then incorporates these results into its own calculations.
The misconception I was talking about is that devs like to treat the SPUs as helpers, so they are not accessed unless needed. The SPEs can have data sent to them and then send the output to any device connected to the EIB, or (a method a Warhawk developer hinted at) data can be dropped onto the EIB/memory so the SPEs can process it and put it back on the EIB.
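
The idea of SPEs passing output straight on to the next consumer, rather than handing everything back to the main program, resembles a streaming pipeline. Below is a loose analogy using ordinary Python threads and queues; it does not model the EIB or any Cell API, and the stage functions and names are made up for illustration.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream; real data must not be None


def stage(func, inbox, outbox):
    """Apply func to each item from inbox and pass the result on to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        outbox.put(func(item))


def run_pipeline(values):
    """Feed values through two chained stages; the caller only sees the final output."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=stage, args=(lambda x: x * 2, q_in, q_mid)),
        threading.Thread(target=stage, args=(lambda x: x + 1, q_mid, q_out)),
    ]
    for w in workers:
        w.start()
    for v in values:
        q_in.put(v)
    q_in.put(SENTINEL)

    results = []
    while True:
        item = q_out.get()
        if item is SENTINEL:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


print(run_pipeline([1, 2, 3]))  # [3, 5, 7]
```

The first stage never reports back to the caller; its output goes directly to the second stage, which is the point of the analogy.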




staticneuron said: The SPEs can have data sent to them and then send the output to any device connected to the EIB, or (a method a Warhawk developer hinted at) data can be dropped onto the EIB/memory so the SPEs can process it and put it back on the EIB.
That's true, but there are still a few caveats. The fastest transfers, for example, would be on the CPU itself, since that's where more of the bandwidth is. Note how the data transfer rate drops in IBM's tests even between relatively distant SPUs on the bus: http://www-128.ibm.com/developerworks/power/library/pa-cellperf/#table1

Of course, the data rate of the EIB is still faster than the RAM connected to the SPU, but this also assumes nothing else is happening on the bus. Other components, which might be under contention, would likely be slower. This approach might be useful if the developers are running low on processing power on the primary core, or if they're working on a large data set, but generally the small programs that make the most use of the CPU time would run within the SPU's own RAM, which can be accessed at least twice as fast as any other RAM in the system.

Finally, the primary core also needs to be aware of the SPU job's completion, or at least of the existence of intermediate output data, even if it doesn't take the result directly. Polling would be inefficient; no doubt an interrupt exists for this.
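
The difference between polling for completion and being notified can be illustrated with a small analogy in plain Python threads. This is only a sketch of the signalling pattern, assuming a worker thread stands in for the SPU job and a `threading.Event` stands in for the interrupt; it says nothing about how the Cell actually delivers the notification.

```python
import threading


def run_job_with_notification(data):
    """Run a job on a worker thread and sleep until it signals completion."""
    done = threading.Event()  # stand-in for a completion interrupt
    result = {}

    def worker():
        # Hypothetical workload: sum of squares.
        result["sum"] = sum(x * x for x in data)
        done.set()  # notify the waiting thread that the result is ready

    t = threading.Thread(target=worker)
    t.start()
    done.wait()  # blocks without spinning, unlike a polling loop
    t.join()
    return result["sum"]


print(run_job_with_notification([1, 2, 3]))  # 14
```

A polling version would instead loop on a flag, burning time on the main thread that could go to other work.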