| NJ5 said: I'm not entirely sure, but from what he said, his experience from the Cell is mostly hearsay, talking to developers, and writing some technical articles. He never claimed to be a programmer as far as I've seen. I've also seen a number of claims which make me think he has zero to little experience in programming.
|
Well, you can always find developers that will say the PS3 is amazing and you'll find other devs that say the PS3 is garbage. You can find top tier developers that say both. They could be speaking from experience or there could be other motivation. Some programmers are very much idealists. Some are tolerant.
I have two primary issues with the Cell when compared to the 360's CPU. The first is that the SPE performance is misleading for general gaming. Anyone who has worked with high end computing (especially Cray/SGI top-end machines) knows that peak theoretical numbers are bullshit, and that practical sustained numbers for your specific application are a much better measurement.
If you want to approach the Cell's peak numbers, you need an application that can provide a tight pipeline of floating-point calculations. With game engines, you often want A*B=C and E*F=D, or something along that line, then C/D=H. The issue here is that you'd need to load two SPEs with your first two calculations, then you'd have to wait many cycles until those SPEs are done *and* you can fetch the results from those SPEs. In the meantime, you may be sitting with 5/7 of your SPEs completely unused, and you're unable to use them.
There are things you can do to minimize this, of course, but they require a lot of forethought, and that particular performance issue cannot be mitigated 100%. Nor does massive floating-point performance guarantee good performance for most game engines, even if you could keep the SPEs crunching on pertinent information as opposed to busy work like generating 1000 of the same simple thing just to use the Cell.
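To put rough numbers on that, here is a toy model in C++. It is only a sketch: the function names are made up for this post, it assumes every operation takes one time step on one unit, and it ignores the transfer latency mentioned above (which only makes things worse).

```cpp
#include <cassert>
#include <vector>

// Hypothetical model. Each entry is the number of independent operations
// available in one dependency stage, e.g. {2, 1} for
//   stage 0: C = A*B and D = E*F   (independent of each other)
//   stage 1: H = C/D               (needs both earlier results)
// Assumes one operation per unit per stage and no transfer latency.
double unit_utilization(const std::vector<int>& ops_per_stage, int units) {
    assert(units > 0 && !ops_per_stage.empty());
    int busy_slots = 0;
    for (int ops : ops_per_stage) {
        assert(ops >= 0 && ops <= units);  // the model needs a stage to fit in one step
        busy_slots += ops;
    }
    const int total_slots = units * static_cast<int>(ops_per_stage.size());
    return static_cast<double>(busy_slots) / total_slots;
}

// Number of units with nothing to do during a single stage.
int idle_units(int ops_in_stage, int units) {
    assert(ops_in_stage >= 0 && ops_in_stage <= units);
    return units - ops_in_stage;
}
```

With 7 units, `idle_units(2, 7)` gives the 5 idle SPEs from the example, and `unit_utilization({2, 1}, 7)` works out to 3/14 under these assumptions.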
The second issue I have with the Cell is that they cut it down to a single core with SMT. SMT's benefit depends on the hardware design as well as the specific workload. The 360 offers a version of SMT as well, and because it's not predictably useful in many circumstances, I will ignore it. This may be a flaw in my analysis.
Consider how game engines render frames. The engine I wrote was fairly simple (although my implementation of Carmack's Reverse was good ;), but the basic ideas are the same. Say your engine has to do the following each frame (please look past the gross simplification):
Deal with input
Calculate physics
Sound
Calculate positions of entities (including animation)
Calculate AI (for the next frame)
Render the frame
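As a rough sketch, the single-threaded version of that list looks something like this in C++. The `Engine` struct and its method names are placeholders I made up; each stub just logs its name so you can see the order of the steps.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical engine skeleton. The stubs only record the order they run in.
struct Engine {
    std::vector<std::string> log;

    void handle_input()     { log.push_back("input"); }
    void update_physics()   { log.push_back("physics"); }
    void update_sound()     { log.push_back("sound"); }
    void update_positions() { log.push_back("positions"); }  // includes animation
    void update_ai()        { log.push_back("ai"); }         // results feed the next frame
    void render_frame()     { log.push_back("render"); }

    // One pass of the "big loop": everything runs back to back on one
    // processor, so rendering only gets whatever time the other steps leave.
    void run_frame() {
        handle_input();
        update_physics();
        update_sound();
        update_positions();
        update_ai();
        render_frame();
    }
};
```

Calling `run_frame()` once per frame is the whole loop in this gross simplification.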
Say your target is to spend 50% of your CPU time rendering the frame and split the rest between physics, input, entity positions, AI, etc. Each "big loop" your engine takes begins with the AI and ends with rendering the frame. You also need to buffer at least one frame so you can render over the current frame. The frame being displayed is the frame the user is responding to, but there is already a frame being rendered into a buffer; the game can no longer change it, and it will be displayed regardless of how the user responds to the current frame. Therefore, the user's input does not affect the next frame, but rather the frame after that. This assumes the simplest case of buffering: double buffering.
If you're doing VSYNC, you have to wait to display a rendered frame until you sync, so there may be cases when you must wait to begin rendering the next frame as well. This is part of the reason it is necessary to render frames fast: higher framerates mean a better game experience because the game responds to input more quickly and accurately.
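Here is the buffering arithmetic as a small C++ sketch. The function names and the `frames_in_flight` parameter are mine, and the delay is an upper bound measured from the moment the displayed frame first appears, so treat it as a simplification.

```cpp
#include <cassert>

// Index of the first frame that can reflect input received while
// `displayed_frame` is on screen. `frames_in_flight` is the number of frames
// already being rendered that can no longer change (1 for double buffering).
int first_frame_reflecting_input(int displayed_frame, int frames_in_flight = 1) {
    assert(frames_in_flight >= 0);
    return displayed_frame + frames_in_flight + 1;
}

// Upper bound on the time, in milliseconds, between the displayed frame
// appearing and the first frame that reflects the user's input appearing.
double max_input_delay_ms(double frames_per_second, int frames_in_flight = 1) {
    assert(frames_per_second > 0.0);
    assert(frames_in_flight >= 0);
    const double frame_time_ms = 1000.0 / frames_per_second;
    return (frames_in_flight + 1) * frame_time_ms;
}
```

With double buffering, input seen during frame 10 first shows up in frame 12. At 30 frames per second that bound is two frame times of about 33 ms each; at 60 it is half that, which is the responsiveness point above.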
The interesting aspect of having multiple processors and a multithreaded engine is that you can dedicate processors to specific tasks. There can be all kinds of nasty race conditions to worry about, but if you plan for multiple processors from the beginning, you can avoid many of them. You can dedicate one processor to handling:
Input
Physics
Positions/Animation
You can dedicate another processor to:
Sound
AI
You can dedicate a final processor to:
Rendering
The goal is to keep the third processor rendering 100% of the time, rather than the 50% you'd get with a single processor. This means each frame takes half as much wall-clock time to produce. You can run processors 1, 2 and 3 all at once, handing the data from processors 1 and 2 to processor 3 after processor 3 finishes the current frame. Your gains come from being able to render twice as quickly.
The Cell originally had two cores and would have been a much better processor for game engines if that had remained the case. Many of the tasks associated with game engines are searches and cannot be helped by the SPEs.
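A back-of-the-envelope version of that in C++, with made-up names and numbers. It assumes the three groups of work overlap perfectly and it ignores the cost of handing data between processors, so it is a best case and not a measurement.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-frame CPU costs in milliseconds, grouped the same way as
// the three processors described above.
struct FrameCosts {
    double input_physics_anim_ms;  // processor 1
    double sound_ai_ms;            // processor 2
    double render_ms;              // processor 3
};

// One processor does everything back to back.
double single_processor_frame_ms(const FrameCosts& c) {
    return c.input_physics_anim_ms + c.sound_ai_ms + c.render_ms;
}

// Three processors run their groups at the same time, so a frame takes as
// long as the slowest group (best case, no synchronization overhead).
double three_processor_frame_ms(const FrameCosts& c) {
    assert(c.input_physics_anim_ms >= 0.0 && c.sound_ai_ms >= 0.0 && c.render_ms >= 0.0);
    return std::max({c.input_physics_anim_ms, c.sound_ai_ms, c.render_ms});
}
```

With `FrameCosts{5.0, 5.0, 10.0}` (rendering is 50% of the single-processor total), the frame time drops from 20 ms to 10 ms in this model, which is the "twice as quickly" figure.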










