NJ5 said:
MikeB said:
@ NJ5

PPU has special hardware to improve branch prediction that the SPU doesn't have


Branch hints and branch elimination are the roads to take with regard to the SPUs. The end result is code which, if done well, runs better on any kind of CPU.
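For anyone following along, here is a rough sketch of what "branch elimination" can look like in plain C. It is not code from anyone in this thread; the function names (`clamp_branchy`, `select_i32`, `clamp_branchless`) are made up for illustration, and whether the compiler actually emits branch-free instructions depends on the target and the optimization flags.

```c
#include <stdint.h>

/* Branchy version: two conditional jumps the hardware has to predict
 * (or, on a core with no predictor, that need hints). */
int32_t clamp_branchy(int32_t x, int32_t lo, int32_t hi)
{
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Branch-free select: returns a when cond is non-zero, otherwise b.
 * mask is all ones when cond is true and all zeros when it is false.
 * The final conversion back to int32_t assumes the usual two's
 * complement wrap-around behaviour of mainstream compilers. */
static int32_t select_i32(int cond, int32_t a, int32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}

/* Same result as clamp_branchy, expressed as two selects. */
int32_t clamp_branchless(int32_t x, int32_t lo, int32_t hi)
{
    x = select_i32(x < lo, lo, x);
    x = select_i32(x > hi, hi, x);
    return x;
}
```

The select form is the kind of rewrite that tends to help on any CPU, since there is nothing left to mispredict. A good compiler may already turn the branchy version into something similar, so it is worth checking the generated assembly before assuming a win.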

True, but as you say that's something you can also do on the PPU, so Grouch still has to explain why his statement goes against what the Cell experts say.

 

To quote myself:

"Feel free to dig for some info on the details of the "branch predictor" on the PPU, and then re-evaluate your comment, in terms of icache misses, especially since the PPU threads are sharing a cache.

 

If you argued that SPU branches tend to be faster because they never end up in a cache miss, because SPU code is much smaller, better written, etc. than PPU code tends to be, I would have to agree with you.  That still says nothing about their ability to run general purpose code blazingly fast, especially when doing it in parallel."

 

When it comes down to the fine details, you are absolutely correct, NJ5.  I was angered by your nitpicking, and was reminded that, in a recent optimization, I discovered that some code I had rewritten* (this is the important part) to run on the SPUs was significantly faster than it had been on the PPU.  The branch hinting can be ported back to the PPU, and, except that my PPU compiler isn't as good as my SPU compiler, the code will likely be faster there as well, assuming I don't get a load of icache misses with my mispredicted branches.  You could say that the icache expense is really similar to the expense of uploading an entire job to an SPU in the first place, which is, of course, correct.  I've just paid for the icache misses up front, at the beginning of the job, rather than effectively loading code on-the-fly.  So really, in the end, NJ5 is right, and I am wrong.
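To make "branch hinting that can be ported back" a bit more concrete, here is a minimal sketch in portable C. It is not the code I rewrote; `sum_samples` and the `LIKELY`/`UNLIKELY` macros are hypothetical names, and the sketch assumes a GCC-compatible compiler that provides `__builtin_expect` (other compilers fall through to a plain condition). How the hint is lowered on a given core is up to the compiler.

```c
#include <stddef.h>

/* Assumption: GCC or Clang. Elsewhere the macros are plain pass-throughs. */
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Hypothetical job: sum the valid (non-negative) samples and count the
 * rare invalid ones. The hint tells the compiler that the error path is
 * cold, so it can lay out the common path as straight-line code. */
long sum_samples(const int *samples, size_t n, size_t *errors)
{
    long total = 0;
    *errors = 0;
    for (size_t i = 0; i < n; ++i) {
        if (UNLIKELY(samples[i] < 0)) {
            ++*errors;      /* cold path: rarely taken */
            continue;
        }
        total += samples[i]; /* hot path */
    }
    return total;
}
```

The same source builds for either kind of core, which is the point: the hint lives in the code, not in the processor, so whatever benefit it brings is not specific to the SPUs.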

Looks like my whole argument is debunked, and this thread can continue along its course of assuming that the SPUs are not independent processors, and totally incapable of running general purpose code decently fast, if at all -- especially since they can't do it in parallel... you know... independently.