@ Deneidez
Data processing can almost always be split up into smaller parts. The executable code is tiny, and results are passed around quickly within the PS3 architecture. For any type of game processing function, the code can be structured so that the Cell comfortably outperforms the Xenon. That's one aspect of optimisation.
On the Cell, each SPE has its own 256KB local store and its own DMA, whereas on the Xenon you have a shared L2 cache and only a tiny amount of L1 cache to work with per core.
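To show what I mean by splitting the work up, here is a rough sketch in plain C, not real SPE code. It walks a large array in pieces that fit inside a local-store-sized buffer. Everything in it is my own illustration: the names process_in_chunks, fake_dma and scale_chunk are made up, memcpy stands in for the DMA transfers, and reserving half of the 256KB for data is just an assumed split, since code and stack have to live in the local store too.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 256KB local store per SPE. Assumption: half of it is reserved for data,
   because code and stack share the same local store. */
#define LOCAL_STORE_BYTES (256 * 1024)
#define DATA_BUDGET_BYTES (LOCAL_STORE_BYTES / 2)
#define CHUNK_FLOATS      (DATA_BUDGET_BYTES / sizeof(float))

/* Hypothetical stand-in for a DMA transfer. Here it is a plain, blocking
   memcpy; on real hardware the transfer would be asynchronous. */
static void fake_dma(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}

/* Hypothetical kernel: the small piece of code that runs on each chunk. */
static void scale_chunk(float *buf, size_t count, float factor)
{
    for (size_t i = 0; i < count; ++i)
        buf[i] *= factor;
}

/* Processes `total` floats in main memory, one local-store-sized chunk at a
   time, and returns the number of chunks that were needed. */
size_t process_in_chunks(float *main_mem, size_t total, float factor)
{
    static float local_buf[CHUNK_FLOATS]; /* plays the role of the local store */
    size_t chunks = 0;

    assert(main_mem != NULL || total == 0);

    for (size_t offset = 0; offset < total; offset += CHUNK_FLOATS) {
        size_t count = total - offset;
        if (count > CHUNK_FLOATS)
            count = CHUNK_FLOATS;

        fake_dma(local_buf, main_mem + offset, count * sizeof(float)); /* in  */
        scale_chunk(local_buf, count, factor);                         /* work */
        fake_dma(main_mem + offset, local_buf, count * sizeof(float)); /* out */
        ++chunks;
    }
    return chunks;
}
```

With this split, an array of 40000 floats is handled in two chunks (32768 floats, then the remaining 7232). Real code would normally use two buffers so that the next transfer overlaps with the current computation, which this sketch leaves out to keep it short.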
You really need to write down a more concrete example for me to follow you; then maybe I can provide you with a smart workaround for your problems.