WereKitten said:
selnor said:

I agree. He even stated that good devs would work around the small cache.

There's no denying, though, that if both the PS3 and 360 had out-of-order processors, they would be much more efficient and faster, as the quote from Romaro in his article states.

Maybe next gen we will see OOO Processors.

That L2 cache is not as important as in other architectures, as each SPE can rely on DMA and the SPEs communicate with one another over a bus. It means having to manage memory access at a very low level, but it can be highly optimized. Some extra effort is also needed on the 360 to avoid L1 cache contention troubles (the two threads of each of the 3 cores share the L1 cache), so in general memory management has been a critical issue in this generation of consoles.
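
To make the "manage memory access at a very low level" point concrete, here is a rough sketch of the double-buffering pattern usually associated with SPE-style code: the worker never touches main memory directly, it only works on small chunks copied into local buffers, and it requests the next chunk before computing on the current one. This is plain Python for illustration only, not Cell code. The names `dma_get`, `process_streamed` and `LOCAL_STORE_CHUNK` are made up for this example, and the "DMA" here is an ordinary synchronous copy, whereas on real hardware the transfer would run in parallel with the compute.

```python
# Illustrative only: plain Python standing in for explicit, chunked memory access.
LOCAL_STORE_CHUNK = 4  # hypothetical chunk size in elements, chosen for the demo


def dma_get(main_memory, start, count):
    """Hypothetical stand-in for a DMA transfer into local store.

    Here it is just a synchronous copy of a slice of main memory.
    """
    return list(main_memory[start:start + count])


def process_streamed(main_memory, kernel, chunk=LOCAL_STORE_CHUNK):
    """Apply `kernel` to every element, reading main memory only via dma_get."""
    results = []
    buffers = [None, None]   # two local buffers: one in use, one being filled
    current = 0
    offset = 0
    buffers[current] = dma_get(main_memory, offset, chunk)  # prime first buffer
    while buffers[current]:
        # Request the next chunk before computing on the current one.
        # On real hardware this transfer would overlap with the loop below.
        nxt = 1 - current
        buffers[nxt] = dma_get(main_memory, offset + chunk, chunk)
        results.extend(kernel(x) for x in buffers[current])
        offset += chunk
        current = nxt
    return results


if __name__ == "__main__":
    print(process_streamed(list(range(10)), lambda x: x * 2))
```

The point of the two buffers is that the compute loop only ever waits on a transfer that was started one iteration earlier, which is how a small local memory can stand in for a large cache when the access pattern is known in advance.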

As to the bolded, the trend seems to be the opposite.

In-order processors are simpler and cheaper, and they allow for a greater number of concurrent cores per die. It's not an accident that such a design choice was made by IBM, and the same was done by Intel with Larrabee sharing a lot of design ideas with Cell - I think that's where the GPU/CPU hybrids will go next.

Yup. As a good example, the Cell "TRE", or "Terrain Rendering Engine", that was shown back in 2005 was able to get 30 fps just off the Cell alone, without even going through a GPU.

Another example, in CT reconstruction:

Fast GPU-Based CT Reconstruction using the Common Unified Device Architecture (CUDA)
Scherl, H.   Keck, B.   Kowarschik, M.   Hornegger, J.  
Friedrich-Alexander- Univ. Erlangen-Nurnberg, Erlangen;

This paper appears in: Nuclear Science Symposium Conference Record, 2007. NSS '07. IEEE
Publication Date: Oct. 26 2007-Nov. 3 2007
Volume: 6,  On page(s): 4464-4466
Location: Honolulu, HI,
ISSN: 1082-3654
ISBN: 978-1-4244-0922-8
INSPEC Accession Number: 9892023
Digital Object Identifier: 10.1109/NSSMIC.2007.4437102
Current Version Published: 2008-01-22

Abstract
The Common Unified Device Architecture (CUDA) is a fundamentally new programming approach making use of the unified shader design of the most current Graphics Processing Units (GPUs) from NVIDIA. The programming interface allows to implement an algorithm using standard C language and a few extensions without any knowledge about graphics programming using OpenGL, DirectX, and shading languages. We apply this revolutionary new technology to the FDK method, which solves the three-dimensional reconstruction task in cone-beam CT. The computational complexity of this algorithm prohibits its use for many medical applications without hardware acceleration. Today's GPUs with their high level of parallelism are cost-efficient processors for performing the FDK reconstruction according to medical requirements. In this paper, we present an innovative implementation of the most time-consuming parts of the FDK algorithm: filtering and back-projection. We also explain the required transformations to parallelize the algorithm for the CUDA architecture. Our implementation approach further allows to do an on-the-fly reconstruction, which means that the reconstruction is completed right after the end of data acquisition. This enables us to present the reconstructed volume to the physician in real-time, immediately after the last projection image has been acquired by the scanning device. Finally, we compare our results to our highly optimized FDK implementation on the Cell Broadband Engine Architecture (CBEA), both with respect to reconstruction speed and implementation effort.

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&arnumber=4437102&isnumber=4437000
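
For anyone wondering why back-projection maps so well onto lots of small cores, here is a heavily simplified sketch. It is NOT the FDK method from the paper (that one is cone-beam, 3D, with filtering and weighting); it is an unfiltered 2D parallel-beam back-projection with nearest-neighbour lookup, written in plain Python, and the function name `backproject` and the sinogram layout are assumptions made up for this example. What it shows is that every output pixel is an independent sum over the projections, so each pixel (or block of pixels) could be handed to a different GPU thread or SPE.

```python
import math


def backproject(sinogram, angles, size):
    """Unfiltered 2D parallel-beam back-projection (nearest-neighbour).

    sinogram[a][d] is the detector reading for angle index a and detector
    bin d; angles are in radians; the result is a size x size image.
    """
    n_det = len(sinogram[0])
    centre = (size - 1) / 2.0        # image centre in pixel coordinates
    det_centre = (n_det - 1) / 2.0   # detector centre in bin coordinates
    image = [[0.0] * size for _ in range(size)]
    for iy in range(size):
        for ix in range(size):
            # Each pixel depends only on its own coordinates and the
            # (read-only) sinogram, so this loop body is trivially parallel.
            x, y = ix - centre, iy - centre
            total = 0.0
            for a, theta in enumerate(angles):
                # Position of this pixel along the detector for this angle.
                t = x * math.cos(theta) + y * math.sin(theta)
                d = int(round(t + det_centre))
                if 0 <= d < n_det:
                    total += sinogram[a][d]
            image[iy][ix] = total
    return image


if __name__ == "__main__":
    # A spike in the central detector bin, seen from two angles.
    sino = [[0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]
    for row in backproject(sino, [0.0, math.pi / 2], 5):
        print(row)
```

With the two-angle input above, the two back-projected lines cross at the centre pixel, which therefore gets the largest value.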

i.e., unified shaders done on the Cell processor...


