selnor said:


...

The article is very factual. Yes, developers will always find ways around hardware problems, and in terms of console hardware these two consoles are a big step up from the previous generation. But the actual CPUs inside the 360 and PS3 aren't actually capable of more than an Athlon 3200+, for instance. Because in-order execution CPUs have a very limited way of being used. And adding multithreading makes that even harder. If Cell and Xenon were out-of-order CPUs they would be considerably more powerful and faster at what they do, but it would also likely cost you selling your mum to buy the console.

...

I won't comment on the remainder of your post, because I think I've already made my point.

But the bolded part is factually wrong. In-order and out-of-order CPUs are not qualitatively different.

Out-of-order architectures are simply more efficient in instructions per clock, because they internally reorder the instructions they receive to keep their pipelines full. But the way you can use them is exactly the same: they process the same instruction sets, etc. For example, Intel's Atom CPU line, designed for netbooks, is in-order, but you can throw your usual x86 Windows code at it and it will process it happily. At the same clock it will be less powerful than an out-of-order P3, but it isn't limited a priori; it simply yields a different computational-power-per-megahertz ratio.

In particular, when you go massively parallel you make a different trade of circuit complexity for computational power, preferring to add cores rather than instruction-reordering circuitry. Having 3 or 7 cores is only the start: Larrabee GPUs will probably start with 32-64 cores.

Look at it this way: out-of-order processing and hyperthreading are actually vestiges of the era when you had a single core, so you parallelized micro-ops internally as much as you could and tried to optimize that. The nature of the x86 instruction set limited this to 3 or 4 internal pipelines and 2 threads, and from what I see with the 360 and PS3, the PowerPC instruction set doesn't offer better odds.

As we move that parallelism out to multi-core, NUMA architectures, we can scale up to tens and hundreds of concurrent operations. That scaling and modularity only make sense if the internal complexity of each module is kept to a minimum. The price you pay, of course, is the external complexity of software.

Once again, let me bring up Larrabee as an example: when that kind of CPU/GPU becomes the norm, there won't be any reason to upgrade the hardware just because new features of DirectX and OpenGL must be implemented and optimized in hardware. It will only take an updated driver to digest those new function calls on the universal CPU/GPU: the complexity moves from the hardware implementation to software.



"All you need in life is ignorance and confidence; then success is sure." - Mark Twain

"..." - Gordon Freeman