
A Factual Article everyone should read.

joeorc said:
selnor said:
joeorc said:

@selnor
"But at the end of the day it still comes back to spreading 512kb L2 Cache between 6 SPE's and 1PPE. As well as having to deal with each instruction in order rather than like a PC CPU which can deal with any code it needs to."

i read this article back in 2006.

Your statement only proves how outdated this article is. For one, the SPEs do not just rely on the PPE for instructions, because the SPEs have their own instruction set separate from the PPE. The SPEs can do direct DMA to and from other SPEs without the need of the PPE, because each SPE has its own local store.
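Here's a rough sketch of what that looks like from the SPU side (my own illustration, not from IBM's docs; the buffer size, tag, and effective address are made up). The MFC intrinsics from spu_mfcio.h are what let an SPE pull and push data itself, with no PPE involvement per transfer:

#include <spu_mfcio.h>

/* Local-store buffer; 128-byte alignment gives best DMA performance. */
static volatile float buf[1024] __attribute__((aligned(128)));

void process_chunk(unsigned long long ea)   /* ea = effective address */
{
    const unsigned int tag = 0;

    /* DMA 4 KB from main storage (or another SPE's memory-mapped
       local store) into our own local store. */
    mfc_get(buf, ea, sizeof(buf), tag, 0, 0);

    /* Block until the tagged transfer completes. */
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();

    /* ... compute on buf entirely out of local store ... */

    /* Push the results back out the same way. */
    mfc_put(buf, ea, sizeof(buf), tag, 0, 0);
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();
}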

Here are some facts for people:

users.ece.gatech.edu/~lanterma/mpg/ece4893_xbox360_vs_ps3.pdf

Real-world tests:

http://www.ibm.com/developerworks/power/library/pa-cellperf/

@selnor

To me this shows his take has many faults in what you describe as "facts" about the PS3. Though I respect his opinion, I do not agree with it, just like others may not agree with mine.


It doesn't matter when an article was written; the components of a machine don't change. That IBM article you posted is what the tech specialist used as a source on the final page of his dissection. Yes, they are tests. But tests are done in a controlled environment. Notice how IBM's very own graphs show they can actually only get 75% of the theoretical power when using all SPEs at once?

That IBM article completely backs up this tech person's article. In theory the PS3 can do 218 GFLOPS with the Cell; an actual game environment will be closer to 70 GFLOPS in the best games of this gen. For some reason Cell loses a lot of peak power when all SPEs are being used. The 360's Xenon will likely see around 60 GFLOPS peak in actual games. But again, Cell can help out the weak RSX where it needs to, and Xenos on the 360 has the ability to help out the Xenon CPU, because Xenos is much more advanced than RSX. It's a catch-22.

The article is very factual. Yes, developers will always find ways around hardware problems, and in terms of console hardware these two consoles are a big step up from the previous generation. But the actual CPUs inside the 360 and PS3 aren't capable of more than an Athlon 3200+, for instance, because in-order execution CPUs have a very limited way of being used, and adding multithreading makes that even harder. If Cell and Xenon were out-of-order CPUs they would be considerably more powerful and faster at what they could do, but you'd also likely have to sell your mum to buy the console.

The only part of the article that can change is the OS usage: how much RAM and how much processing time it takes. But that's it. The rest will never change, but developers will learn ways, like they did last gen, to overcome any hiccups and create great games like Killzone 2 or Forza 3. It's the nature of the beast.

The main points I got from the article are that the PS3 and 360 are in no way supercomputers; that the PS2 had a better CPU than the Xbox 1, yet the Xbox 1 provided greater graphics; how advanced the 360 GPU actually is; and how the gap between the two this gen is closer than the gap between the PS2 and the Xbox 1.

You have to remember that the hard facts about machines don't change. The specs don't change, how out-of-order CPUs work doesn't change, how memory works doesn't change, and the article even talks about how developers can use the SPEs to their advantage or not.

Let's not discredit factual information because it hurts our fanboyism. I have no problem admitting I was wrong and that the PS3's Cell isn't that powerful. But like the article says, using SPEs for things like cloth motion etc. means the Cell can display more on screen than the Xenon.

But as he points out, on the whole the machines are much closer than last gen, because RSX is helped by Cell and Xenon is helped by Xenos. It's like M$ really went for an awesome graphics chip and Sony went for an awesome CPU. And in the thick of it, both machines can use their stronger chip to help the weaker chip overcome any shortcomings. It's funny really. But that's life.

It's also worth noting that multithreaded engines like UE3 are nowhere near fully optimized multithreaded code. This could take another 3 or 4 years to perfect.

Once again, I respect this guy's opinion based on what he has read about the Cell, but I have to disagree with his take on how that technology is used. For instance, you say that the IBM tests are in a controlled environment, but by the same token, back in 2005 the APIs for the Cell were not as well developed as they are now. The very fact that even back in 2005 they were getting 30 fps in TRE (IBM's terrain rendering engine demo) on the same CPU that is in the PS3, shown working at a trade show, points to the fact that his claims are based on his opinion of the data he had on hand. He did not have all the data on each system; he just had what he had, and he made his opinion known on his blog based on that data.

I do not fault him for it, other than it's way outdated.

I have no problem with you posting what you did, because it gives you an idea of this person's experience with both platforms.

Me, I do not tend to look at just one article from 2005 or 2006 and call it facts if the person has not done any tests themselves. Does that mean he's 100% wrong or right? Depends on who you ask. I see some of the things he did get right about the PS3, but his take on how to implement the development process is wrong; done properly, it would give much better results than what he has stated. Like I said, I am not knocking his take, given the info he had on hand at the time.

I am not knocking it being posted; I am just knocking its relevance today.

EXAMPLE:

IBM has done tests on both the Xbox 360 processor and the Cell processor. Both are very powerful processors, but even IBM, which created both, has stated the Cell can reach closer to its maximum FLOPS than the Xbox 360's processor can. This does not mean much anyway, because it's up to the developer and his experience with the hardware.

Did you ever think the Cell processor could do unified shaders? Well, it can.



I understand what you're saying, but he even describes pretty much how KZ2 was made, back in 2006. The Cell can do unified shaders, but when it's doing this it has less capacity for normal CPU things. RSX has 8 US and Xenos has 48, so the Cell has to help RSX get close to Xenos's unified shader count. But it's all a catch-22, because the Xenos GPU in the 360 is capable of performing CPU-intensive tasks, like AI or physics.

So on one hand you have Cell helping RSX with graphics calculations, and on the other you have Xenos helping Xenon with CPU calculations.

As for your example: that is 100% false.

Taken directly from IBM's full test of Cell running 8 SPUs:

Table 4. Performance of parallelized Linpack on eight SPUs

Matrix size | Cycles | # of Insts. | CPI  | Single Issue | Dual Issue | Channel Stalls | Other Stalls | # of Used Regs | SPEsim (GFLOPS) | Measured (GFLOPS) | Model accuracy | Efficiency
1024x1024   | 27.6M  | 29.2M       | 0.95 | 27.9%        | 32.6%      | 26.9%          | 12.6%        | 126            | 83.12           | 73.04             | 87.87%         | 35.7%
4096x4096   | 918.0M | 1.51G       | 0.61 | 29.0%        | 56.7%      | 10.8%          | 3.4%         | 126            | 160             | 155.5             | 97.2%          | 75.9%

Notice the model accuracy is 97.2% and the efficiency is 75.9% of the theoretical peak GFLOPS performance. Doing the math brings the controlled-environment test down to 150-odd GFLOPS. Take away one SPE, which games never get, and another, which is reserved for the OS, and we are closer to 120 GFLOPS. And again, that 120 GFLOPS is in a controlled environment, not in an unstable game-code environment.
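To spell that math out (my own back-of-envelope, using the Cell's standard single-precision peak of 25.6 GFLOPS per SPE at 3.2 GHz):

#include <stdio.h>

int main(void)
{
    /* Peak per SPE: 4-wide single-precision fused multiply-add
       at 3.2 GHz = 3.2e9 cycles/s * 8 flops/cycle = 25.6 GFLOPS. */
    const double per_spe_peak = 25.6;
    const double efficiency   = 0.759;  /* from IBM's Table 4 above */

    double eight_spes = 8 * per_spe_peak * efficiency; /* controlled test */
    double six_spes   = 6 * per_spe_peak * efficiency; /* PS3 game setup  */

    printf("8 SPEs: %.1f GFLOPS\n", eight_spes); /* ~155.4, matches the 155.5 measured */
    printf("6 SPEs: %.1f GFLOPS\n", six_spes);   /* ~116.6, the "closer to 120" figure */
    return 0;
}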

This is directly from the link you posted in this thread. These figures will never change unless Sony changes the CPU in the PS3, and that doesn't happen with consoles. Don't get me wrong, both consoles are powerful, but not nearly as powerful as the PR BS portrays. It's like TRE. You cannot forget that when Cell is doing graphics it CAN'T be doing normal CPU work. So devs have to be careful how much CPU work time they take away from Cell.

Likewise the same is said for 360. There is just a different set of boundaries for that machine.



WereKitten said:
selnor said:


...

The article is very factual. Yes, developers will always find ways around hardware problems, and in terms of console hardware these two consoles are a big step up from the previous generation. But the actual CPUs inside the 360 and PS3 aren't capable of more than an Athlon 3200+, for instance, because in-order execution CPUs have a very limited way of being used, and adding multithreading makes that even harder. If Cell and Xenon were out-of-order CPUs they would be considerably more powerful and faster at what they could do, but you'd also likely have to sell your mum to buy the console.

...

I won't comment on the remainder of your post, because I think I've already made my point.

But the bolded part is factually wrong. In-order and out-of-order CPUs are not qualitatively different.

Out-of-order architectures simply achieve more instructions per clock cycle, because they internally reorder the instructions they receive to fill their pipelines more efficiently. But the way you can use them is exactly the same: they can process the same instruction sets, etc. For example, the Atom CPU line Intel designed for netbooks is in-order, but you can throw your usual x86 Windows code at it and it will process it happily. At the same clock it will be less powerful than an out-of-order P3, but it's not limited a priori; it simply yields a different computational-power-per-megahertz ratio.
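If it helps, here's a toy model (a completely made-up instruction stream and latencies, single-issue pipe) showing that the difference is throughput, not capability; both modes run the same program to completion:

#include <stdio.h>

typedef struct { int dep; int latency; } Insn;  /* dep = index or -1 */

#define N 6
static const Insn prog[N] = {
    { -1, 4 },  /* 0: slow load a       */
    {  0, 1 },  /* 1: add, uses a       */
    { -1, 4 },  /* 2: slow load b       */
    {  2, 1 },  /* 3: add, uses b       */
    { -1, 1 },  /* 4: independent work  */
    { -1, 1 },  /* 5: independent work  */
};

static int run(int out_of_order)
{
    int done_at[N], issued[N] = {0};
    int cycle = 0, remaining = N;

    while (remaining > 0) {
        cycle++;
        for (int i = 0; i < N; i++) {           /* one issue per cycle */
            if (issued[i]) continue;
            int ready = prog[i].dep < 0 ||
                        (issued[prog[i].dep] && done_at[prog[i].dep] <= cycle);
            if (!ready) {
                if (!out_of_order) break;       /* in-order: stall here   */
                continue;                       /* OOO: pick a later insn */
            }
            issued[i] = 1;
            done_at[i] = cycle + prog[i].latency;
            remaining--;
            break;
        }
    }
    int last = 0;
    for (int i = 0; i < N; i++) if (done_at[i] > last) last = done_at[i];
    return last;
}

int main(void)
{
    printf("in-order:     %d cycles\n", run(0));  /* 13 on this stream */
    printf("out-of-order: %d cycles\n", run(1));  /*  7 on this stream */
    return 0;
}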

In particular, when you go massively parallel you choose a different investment of circuit complexity for computational power, preferring to add cores instead of instruction-reordering circuitry. Having 3 or 7 cores is only the start: Larrabee GPUs will probably start with 32-64 cores.

Look at it this way: out-of-order processing and hyperthreading are actually vestiges of the era when you had a single core, so you went for internal parallelization of micro-ops as much as you could and tried to optimize that. The nature of the x86 instruction set limited this to 3 or 4 internal pipelines and 2 threads, and from what I see with the 360 and PS3, the PowerPC instruction set doesn't offer better chances.

As we move that parallelism out to multi-core, NUMA architectures, we can scale up to tens and hundreds of concurrent operations. It makes sense, for the sake of scalability and modularity, to keep the internal complexity of each module to a minimum. The price you pay, of course, is the external complexity of the software.

Once again, let me bring up Larrabee as an example: when that kind of CP/GPU becomes the norm there won't be any reason to upgrade the hardware only because new features of DirectX and OpenGL must be implemented and optimized in hardware. It will only take an updated driver to digest those new function calls on the universal CP/GPU: the complexity moves from hardware implementation to software.

Yup. It's about less strain and latency throughout the system as a whole. By implementing CPU/GPU hybrid chips there is less latency per clock cycle, and less power needed per clock cycle, to get better results in this type of design. Just look at the I/O connection speed in both the Xbox 360 and the PS3 compared to the PC's. The designs are very optimized for the task at hand, so more systems can be designed along those lines; they are cheaper and more efficient.

Your system does not have to do everything better. It just has to do what it's designed to do well, which both the Xbox 360 and PS3 do, and that's games.




selnor said:
...

And no, my example is not false. Reason being, IBM created the chips and they have done real-world tests. Does that mean engines on the Xbox 360 cannot attain higher performance? No, it does not, but by the same token the same thing can be said about the PS3. It's the software engine; the hardware is just a rough guide to what could be attained. Will they reach their maximums? Probably not. But it does not take away from the fact that IBM stated what they did based on the tests they have done on both systems' processors.

Once again, the APIs were not as well developed in 2005, because the Cell was only unveiled in 2004. That's like saying the engine will stay the same for any processor. Yeah, unless you tweak the engine, which, as we all know, Epic and other developers never do... come on man, you're reaching.

Example: you just stated this:

@selnor

"You cannot forget that when Cell is doing graphics it CAN'T be doing normal CPU work. So devs have to be careful how much CPU work time they take away from Cell."

Yes it can, and that's where you are going about it all wrong.

"staude" pointed this very same thing out to you in this very same thread. what you relate to PC programming is not the way you look at these type's of local store embedded cpu core's and how they can be developed on. "you can" but your result's will be reduced . this is more about memory management than about trying to rely on just a large pool of ram to do everything from. it's a much more indepth precise way of development with this type of design. because there is more seperate core's to manage not just on memory, but also what each core will need to do in any clock cycle.

But that also has problems of its own. Ever heard of "differential signaling"?

Example:

The realism of today’s games, though, demands far more number crunching than the CPU alone can deliver. That’s where the graphics processing unit, or GPU, comes in. Every time an object has to be rendered on screen, the CPU sends information about that object to the GPU, which then performs more specialized types of calculations to create the volume, motion, and lighting of the object.

But despite churning through billions of floating-point math operations per second, or flops, today’s gaming systems and PCs still can’t deliver the realism that game developers seek. CPUs, memories, and GPUs just aren’t powerful enough—or can’t exchange data fast enough—to handle the complexity and richness of the games designers want to create. In other words, hardware constraints force them to reduce the number of objects in scenes, their motion and texture, and the quality of special effects.

The need for speed becomes even more critical in the PS3, whose Cell processor is actually nine processors on a single silicon die. In the Cell, one processor, or core, divides up work for the other eight cores, which were designed to stream through computation-intensive workloads like video processing, content decryption, and physics calculations. [For more on the Cell chip, see “Multimedia Monster,” IEEE Spectrum, January 2006.] Using all its cores, the 3.2-gigahertz Cell processor can deliver a whopping 192 billion single-precision flops. Without a speedy connection to the PS3’s memory, the Cell starves for data.

To speed up data transfers between the Cell processor and its memory chips, the PS3’s designers adopted a novel memory system architecture that, Rambus says, addresses some of the limitations of current DRAMs [see “How the PlayStation 3 Shuttles Bits”]. To understand how these limitations came about, consider first the co-evolution of microprocessors and memory.

Moore’s Law tells us that transistor densities on chips are doubling every 18 months or so. This evolution has been accompanied by a doubling, on a similar time scale, in the clock rates of processor chips, basically because smaller transistors can toggle on and off faster. But memory clock rates, which serve as an indicator of memory data-transfer rates, are doubling much more slowly—about every 10 years. The result is that memory can’t fetch data to the processor fast enough, a bandwidth bottleneck that has increasingly constricted over the past few decades.

The bandwidth gap is just part of the problem. The other part is related to latency, the time it takes the memory to produce a chunk of data requested by the processor. This delay can vary from tens to hundreds of nanoseconds. That may not seem like much, but in a mere 50 nanoseconds a 3.8-GHz processor can go through 190 clock cycles. “You don’t want the processor waiting for that long,” says Brian T. Davis, a professor of electrical and computer engineering at the Michigan Technological University, in Houghton. The latency problem prompted chip makers years ago to embed some DRAM caches directly onto CPU chips, as well as to concoct some processing tricks to keep the wait for data as short as possible. Despite these improvements, modern CPUs can spend more than half their time—and often much more, Davis notes—just waiting for data to come from memory.

In the PS3, the Cell and the RSX are connected by a Rambus interface technology, which, sure enough, Rambus has given a catchy name—FlexIO. The total bus width between the two chips is 7 bytes: 4 bytes to move data from the Cell to the RSX, 3 to move data in the opposite direction. This setup gives a bandwidth of 20 GB/s outbound from the Cell and 15 GB/s inbound—almost 10 times as fast as PCI Express, an interface standard popular in today’s PCs. Thanks to FlexIO, the Cell processor can fling an incredibly large number of triangles to the RSX, and the result is more details and more objects in 3-D games.

Future gaming consoles will continue to demand ever-faster buses, but how much bandwidth is enough will vary from system to system. For instance, one of the PlayStation 3’s main competitors, Microsoft’s Xbox 360, released last year, relies on a CPU-GPU bus with a bandwidth of 21.6 GB/s, half in each direction. It’s a proprietary interface developed by IBM that runs at 5.4 GHz and relies on differential signaling to maintain signal integrity. It may not be as fast as the PS3’s, but Xbox 360 owners don’t seem disappointed.

In fact, just because PS3 has more powerful data pipes, that doesn’t mean its games will easily get the full benefit from them. As in any other computer system, software, not just hardware, matters. Game developers will have to design their code carefully to make sure that the Cell is getting the types of workloads for which it works best, and that data are streaming smoothly between processor, memory, and GPU.
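By the way, those bus figures are internally consistent (my own arithmetic, using only the article's numbers):

#include <stdio.h>

int main(void)
{
    /* 7 byte lanes carry 20 + 15 = 35 GB/s total,
       which works out to 5 GB/s per byte lane. */
    const double per_lane = (20.0 + 15.0) / 7.0;

    printf("per byte lane: %.1f GB/s\n", per_lane);      /*  5.0 */
    printf("Cell -> RSX:   %.1f GB/s\n", 4 * per_lane);  /* 20.0 */
    printf("RSX -> Cell:   %.1f GB/s\n", 3 * per_lane);  /* 15.0 */
    return 0;
}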

So as you can see, it's not just about the Cell or the Xenon or their respective GPUs; it's about the system as a whole, and your article failed to even go into that part in its take on each system. Now, like I said, it's not about his take, it's about its relevance today.


http://www.spectrum.ieee.org/images/...ges/gamef1.pdf

 




@joeorc

Ummm... what?! Yeah, I was better off reading the 11-page article than your comments.

At the end of the day, despite what developers do and how software changes... the HARDWARE remains the same. selnor tried to make that point at the outset of this post, but it seems you're just in this for argument's sake.




This article is ancient history and its "facts"/"theories" have been debunked elsewhere. I just don't have the time to go searching for that info.



joeorc said:
...

I agree with you 100% here. It is all about the whole system, which is exactly why they are so close in overall ability.

The Cell helps out the RSX a lot as the visuals gain more fidelity. I didn't mean that the Cell can't do anything else if it's doing graphical calculations, but it will have less power available for normal CPU operations.

Whereas the 360 has a more traditional setup, it has a vastly more powerful GPU, so you will never need the CPU to do graphical calculations to get the same game from the 360. In fact, the Xenos GPU will take a while to learn to program right, as it is new and very different from normal GPUs (or certainly was in 2006). Not only does its architecture completely redefine GPUs, but ATI also had a hand in designing the overall memory of the 360.

Once you learn that the Xbox 360 GPU also acts as the system’s memory controller, much like the Northbridge in an Intel PC, the picture becomes a bit clearer. ATI has been making and designing chipsets for a good while now that use GDDR3 RAM. Add to this Joe Macri (go-kart racing fiend extraordinaire), who was a pivotal factor in defining the GDDR3 RAM specification at JEDEC and is also a big fish at ATI, and it only makes sense that ATI could put together one of the best GDDR3 memory controllers in the world. So while it might seem odd that the Xbox 360 PowerPC processor is using “graphics” memory for its main system memory and a “GPU” as the “Northbridge,” once you see the relationship between the three and the technology being used, it is quite simple. Therefore, we have the 700MHz GDDR3 RAM acting as both system RAM and GPU RAM, connected to the GPU via a traditional GDDR3 bus interface that can channel an amazing 25 gigabytes per second of data.

Smart 3D Memory is the biggest standout and most innovative feature I saw inside the entire Xbox 360. To give you an idea of what it would look like first hand, think of any normal GPU you might see, something much like the Mobility Radeon X700 chipset; the X700 is pretty much what any modern GPU looks like. Now think of that same chipset as having a single piece of DRAM sitting off to one side, much like in ATI's slides, but with one less piece of RAM (and no arrows).

Keep in mind, ATI is not a stranger to adding memory to a chipset, but remember that this is “smart” memory.

 

The Xbox 360 Smart 3D Memory is a relatively small piece of DRAM sitting off to the side of the GPU, and yet on the same substrate. The Smart 3D Memory weighs in at only 10MB. Now the first thing that you might think is, “Well, what the hell good is 10MB in the world of 512MB frame buffers?” And that would be a good line of questioning. The “small” 10MB of Smart 3D Memory that is currently being built by NEC will have an effective bus rate between it and the GPU of 2GHz. This is of course over 3X faster than what we see on the high end of RAM today.

 

Inside the Smart 3D Memory is what is referred to as a 3D Logic Unit. This is literally 192 floating point unit processors inside our 10MB of RAM. This logic unit will be able to exchange data with the 10MB of RAM at an incredible rate of 2 terabits per second. So while we do not have a lot of RAM, we have a memory unit that is extremely capable of handling mass amounts of data extremely quickly. The most incredible feature that this Smart 3D Memory will deliver is “antialiasing for free,” done inside the Smart 3D RAM at High Definition levels of resolution. (For more on just what HiDef specs are, you can read about it here.) Yes, the 10MB of Smart 3D Memory can do 4X multisampling antialiasing at or above 1280x720 resolution without impacting the GPU. Therefore, not only will all of your games on Xbox 360 be in High Definition, but they also will have 4XAA applied.

 

The Smart 3D Memory can also compute Z depths and occlusion culling, and does a very good job at figuring stencil shadows. Stencil shadows are used in games that use the DOOM 3 engine, such as Quake 4 and Prey.

 

Now remember that all of these operations are taking place on the Smart 3D Memory, so they will have very little impact on the workload of the GPU itself. You may now be asking yourself what exactly the GPU will be doing.

First off, we reported on page 2 in our chart that the capable “Shader Performance” of the Xbox 360 GPU is 48 billion shader operations per second. While that is what Microsoft told us, Mr. Feldstein of ATI let us know that the Xbox 360 GPU is capable of doing two of those shaders per cycle. So yes, if programmed correctly, the Xbox 360 GPU is capable of 96 billion shader operations per second. Compare this with ATI’s current PC add-in flagship card and the Xbox 360 more than doubles its abilities.

 

Now that we see a tremendous amount of raw shader horsepower, we have to take into account that there are two different kinds of shader operations that can be programmed by content developers: vertex and pixel shaders. These are really just what they sound like. Vertex shader operations are used to move vertices, which shape the polygons that make up most objects you see in your game, like characters, buildings, or vehicles. Pixel shader operations dictate what groups of pixels do, like bodies of water or clouds in the sky, or maybe a layer of smoke or haze.

 

In today’s world of shader hardware, we have traditionally had one hardware unit to do pixel shaders and one hardware unit to do vertex shaders. The Xbox 360 GPU breaks new ground in that the hardware shader units are intelligent as well. Very simply, the Xbox 360 hardware shader units can do either vertex or pixel shaders quickly and efficiently. Just think of the Xbox 360 shaders as being analogous to SIMD (Single Instruction, Multiple Data) shader units.

 

The advantage of this would not be a big deal if every game were split 50/50 in terms of pixel and vertex shaders. That is not the case, though. Most games are vertex-shader bottlenecked and some others are pixel-shader bottlenecked; this kind of bottleneck is impossible on the Xbox 360.
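To put toy numbers on why that matters (a made-up 80/20 pixel/vertex workload, purely my own illustration, not ATI's data):

#include <stdio.h>

int main(void)
{
    const int total  = 4800;            /* arbitrary shader ops in a frame */
    const int pixel  = total * 8 / 10;  /* 3840 ops, 80% pixel-heavy frame */
    const int vertex = total - pixel;   /*  960 ops                        */

    /* Fixed 24/24 split: each bank works only its own queue, so the
       frame takes as long as the busier bank. */
    int fixed = (pixel + 23) / 24;              /* 160 cycles */
    int fv    = (vertex + 23) / 24;             /*  40 cycles */
    if (fv > fixed) fixed = fv;

    /* Unified pool: all 48 ALUs chew through everything. */
    int unified = (total + 47) / 48;            /* 100 cycles */

    printf("fixed 24/24 split: %d cycles\n", fixed);
    printf("unified 48 pool:   %d cycles\n", unified);
    return 0;
}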

So although they both take different approaches, you end up with very close overall capability. And this is in large part down to ATI's influence on the memory and the brand-new tech in the GPU.

 



AlkamistStar said:
@joeorc

Ummm... what?! Yeah, I was better off reading the 11-page article than your comments.

At the end of the day, despite what developers do and how software changes... the HARDWARE remains the same. selnor tried to make that point at the outset of this post, but it seems you're just in this for argument's sake.

And I, like others, have pointed out that the "article" selnor is putting his faith in as, quote, "facts" has quite a few flaws in its take on the systems. It seems to me that people here treat his take as 100% facts, as if you have to agree with everything that person wrote on his "blog" comparing the PS3 to the Xbox 360.

 

Well, I've got news for you: there are people who disagree with the man's take based on information we have had since 2006.

Get it? This article is from 2006. The hardware is a guideline; it's the software engines and the software stack that drive the hardware. The hardware argument selnor is trying to make is a veiled attempt at making it out like this person's "blog" take on the PS3 is viable today, when he has a majority of it wrong.

For instance, he claims the GPU of the PS3 is just a 7800. It's based on the NV47, but the picture he had for his source was far from in-depth about what else the RSX has inside its design. And like I pointed out to selnor, the claim that the PS3's PPE has to do all the direction for the SPEs is not true.

Like I said, a lot has changed since 2006. The percentage of the resources IBM was able to attain in 2005 is not the same as what they can attain now with the same hardware.

IBM made both processors, and they have stated, through tests, which processor can get nearer to the maximum FLOPS it could attain. The article selnor posted contains none of these types of tests. As a matter of fact, where is this blogger's update to his own article, now that he has gathered more experience with the Xbox 360 or PS3 hardware? It seems many people here are very quick to take selnor's article at face value as "facts" instead of looking at it from many perspectives rather than just one person's take.

 




selnor said:
joeorc said:
selnor said:
joeorc said:
selnor said:
joeorc said:

@selnor
"But at the end of the day it still comes back to spreading 512kb L2 Cache between 6 SPE's and 1PPE. As well as having to deal with each instruction in order rather than like a PC CPU which can deal with any code it needs to."

i read this article back in 2006.

your statement only proves how outdated this article is..for one the SPE's do not just rely on the PPE for instruction, because the SPE's have their own instruction's seperate for the PPE. the SPE's can do direct DMA to and from other SPE's with out the need of the PPE.because the SPE's have their own local store.

here is some fact's for people:

users.ece.gatech.edu/~lanterma/mpg/ece4893_xbox360_vs_ps3.pdf

real world tests....

http://www.ibm.com/developerworks/power/library/pa-cellperf/

@selnor

to me this shows his take has many faults on what you describe as.."facts" about the ps3 though i respect his opinion i do no agree with it. like other's may not agree with mine.


IT doesnt matter when an article was written, the componenets of a machine dont change. That IBM article you posted is what the tech specialist used and has as a source on the final page of his disection. Yes they are tests. But tests are done in a controlled environment. Notice how on IBM's very own graphs they show they can actual only get 75% of the theoretical power when using all SPE's at once?

That IBM article completely backs up this tech persons article. Theoretical PS3 can do 218 GFLOPS with the cell. Actual game environment will be closer to 70 GFLOPS in the best games of this gen. For some reason Cell loses alot of peak ower when all SPE's are being used. 360's Xenon will likely see around 60 GFLOPS peak for actual games. But again Cell can help out The weak RSX where it needs to. And Xenos on 360 has the ability to help out Xenon CPU. Because the Xenos is very much more advanced than RSX. It's catch 20/20.

The article is very factual. Yes developers will always find ways around problems of hardware, and in terms of console hardware these 2 consoles are a big step up from the previous generation. But the actual CPU's inside 360 and PS3 aren't actual capable of more than Athlon 3200+ for instance. Becasue In Order Execution CPU's have a very limited way of being used. And adding multithreading makes that even harder. If cell and Xenon were Out of Order CPU's they would be considerably more powerful and faster at what they could do, but they wouls alos lkely cost you to sell your mum to buy the console.

The only part that can change in from the article is the OS usage. How much ram and how much processing time. But thats it. The rest will never change, but developers will learn ways like they did last gen to overcome any hiccups. And g=create great games like Killzone 2, or Forza 3. It's the nature of the beast.

The main points I got from the article is that in no way Is PS3 or 360 supercomputers. Also PS2 had a better CPU than Xbox 1, yet Xbox 1 provided greater graphics. How advanced the 360 GPU actually is. And how the gap this gen between the 2 are closer than the gap between PS2 and Xbox 1.

You have to remember that the hard facts about machines dont change. The specs dont change, how out of order CPUs work dont change, how memory works doesn't change and the article even talks about how developers can use the SPE's to their advantage or not.

Lets not discredit factual information because it hurts our fanboyism. I have no problem in admitting I was wrong that PS3 Cell wasn't that powerful. But like the article says using SPE's for things like cloth motion etc means the cell can display more on screen than the xenon.

But as he points out on a whole the machines are so much closer than last gen, becasue RSX is helped by Cell and Xenon is helped by Xenos. It's like M$ really went for awesome graphics chip and Sony went for awesome CPU. And in the thick of it both machines can use the stronger chip to help out their weaker chip to overcome an shortcomings. It's funny really. But thats life.

It's also worth noting that multithreaded engines like UE3 are nowhere near fully optimized multithreading code. this can take even another 3 or 4 years to perfect.

once again i respect this guy's Opinion based on what he has read about the Cell but i have to disagree with his take on how that technology is used , based on his experience of the Cell. for instance you say that the IBM test's are in controlled enviroment...but on the same token back in 2005 the API's were not as well developed for the Cell as they are now, the very fact that even back in 2005 they were getting 30 fps TRE on the same CPU that is in the PS3 and this was shown working at the trade show points to the fact that his claim's are based on his OPINION on what data he had on hand. which he did not have all the data of each system. he just had what he had, and he made his Opinion known on his blog based on that data.

 

I do not fault him for it other than its way outdated.

I have no problem in you posting what you did because it give's you and idea what this person's experience with what he has about both platform's,

Me i do not tend to only look at just one article based on 2005, or 2006 to say this is fact's if the person has not done any test's themselves., does it mean he's 100% wrong or right depends on who you ask. me i see some of the thing's he did get right, on the ps3 but his take on how to impliment the development process is wrong, which would give much more results than what he has stated. Like i said i am not knocking his take due to what info he only had on hand at the time.

I am not knocking it being posted; I am just knocking its relevance today.

EXAMPLE:

IBM has done tests on both the Xbox 360 processor and the Cell processor. Both are very powerful processors, but even IBM, which created both, has stated the Cell can reach closer to its maximum FLOPS than the Xbox 360's can. This does not mean much anyway, because it's up to the developer and his experience with the hardware.

Did you ever think the Cell processor could do unified shaders? But it can.



I understand what you're saying, but he even describes pretty much how KZ2 was made, back in 2006. The Cell can do unified shaders, but while it's doing that it has less left over for normal CPU work. RSX has 8 unified shaders and Xenos has 48, so the Cell has to help RSX get close to Xenos's unified shader count. But it's all a catch-22, because the Xenos GPU in the 360 is capable of performing CPU-intensive tasks, like AI or physics.

So on one hand you have Cell helping RSX with graphics calculations, and on the other you have Xenos helping Xenon with CPU calculations.

As for your example: that is 100% false.

Taken directly from IBM's full test of Cell running 8 SPUs:

Table 4. Performance of parallelized Linpack on eight SPUs

Matrix size   Cycles   Insts.   CPI    Single   Dual     Channel   Other    Used   SPEsim     Measured   Model      Effi-
                                       issue    issue    stalls    stalls   regs   (GFLOPS)   (GFLOPS)   accuracy   ciency
1024x1024     27.6M    2.92M    0.95   27.9%    32.6%    26.9%     12.6%    126    83.12      73.04      87.87%     35.7%
4096x4096     918.0M   1.51G    0.61   29.0%    56.7%    10.8%     3.4%     126    160        155.5      97.2%     75.9%

Notice the model accuracy is 97.2% and the efficiency is 75.9% of the theoretical peak GFLOPS performance. Doing the math brings the GFLOPS in a controlled-environment test down to 150-odd. Take away 1 SPE which is not usable for games and another which is dormant for the OS, and we are closer to 120 GFLOPS. And again, that 120 GFLOPS is in a controlled environment, not in an unstable game-code environment.
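Here's that arithmetic as a quick C sanity check. One assumption of mine that isn't in the table: the commonly cited 25.6 GFLOPS single-precision peak per SPE (4-wide SIMD x 2 flops per fused multiply-add x 3.2 GHz).

#include <stdio.h>

int main(void) {
    /* Assumed peak per SPE: 4 floats/vector * 2 flops (FMA) * 3.2 GHz = 25.6 */
    double peak_per_spe = 4 * 2 * 3.2;
    double peak_8spe    = 8 * peak_per_spe;   /* 204.8 GFLOPS for IBM's 8-SPU run */
    double efficiency   = 155.5 / peak_8spe;  /* measured Linpack / peak = ~75.9% */
    double game_peak    = 6 * peak_per_spe;   /* minus 1 disabled SPE, 1 for the OS */
    printf("Linpack efficiency: %.1f%%\n", efficiency * 100.0);
    printf("6-SPE budget at that efficiency: %.1f GFLOPS\n",
           game_peak * efficiency);           /* ~116.6, i.e. "closer to 120" */
    return 0;
}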

That table is directly from the link you posted in this thread. These figures will never change unless Sony changes the CPU in the PS3, and that doesn't happen to consoles. Don't get me wrong, both consoles are powerful, but not nearly as powerful as the PR BS portrays. It's like TRE. You cannot forget that when Cell is doing graphics it CAN'T be doing normal CPU work, so devs have to be careful how much CPU time they take away from Cell.

Likewise, the same is true for the 360. There is just a different set of boundaries for that machine.

And no, my example is not false, reason being IBM created the chips and they have done real-world tests. Does that mean engines on the Xbox 360 cannot attain higher performance? No it does not, but on the same token the same can be said about the PS3. It's the software engine; the hardware numbers are just a rough guide to what could be attained. Will they reach their maximums? Probably not. But it does not take away from the fact that IBM stated what they did because of the tests they have done on both systems' processors.

Once again, the APIs were not as well developed in 2005, because the Cell was only unveiled in 2004. That's like saying the engine will stay the same for any processor... yeah, unless you tweak the engine, which we all know Epic and other developers never do... come on man, you're reaching.

example:

you just stated this:

@selnor

"You cannot forget when Cell is doing Graphics it CAN'T be doing normal CPU work. So devs have to be careful how much CPU work time they take away from Cell."

Yes it can... and that's where you are going about it all wrong.

"staude" pointed this very same thing out to you in this very same thread. what you relate to PC programming is not the way you look at these type's of local store embedded cpu core's and how they can be developed on. "you can" but your result's will be reduced . this is more about memory management than about trying to rely on just a large pool of ram to do everything from. it's a much more indepth precise way of development with this type of design. because there is more seperate core's to manage not just on memory, but also what each core will need to do in any clock cycle.

but that also has problems of its own:

ever heard of "differential signaling"?

example:

The realism of today’s games, though, demands far more number crunching than the CPU alone can deliver. That’s where the graphics processing unit, or GPU, comes in. Every time an object has to be rendered on screen, the CPU sends information about that object to the GPU, which then performs more specialized types of calculations to create the volume, motion, and lighting of the object.

But despite churning through billions of floating-point math operations per second, or flops, today’s gaming systems and PCs still can’t deliver the realism that game developers seek. CPUs, memories, and GPUs just aren’t powerful enough—or can’t exchange data fast enough—to handle the complexity and richness of the games designers want to create. In other words, hardware constraints force them to reduce the number of objects in scenes, their motion and texture, and the quality of special effects.

The need for speed becomes even more critical in the PS3, whose Cell processor is actually nine processors on a single silicon die. In the Cell, one processor, or core, divides up work for the other eight cores, which were designed to stream through computation-intensive workloads like video processing, content decryption, and physics calculations. [For more on the Cell chip, see “Multimedia Monster,” IEEE Spectrum, January 2006.] Using all its cores, the 3.2-gigahertz Cell processor can deliver a whopping 192 billion single-precision flops. Without a speedy connection to the PS3’s memory, the Cell starves for data.

To speed up data transfers between the Cell processor and its memory chips, the PS3’s designers adopted a novel memory system architecture that, Rambus says, addresses some of the limitations of current DRAMs [see “How the PlayStation 3 Shuttles Bits”]. To understand how these limitations came about, consider first the co-evolution of microprocessors and memory.

Moore’s Law tells us that transistor densities on chips are doubling every 18 months or so. This evolution has been accompanied by a doubling, on a similar time scale, in the clock rates of processor chips, basically because smaller transistors can toggle on and off faster. But memory clock rates, which serve as an indicator of memory data-transfer rates, are doubling much more slowly—about every 10 years. The result is that memory can’t fetch data to the processor fast enough, a bandwidth bottleneck that has increasingly constricted over the past few decades.

The bandwidth gap is just part of the problem. The other part is related to latency, the time it takes the memory to produce a chunk of data requested by the processor. This delay can vary from tens to hundreds of nanoseconds. That may not seem like much, but in a mere 50 nanoseconds a 3.8-GHz processor can go through 190 clock cycles. “You don’t want the processor waiting for that long,” says Brian T. Davis, a professor of electrical and computer engineering at the Michigan Technological University, in Houghton. The latency problem prompted chip makers years ago to embed some DRAM caches directly onto CPU chips, as well as to concoct some processing tricks to keep the wait for data as short as possible. Despite these improvements, modern CPUs can spend more than half their time—and often much more, Davis notes—just waiting for data to come from memory.

In the PS3, the Cell and the RSX are connected by a Rambus interface technology, which, sure enough, Rambus has given a catchy name—FlexIO. The total bus width between the two chips is 7 bytes: 4 bytes to move data from the Cell to the RSX, 3 to move data in the opposite direction. This setting gives a bandwidth of 20 GB/s outbound from the Cell and 15 GB/s inbound—almost 10 times as fast as PCI Express, an interface standard popular in today’s PCs. Thanks to FlexIO, the Cell processor can fling an incredibly large number of triangles to the RSX, and the result is more details and more objects in 3-D games.

Future Gaming consoles will continue to demand ever-faster buses, but how much bandwidth is enough will vary from system to system. For instance, one of PlayStation 3’s main competitors, Microsoft’s Xbox 360, released last year, relies on a CPU-GPU bus with a bandwidth of 21.6 GB/s, half in each direction. It’s a proprietary interface developed by IBM that runs at 5.4 GHz and relies on differential signaling to maintain signal integrity. It may not be as fast as PS3’s, but Xbox 360 owners don’t seem disappointed.

In fact, just because PS3 has more powerful data pipes, that doesn’t mean its games will easily get the full benefit from them. As in any other computer system, software, not just hardware, matters. Game developers will have to design their code carefully to make sure that the Cell is getting the types of workloads for which it works best, and that data are streaming smoothly between processor, memory, and GPU.
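As an aside, those bus figures are internally consistent: bandwidth is just effective transfer rate times width, and dividing the quoted bandwidths by the quoted widths gives the rates used below (derived from the article's own numbers, not a spec sheet):

#include <stdio.h>

/* bandwidth (GB/s) = effective transfer rate (GT/s) * bus width (bytes) */
static double bus_gbs(double gtps, int bytes) { return gtps * (double)bytes; }

int main(void) {
    printf("FlexIO outbound: %.1f GB/s\n", bus_gbs(5.0, 4)); /* 20.0, Cell -> RSX */
    printf("FlexIO inbound : %.1f GB/s\n", bus_gbs(5.0, 3)); /* 15.0, RSX -> Cell */
    printf("360 CPU-GPU bus: %.1f GB/s\n", bus_gbs(5.4, 4)); /* 21.6 total @ 5.4 GHz */
    return 0;
}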

So as you can see, it's not just about the Cell or the Xenon or their respective GPUs; it's about the system as a whole, and your article failed to even go into that part of its take on each system. Now, like I said, it's not about his take; it's about its relevance today.


http://www.spectrum.ieee.org/images/...ges/gamef1.pdf

 

I agree with you 100% here. It is all about the whole system, which is exactly why they are so close in overall ability.

The Cell helps out the RSX a lot as the visuals gain fidelity. I didn't mean that the Cell can't do anything else while it's doing graphical calculations, but it will have less power available for normal CPU operations.

Whereas the 360 has a more traditional setup, it has a vastly more powerful GPU. So you will never need the CPU doing graphical calculations to get the same game on the 360. In fact, the Xenos GPU will take a while to learn to program right, as it is vastly new and different from normal GPUs (or certainly was in 2006). Not only does its architecture completely redefine GPUs, but ATI also had a hand in designing the overall memory of the 360.

Once you learn that the Xbox 360 GPU also acts as the system’s memory controller, much like the Northbridge in an Intel PC, the picture becomes a bit clearer. ATI has been making and designing chipsets for a good while now that use GDDR3 RAM. Add to this that Joe Macri (go cart racing fiend extraordinaire), who was a pivotal factor in defining the GDDR3 RAM specification at JEDEC and is also a big fish at ATI, and it only makes sense that ATI could possibly put together one of the best GDDR3 memory controllers in the world. So while it might seem odd that the Xbox 360 Power PC processor is using “graphics” memory for its main system memory and a “GPU” as the “northbridge,” once you see the relationship between the three and the technology being used it is quite simple. Therefore, we have the 700MHz GDDR3 RAM acting as both system RAM and as GPU RAM, connected to the GPU via a traditional GDDR3 bus interface that can channel an amazing 25 Gigabytes per second of data.

Smart 3D Memory is the biggest standout and innovative feature I saw inside the entire Xbox 360. To give you an idea of what it would look like first hand, think of any normal GPU you might see, something much like this Mobility Radeon X700 chipset. The X700 is pretty much what any modern GPU looks like. Now think of that same chipset as having a single piece of DRAM sitting off to one side, much like you can see in this ATI slide below, but with one less piece of RAM (and no arrows).

Keep in mind, ATI is not a stranger to adding memory to a chipset, but remember that this is “smart” memory.

 

The Xbox 360 Smart 3D Memory is a relatively small piece of DRAM sitting off to the side of the GPU but yet on the same substrate. The Smart 3D Memory weighs in at only 10MB. Now the first thing that you might think is, “Well what the hell good is 10MB in the world of 512MB frame buffers?” And that would be a good line of questioning. The “small” 10MB of Smart 3D memory that is currently being built by NEC will have an effective bus rate between it and the GPU of 2GHz. This is of course over 3X faster than what we see on the high end of RAM today.

 

Inside the Smart 3D Memory is what is referred to as a 3D Logic Unit. This is literally 192 Floating Point Unit processors inside our 10MB of RAM. This logic unit will be able to exchange data with the 10MB of RAM at an incredible rate of 2 Terabits per second. So while we do not have a lot of RAM, we have a memory unit that is extremely capable in terms of handling mass amounts of data extremely quickly. The most incredible feature that this Smart 3D Memory will deliver is “antialiasing for free” done inside the Smart 3D RAM at High Definition levels of resolution. (For more on just what HiDef specs are, you can read about it here.) Yes, the 10MB of Smart 3D Memory can do 4X Multisampling Antialiasing at or above 1280x720 resolution without impacting the GPU. Therefore, not only will all of your games on Xbox 360 be in High Definition, but they also will have 4XAA applied.
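To put that 2 terabits per second in scale, a quick conversion (both numbers are the article's own; nothing measured by me):

#include <stdio.h>

int main(void) {
    double edram_gbs  = 2000.0 / 8.0; /* 2 Tb/s -> 250 GB/s, logic unit <-> eDRAM */
    double extern_gbs = 25.0;         /* the article's external GDDR3 figure */
    printf("eDRAM path: %.0f GB/s, roughly %.0fx the external bus\n",
           edram_gbs, edram_gbs / extern_gbs); /* ~10x: why the AA is "free" */
    return 0;
}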

 

The Smart 3D Memory can also compute Z depths, occlusion culling, and also does a very good job at figuring stencil shadows. Stencil shadows are used in games that will use the DOOM 3 engine such as Quake 4 and Prey.

 

Now remember that all of these operations are taking place on the Smart 3D Memory, so they will have very little impact on the workload of the GPU itself. You may now be asking yourself what exactly the GPU will be doing.

First off, we reported on page 2 in our chart that the capable “Shader Performance” of the Xbox 360 GPU is 48 billion shader operations per second. While that is what Microsoft told us, Mr. Feldstein of ATI let us know that the Xbox 360 GPU is capable of doing two of those shaders per cycle. So yes, if programmed correctly, the Xbox 360 GPU is capable of 96 billion shader operations per second. Compare this with ATI’s current PC add-in flagship card and the Xbox 360 more than doubles its abilities.

 

Now that we see a tremendous amount of raw shader horsepower, we have to take into account that there are two different kinds of shader operations that can be programmed by content developers: vertex and pixel shaders. These are really just what they sound like. Vertex shader operations move vertices, which shape the polygons that make up most objects you see in your game, like characters, buildings, or vehicles. Pixel shader operations dictate what groups of pixels do, like bodies of water or clouds in the sky, or maybe a layer of smoke or haze.

 

In today’s world of shader hardware, we have traditionally had one hardware unit to do pixel shaders and one hardware unit to do vertex shaders. The Xbox 360 GPU breaks new ground in that the hardware shader units are intelligent as well. Very simply, the Xbox 360 hardware shader units can do either vertex or pixel shaders quickly and efficiently. Just think of the Xbox 360 shaders as being analogous to SIMD shader units (Single Instructions carried out on Multiple Data).

 

The advantage of this would not be a big deal if every game were split 50/50 in terms of pixel and vertex shaders. That is not the case though. While most games are vertex shader bottlenecked, some others are pixel shader bottlenecked; on the Xbox 360 it is impossible to get bottlenecked either way.
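Here's a toy model of why that matters, in plain C; the unit counts and workload numbers are made up for illustration and say nothing about the real scheduler:

#include <stdio.h>

/* Fixed split: vertex work only runs on vertex units, pixel work on pixel
   units, so the frame finishes when the slower side finishes. */
static double fixed_split(double vwork, double pwork, int vunits, int punits) {
    double tv = vwork / vunits, tp = pwork / punits;
    return tv > tp ? tv : tp;
}

/* Unified: any unit takes any work, so nothing sits idle. */
static double unified(double vwork, double pwork, int units) {
    return (vwork + pwork) / units;
}

int main(void) {
    double v = 900.0, p = 300.0;  /* a hypothetical vertex-heavy frame */
    printf("fixed 8V+24P: %.1f time units\n", fixed_split(v, p, 8, 24)); /* 112.5 */
    printf("unified 32  : %.1f time units\n", unified(v, p, 32));        /*  37.5 */
    return 0;
}

In the vertex-heavy case the fixed split leaves most of the pixel units idle while the vertex units choke; the unified pool just soaks up whatever mix the frame throws at it.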

So although they both have different methods, you end up with a very close overall capability. And this is in large part down to ATI's influence on the memory and the brand-new tech in the GPU.

 

@selnor

you stated this:

"it has got a vastly more powerful GPU

No it does not. You are trying to show this with an article that has quite a few flaws in its take, because the author did not have much info about the hardware on hand. There's quite a bit he is missing. For instance, the GPU in the Xbox 360 is not, and I mean not, vastly more powerful, for the simple fact that the PS3's GPU is

Cell + the RSX = the PS3's GPU. Others have been trying to take away from the fact that you want to compare just GPU to GPU, and you cannot, because the PS3's design is more like an "SLI" design than just a CPU separate from a GPU. It's more like a CPU/GPU + a GPU, while this guy's blog is trying to compare just GPU to GPU. He failed to mention that the Cell can perform "unified shaders". Why? He did not KNOW the Cell could do it. As a matter of fact, he just glossed over the SPEs' capabilities.

Like I said, I cannot fault him on his basis back then, because that's all the info he had. Think about this for a sec: very few developers had development kits for the Cell processor, let alone much experience developing on the Cell platform. Now add in that the SDK toolchain was not very well developed, and that the APIs were not as well developed either, yet he states in his article that he talked to other developers about development on the PS3. THE POINT IS there had not been enough time spent developing engines for the Cell when the article was written. Hell, IBM was working to update the SDK for the Cell because they acknowledged it was not well developed enough yet.

You're right, the hardware does not change, but what you can unlock within the hardware can. So what he may say one day is how it works may not be the right way.

 



I AM BOLO

100% lover "nothing else matter's" after that...

ps:

Proud psOne/2/3/p owner.  I survived Aplcalyps3 and all I got was this lousy Signature.

"Published November 13, 2006 at 12:38 AM EST"

No thanks.



joeorc said:

*snip*


You keep pointing back to Cell + RSX. I know all about that. My point, I think, is what you keep missing. The RSX on its own against Xenos is a lot less capable. Now, we know the Cell on its own is more powerful than Xenon, but when you take into account that for games 2 SPEs out of the 8 can NEVER be used, because one is permanently disabled in hardware and one is reserved for the OS, you're already losing 50 theoretical GFLOPS off IBM's numbers. Which brings Xenon and Cell closer than they are originally on paper.

Now, my point is this: Cell + RSX work together. Straight away that means the Cell has to deal with graphics code that Xenon never will. So Cell is using some of its overall power on graphics. That, like I said, takes away some power, and the rest is available for the normal CPU tasks like AI, physics, controller input, collision detection, etc. In effect the Cell is propping the graphics processing side up.

Now, the 360 on the other hand has such a powerful GPU; for instance, like I posted, Xenos can do 96 billion shader operations a second in realtime. It also has unified shaders without the 360 CPU helping it. Coupled with that, it has a separate NEC sister die with its own RAM at 2 GHz that deals with all AA and high-definition resolution as well as stencil shadows, all without affecting the GPU at all. So where the PS3 needs Cell to help the RSX just to reach the level of power the Xenos has, the Xenos does it on its own, leaving the 360 CPU fully free to do tasks like AI, physics, collision detection, etc. Now here's the next thing: the 360 GPU can have multiple threads running on it too, which the RSX does not have the ability to do, and that is down to the unified shaders as well.

So while some of Cell's power is taken up working with the RSX to do the same job the 360's Xenos GPU does on its own, the remaining CPU tasks (the ones RSX can't help with) run on less than the full power of the Cell, because obviously some of the Cell is being used. Which brings the Cell's CPU-task power much more in line with what the 360's Xenon is capable of handling.

Do you see my point? Let's say RSX can handle 4 cans of beer maximum. Xenos can handle 7, with froth. Now Cell helps out and handles 3 more beers for it. So we have 7 beers handled by the PS3 with Cell plus RSX. But Cell's overall power could handle 10 wines or 12 beers if it wanted (beers represent GPU-type tasks and wine represents CPU-only tasks). Now, Xenon only handles 8 wines but no beers. Thing is, now that Cell is helping with 3 beers, it can only handle 8 wines, because it's already doing those 3 beers. You see: Cell + RSX handle 7 beers together, and what's left of Cell does the 8 wines. Whereas Xenos already handles 7 beers, so the 360 CPU doesn't need to help; it just handles all the CPU code (wine).
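The beer-and-wine budget works as straight arithmetic, too; here's a tiny C version (the capacities are his illustrative numbers, not measurements):

#include <stdio.h>

int main(void) {
    /* Cell: 12 beers (GPU-type work) == 10 wines (CPU-only work),
       so one wine costs 1.2 beers of Cell capacity. */
    double cell_beers   = 12.0;
    double beers_to_rsx = 3.0;          /* Cell tops RSX's 4 beers up to 7 */
    double wine_cost    = 12.0 / 10.0;
    printf("PS3: 7 beers (4 RSX + 3 Cell), %.1f wines left on Cell\n",
           (cell_beers - beers_to_rsx) / wine_cost);  /* ~7.5, call it 8 */
    printf("360: 7 beers on Xenos alone, 8 wines on Xenon\n");
    return 0;
}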

It's common knowledge that Cell has to help RSX a lot, especially when a game involves a lot of switching between vertex, geometry and pixel shaders, because otherwise RSX would have a lot of unused shaders a lot of the time. Whereas Xenos has 48 shaders that can do all three at any time.