selnor said:
joeorc said:
selnor said:
joeorc said:
selnor said:
joeorc said:
@selnor "But at the end of the day it still comes back to spreading 512kb L2 Cache between 6 SPE's and 1PPE. As well as having to deal with each instruction in order rather than like a PC CPU which can deal with any code it needs to."
I read this article back in 2006.
Your statement only proves how outdated this article is. For one, the SPEs do not just rely on the PPE for instructions, because the SPEs have their own instruction set separate from the PPE. The SPEs can do direct DMA to and from other SPEs without the need of the PPE, because the SPEs have their own local store.
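To make that concrete, here is a minimal SPE-side sketch of what that looks like with the Cell SDK's spu_mfcio.h intrinsics (my own illustration, not from any article; the block size is made up, and it assumes the PPE handed the neighbouring SPE's local-store effective address to this SPE at startup, e.g. obtained via libspe2's spe_ls_area_get()):
[code]
/* SPE-side sketch: one SPE pushing a block of its local store straight
 * into a neighbouring SPE's local store by DMA, with no PPE involvement. */
#include <spu_mfcio.h>

#define BLOCK 4096
static char out_buf[BLOCK] __attribute__((aligned(128)));

static void push_to_neighbour(unsigned long long neighbour_ls_ea)
{
    const unsigned int tag = 0;

    /* DMA out of our local store into the other SPE's local store. */
    mfc_put(out_buf, neighbour_ls_ea, BLOCK, tag, 0, 0);

    /* Wait for the tagged transfer to complete before reusing out_buf. */
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();
}

int main(unsigned long long speid, unsigned long long argp,
         unsigned long long envp)
{
    (void)speid; (void)envp;
    push_to_neighbour(argp);  /* assume PPE passes the neighbour's LS address in argp */
    return 0;
}
[/code]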
Here are some facts for people:
users.ece.gatech.edu/~lanterma/mpg/ece4893_xbox360_vs_ps3.pdf
Real-world tests:
http://www.ibm.com/developerworks/power/library/pa-cellperf/
@selnor
To me this shows his take has many faults in what you describe as "facts" about the PS3. Though I respect his opinion, I do not agree with it, just like others may not agree with mine.
|
It doesn't matter when an article was written; the components of a machine don't change. That IBM article you posted is what the tech specialist used and cites as a source on the final page of his dissection. Yes, they are tests, but tests are done in a controlled environment. Notice how IBM's very own graphs show they can actually only get 75% of the theoretical power when using all SPEs at once?
That IBM article completely backs up this tech person's article. Theoretically the PS3's Cell can do 218 GFLOPS; an actual game environment will be closer to 70 GFLOPS in the best games of this gen. For some reason the Cell loses a lot of peak power when all SPEs are being used. The 360's Xenon will likely see around 60 GFLOPS peak in actual games. But again, the Cell can help out the weak RSX where it needs to, and Xenos on the 360 has the ability to help out the Xenon CPU, because Xenos is very much more advanced than RSX. It's a catch-22.
The article is very factual. Yes, developers will always find ways around hardware problems, and in terms of console hardware these two consoles are a big step up from the previous generation. But the actual CPUs inside the 360 and PS3 aren't capable of more than, say, an Athlon 3200+, because in-order execution CPUs have a very limited way of being used, and adding multithreading makes that even harder. If Cell and Xenon were out-of-order CPUs they would be considerably more powerful and faster at what they do, but they would also likely cost you selling your mum to buy the console.
The only part of the article that can change is the OS usage: how much RAM and how much processing time it reserves. But that's it. The rest will never change, but developers will learn ways, like they did last gen, to overcome any hiccups and create great games like Killzone 2 or Forza 3. It's the nature of the beast.
The main points I got from the article are that the PS3 and 360 are in no way supercomputers; that the PS2 had a better CPU than the Xbox 1, yet the Xbox 1 provided greater graphics; how advanced the 360's GPU actually is; and how the gap between the two this gen is closer than the gap between the PS2 and Xbox 1.
You have to remember that the hard facts about machines don't change. The specs don't change, how out-of-order CPUs work doesn't change, how memory works doesn't change, and the article even talks about how developers can use the SPEs to their advantage or not.
Let's not discredit factual information because it hurts our fanboyism. I have no problem admitting I was wrong: the PS3's Cell isn't that powerful. But like the article says, using the SPEs for things like cloth motion means the Cell can display more on screen than the Xenon can.
But as he points out, on the whole the machines are much closer than last gen, because RSX is helped by Cell and Xenon is helped by Xenos. It's like M$ really went for an awesome graphics chip and Sony went for an awesome CPU, and in the thick of it both machines can use their stronger chip to help the weaker chip overcome any shortcomings. It's funny really. But that's life.
It's also worth noting that multithreaded engines like UE3 are nowhere near fully optimized multithreaded code; this can take another 3 or 4 years to perfect.
|
Once again, I respect this guy's opinion based on what he has read about the Cell, but I have to disagree with his take on how that technology is used, based on his experience of the Cell. For instance, you say that the IBM tests are in a controlled environment, but by the same token, back in 2005 the APIs were not as well developed for the Cell as they are now. The very fact that even back in 2005 they were getting 30 fps in TRE on the same CPU that is in the PS3, shown working at the trade show, points to the fact that his claims are based on his OPINION of the data he had on hand. He did not have all the data on each system; he just had what he had, and he made his opinion known on his blog based on that data.
I do not fault him for it, other than that it's way outdated.
I have no problem with you posting what you did, because it gives you an idea of this person's experience with both platforms.
Me, I do not tend to look at just one article from 2005 or 2006 and call it fact when the person has not done any tests themselves. Does that mean he's 100% wrong or right? Depends on who you ask. I see some of the things he did get right about the PS3, but his take on how to implement the development process is wrong; done properly, it would give much better results than what he has stated. Like I said, I am not knocking his take, given the info he had on hand at the time.
I am not knocking it being posted; I am just knocking its relevance today.
EXAMPLE:
IBM has done tests on both the Xbox 360 processor and the Cell processor. Both are very powerful processors, but IBM, which created both, has stated that the Cell can reach closer to its maximum FLOPS than the Xbox 360's can. This does not mean much anyway, because it's up to the developer and his experience with the hardware.
Did you ever think the Cell processor could do unified shaders? Well, it can.
|
I understand what you're saying, but he pretty much describes how KZ2 was made, back in 2006. The Cell can do unified-shader-style work, but while it's doing that it can do less of the normal CPU things. RSX has 8 vertex shader units (its vertex and pixel hardware are separate) while Xenos has 48 unified shaders, so the Cell has to help RSX to get close to Xenos's shader throughput. But it's all a catch-22, because the Xenos GPU in the 360 is capable of performing CPU-intensive tasks, like AI or physics.
So on one hand you have Cell helping RSX with graphics calculations, and on the other you have Xenos helping Xenon with CPU calculations.
As for your example: that is 100% false.
Taken directly from IBM's full test of the Cell running 8 SPUs:
Table 4. Performance of parallelized Linpack on eight SPUs

| Matrix size | Cycles | # of Insts. | CPI | Single Issue | Dual Issue | Channel Stalls | Other Stalls | # of Used Regs | SPEsim (GFLOPS) | Measured (GFLOPS) | Model accuracy | Efficiency |
| 1024x1024 | 27.6M | 2.92M | 0.95 | 27.9% | 32.6% | 26.9% | 12.6% | 126 | 83.12 | 73.04 | 87.87% | 35.7% |
| 4096x4096 | 918.0M | 1.51G | 0.61 | 29.0% | 56.7% | 10.8% | 3.4% | 126 | 160 | 155.5 | 97.2% | 75.9% |
Notice the model accuracy is 97.2% and the efficiency is 75.9% of the theoretical peak GFLOPS performance. Doing the math brings the GFLOPS in a controlled-environment test down to 150-odd GFLOPS. Take away one SPE which is not used for games and another which is dormant for the OS, and we are closer to 120 GFLOPS. Again, that 120 GFLOPS is in a controlled environment and not in an unstable game-code environment.
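For reference, here is the arithmetic behind those figures as a little C check (my own sketch, not IBM's code; the 25.6 GFLOPS single-precision peak per SPE at 3.2 GHz is the commonly cited figure, and the 75.9% efficiency comes straight from the table above):
[code]
/* Back-of-the-envelope check of the Linpack numbers in Table 4. */
#include <stdio.h>

int main(void)
{
    double clock_ghz   = 3.2;
    double flops_cycle = 8.0;                     /* 4-wide SIMD multiply-add = 8 flops/cycle */
    double peak_spe    = clock_ghz * flops_cycle; /* 25.6 GFLOPS per SPE */

    double peak_8 = 8.0 * peak_spe;               /* 204.8 GFLOPS across 8 SPEs */
    double meas_8 = peak_8 * 0.759;               /* ~155.4, matching the 155.5 measured */
    double meas_6 = meas_8 * 6.0 / 8.0;           /* ~116.6, the "closer to 120" figure */

    printf("peak (8 SPEs):      %.1f GFLOPS\n", peak_8);
    printf("measured (8 SPEs):  %.1f GFLOPS\n", meas_8);
    printf("scaled to 6 SPEs:   %.1f GFLOPS\n", meas_6);
    return 0;
}
[/code]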
This is directly from the link you posted in this thread. These figures will never change unless Sony changes the CPU in the PS3, and that doesn't happen to consoles. Don't get me wrong, both consoles are powerful, but not nearly as powerful as the PR BS portrays. It's like TRE: you cannot forget that when the Cell is doing graphics it CAN'T be doing normal CPU work. So devs have to be careful how much CPU work time they take away from the Cell.
Likewise, the same can be said for the 360; there is just a different set of boundaries for that machine.
|
And no, my example is not false. The reason being, IBM created the chips and has done real-world tests on them. Does that mean engines on the Xbox 360 cannot attain higher performance? No, it does not, but by the same token the same thing can be said about the PS3. It's the software engine; the hardware numbers are just a rough guide to what could be attained. Will they reach their maximums? Probably not. But that does not take away from the fact that IBM stated what it did based on the tests it has done on both systems' processors.
Once again, the APIs were not as well developed in 2005, because the Cell was only unveiled in 2004. That's like saying the engine will stay the same for any processor. Yeah, unless you tweak the engine, and we all know Epic and other developers never tweak their engines... come on, man, you're reaching.
Example: you just stated this:
@selnor
"You cannot forget when Cell is doing Graphics it CAN'T be doing normal CPU work. So devs have to be careful how much CPU work time they take away from Cell."
Yes it can, and that's where you are going about it all wrong.
"staude" pointed this very same thing out to you in this very same thread. what you relate to PC programming is not the way you look at these type's of local store embedded cpu core's and how they can be developed on. "you can" but your result's will be reduced . this is more about memory management than about trying to rely on just a large pool of ram to do everything from. it's a much more indepth precise way of development with this type of design. because there is more seperate core's to manage not just on memory, but also what each core will need to do in any clock cycle.
But that also has problems of its own. Ever heard of "differential signaling"?
Example:
The realism of today’s games, though, demands far more number crunching than the CPU alone can deliver. That’s where the graphics processing unit, or GPU, comes in. Every time an object has to be rendered on screen, the CPU sends information about that object to the GPU, which then performs more specialized types of calculations to create the volume, motion, and lighting of the object.
But despite churning through billions of floating-point math operations per second, or flops, today’s gaming systems and PCs still can’t deliver the realism that game developers seek. CPUs, memories, and GPUs just aren’t powerful enough—or can’t exchange data fast enough—to handle the complexity and richness of the games designers want to create. In other words, hardware constraints force them to reduce the number of objects in scenes, their motion and texture, and the quality of special effects.
The need for speed becomes even more critical in the PS3, whose Cell processor is actually nine processors on a single silicon die. In the Cell, one processor, or core, divides up work for the other eight cores, which were designed to stream through computation-intensive workloads like video processing, content decryption, and physics calculations. [For more on the Cell chip, see “Multimedia Monster,” IEEE Spectrum, January 2006.] Using all its cores, the 3.2-gigahertz Cell processor can deliver a whopping 192 billion single-precision flops. Without a speedy connection to the PS3’s memory, the Cell starves for data.
To speed up data transfers between the Cell processor and its memory chips, the PS3's designers adopted a novel memory system architecture that, Rambus says, addresses some of the limitations of current DRAMs [see "How the PlayStation 3 Shuttles Bits"]. To understand how these limitations came about, consider first the co-evolution of microprocessors and memory.
Moore’s Law tells us that transistor densities on chips are doubling every 18 months or so. This evolution has been accompanied by a doubling, on a similar time scale, in the clock rates of processor chips, basically because smaller transistors can toggle on and off faster. But memory clock rates, which serve as an indicator of memory data-transfer rates, are doubling much more slowly—about every 10 years. The result is that memory can’t fetch data to the processor fast enough, a bandwidth bottleneck that has increasingly constricted over the past few decades.
The bandwidth gap is just part of the problem. The other part is related to latency, the time it takes the memory to produce a chunk of data requested by the processor. This delay can vary from tens to hundreds of nanoseconds. That may not seem like much, but in a mere 50 nanoseconds a 3.8-GHz processor can go through 190 clock cycles. “You don’t want the processor waiting for that long,” says Brian T. Davis, a professor of electrical and computer engineering at the Michigan Technological University, in Houghton. The latency problem prompted chip makers years ago to embed some DRAM caches directly onto CPU chips, as well as to concoct some processing tricks to keep the wait for data as short as possible. Despite these improvements, modern CPUs can spend more than half their time—and often much more, Davis notes—just waiting for data to come from memory.
In the PS3, the Cell and the RSX are connected by a Rambus interface technology which, sure enough, Rambus has given a catchy name: FlexIO. The total bus width between the two chips is 7 bytes: 4 bytes to move data from the Cell to the RSX, 3 to move data in the opposite direction. This setup gives a bandwidth of 20 GB/s outbound from the Cell and 15 GB/s inbound, almost 10 times as fast as PCI Express, an interface standard popular in today's PCs. Thanks to FlexIO, the Cell processor can fling an incredibly large number of triangles to the RSX, and the result is more detail and more objects in 3-D games.
Future gaming consoles will continue to demand ever-faster buses, but how much bandwidth is enough will vary from system to system. For instance, one of the PlayStation 3's main competitors, Microsoft's Xbox 360, released last year, relies on a CPU-GPU bus with a bandwidth of 21.6 GB/s, half in each direction. It's a proprietary interface developed by IBM that runs at 5.4 GHz and relies on differential signaling to maintain signal integrity. It may not be as fast as the PS3's, but Xbox 360 owners don't seem disappointed.
In fact, just because PS3 has more powerful data pipes, that doesn’t mean its games will easily get the full benefit from them. As in any other computer system, software, not just hardware, matters. Game developers will have to design their code carefully to make sure that the Cell is getting the types of workloads for which it works best, and that data are streaming smoothly between processor, memory, and GPU.
So as you can see, it's not just about the Cell, or the Xenon, or their respective GPUs; it's about the system as a whole, and your article failed to even go into that part in its take on each system. Now, like I said, it's not about his take; it's about its relevance today.
http://www.spectrum.ieee.org/images/...ges/gamef1.pdf
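To put the article's "one core divides up work for the other eight cores" point in concrete terms, here is roughly what launching work on an SPE looks like from the PPE side with IBM's libspe2 (a minimal sketch under stated assumptions: spu_worker is a hypothetical embedded SPE binary, and real code would run one thread per SPE and hand each a slice of the workload):
[code]
/* PPE-side sketch: create an SPE context, load an SPE program, run it. */
#include <libspe2.h>
#include <stdio.h>

extern spe_program_handle_t spu_worker;   /* hypothetical embedded SPE image */

int main(void)
{
    spe_context_ptr_t spe;
    spe_stop_info_t stop_info;
    unsigned int entry = SPE_DEFAULT_ENTRY;

    spe = spe_context_create(0, NULL);
    if (!spe) { perror("spe_context_create"); return 1; }

    if (spe_program_load(spe, &spu_worker)) {
        perror("spe_program_load");
        return 1;
    }

    /* Blocks until the SPE program stops; a real dispatcher would do
     * this from one pthread per SPE so all eight run concurrently. */
    spe_context_run(spe, &entry, 0, NULL, NULL, &stop_info);

    spe_context_destroy(spe);
    return 0;
}
[/code]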
|
I agree with you 100% here. It is all about the whole system, which is exactly why they are so close in overall ability.
The Cell helps out the RSX a lot as the visuals gain fidelity. I didn't mean that the Cell can't do anything else while it's doing graphical calculations, but it will have less power available for normal CPU operations.
The 360, on the other hand, has a more traditional setup but a vastly more powerful GPU, so you will never need the CPU to do graphical calculations to get the same game from the 360. In fact, the Xenos GPU will take a while to learn to program right, as it is vastly new and different from normal GPUs (or certainly was in 2006). Not only does its architecture completely redefine GPUs, but ATI also had a hand in designing the overall memory system of the 360.
Once you learn that the Xbox 360 GPU also acts as the system's memory controller, much like the northbridge in an Intel PC, the picture becomes a bit clearer. ATI has been making and designing chipsets that use GDDR3 RAM for a good while now. Add to this that Joe Macri (go-kart racing fiend extraordinaire), a pivotal factor in defining the GDDR3 RAM specification at JEDEC, is also a big fish at ATI, and it only makes sense that ATI could put together one of the best GDDR3 memory controllers in the world. So while it might seem odd that the Xbox 360's PowerPC processor uses "graphics" memory for its main system memory and a "GPU" as the "northbridge," once you see the relationship between the three and the technology being used, it is quite simple. Therefore, we have the 700 MHz GDDR3 RAM acting as both system RAM and GPU RAM, connected to the GPU via a traditional GDDR3 bus interface that can channel an amazing 25 gigabytes per second of data.
Smart 3D Memory is the biggest standout and most innovative feature I saw inside the entire Xbox 360. To give you an idea of what it looks like first-hand, think of any normal GPU you might see, something much like the Mobility Radeon X700 chipset. The X700 is pretty much what any modern GPU looks like. Now think of that same chipset as having a single piece of DRAM sitting off to one side, much like you can see in the ATI slide below, but with one less piece of RAM (and no arrows).
[image: ATI slide of a GPU with a piece of DRAM off to one side]
Keep in mind, ATI is not a stranger to adding memory to a chipset, but remember that this is “smart” memory.
The Xbox 360's Smart 3D Memory is a relatively small piece of DRAM sitting off to the side of the GPU, yet on the same substrate. It weighs in at only 10 MB. Now the first thing that you might think is, "Well, what the hell good is 10 MB in the world of 512 MB frame buffers?" And that would be a good line of questioning. The "small" 10 MB of Smart 3D Memory, currently being built by NEC, will have an effective bus rate between it and the GPU of 2 GHz. This is of course over 3X faster than what we see on the high end of RAM today.
Inside the Smart 3D Memory is what is referred to as a 3D Logic Unit. This is literally 192 floating-point unit processors inside our 10 MB of RAM. This logic unit can exchange data with the 10 MB of RAM at an incredible rate of 2 terabits per second. So while we do not have a lot of RAM, we have a memory unit that is extremely capable of handling mass amounts of data extremely quickly. The most incredible feature that this Smart 3D Memory delivers is "antialiasing for free," done inside the Smart 3D RAM at high-definition resolutions. Yes, the 10 MB of Smart 3D Memory can do 4X multisample antialiasing at or above 1280x720 resolution without impacting the GPU. Therefore, not only will all of your games on Xbox 360 be in high definition, but they will also have 4XAA applied.
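A quick sanity check of those numbers (my arithmetic, not from the article; it assumes a 4-byte color value and a 4-byte depth value per pixel):
[code]
/* Sanity check of the Smart 3D Memory figures quoted above. */
#include <stdio.h>

int main(void)
{
    /* 2 terabits per second between the logic unit and the eDRAM. */
    double tbits  = 2.0;
    double gbytes = tbits * 1000.0 / 8.0;                  /* = 250 GB/s */

    /* A 1280x720 framebuffer: 4-byte color + 4-byte depth per pixel. */
    double pixels = 1280.0 * 720.0;
    double fb_mb  = pixels * (4 + 4) / (1024.0 * 1024.0);  /* ~7.0 MB */

    printf("eDRAM internal bandwidth:  %.0f GB/s\n", gbytes);
    printf("720p color+Z framebuffer:  %.1f MB (vs 10 MB of eDRAM)\n", fb_mb);
    return 0;
}
[/code]
So a plain 720p color-plus-depth buffer fits inside the 10 MB with room to spare, and the quarter-terabyte-per-second internal bandwidth is where the "free" antialiasing claim comes from.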
The Smart 3D Memory can also compute Z depths, perform occlusion culling, and does a very good job at figuring stencil shadows. Stencil shadows are used in games built on the DOOM 3 engine, such as Quake 4 and Prey.
Now remember that all of these operations take place on the Smart 3D Memory, so they have very little impact on the workload of the GPU itself. You may now be asking yourself what exactly the GPU will be doing.
First off, we reported on page 2 in our chart that the rated "shader performance" of the Xbox 360 GPU is 48 billion shader operations per second. While that is what Microsoft told us, Mr. Feldstein of ATI let us know that the Xbox 360 GPU is capable of doing two of those shader operations per cycle. So yes, if programmed correctly, the Xbox 360 GPU is capable of 96 billion shader operations per second. Compare this with ATI's current PC add-in flagship card, and the Xbox 360 more than doubles its abilities.
Now that we see a tremendous amount of raw shader horsepower, we have to take into account that there are two different kinds of shader operations that content developers can program: vertex shaders and pixel shaders. These are really just what they sound like. Vertex shader operations move vertices, which shape the polygons that make up most objects you see in your game, like characters, buildings, or vehicles. Pixel shader operations dictate what groups of pixels do, like bodies of water, clouds in the sky, or maybe a layer of smoke or haze.
In today's world of shader hardware, we have traditionally had one hardware unit to do pixel shaders and another to do vertex shaders. The Xbox 360 GPU breaks new ground in that its hardware shader units are intelligent as well. Very simply, the Xbox 360's hardware shader units can do either vertex or pixel shaders quickly and efficiently. Just think of the Xbox 360's shaders as being analogous to SIMD units (a Single Instruction carried out on Multiple Data).
The advantage of this would not be a big deal if every game were split 50/50 between pixel and vertex shader work. That is not the case, though: while most games are vertex-shader bottlenecked, some others are pixel-shader bottlenecked. On the Xbox 360, it is impossible to get bottlenecked this way.
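To picture why, here is a toy model (purely illustrative, nothing like the real scheduling hardware) of a unified pool of 48 units draining whatever mix of vertex and pixel work is queued:
[code]
/* Toy model of a unified-shader pool: any free unit takes whichever
 * work is waiting, vertex or pixel, so neither kind of workload can
 * bottleneck while units sit idle. The job counts are made up. */
#include <stdio.h>

enum work { VERTEX, PIXEL };
#define UNITS 48

int main(void)
{
    int queued[2] = { 700, 300 };     /* pending vertex and pixel jobs */
    int done[2]   = { 0, 0 };
    int cycles    = 0;

    while (queued[VERTEX] + queued[PIXEL] > 0) {
        /* Each cycle, all 48 units grab work from whichever queue is
         * longer; fixed-function hardware would instead leave its
         * vertex units idle once the vertex queue ran dry. */
        for (int u = 0; u < UNITS; u++) {
            enum work w = queued[VERTEX] >= queued[PIXEL] ? VERTEX : PIXEL;
            if (queued[w] == 0) w = (w == VERTEX) ? PIXEL : VERTEX;
            if (queued[w] > 0) { queued[w]--; done[w]++; }
        }
        cycles++;
    }
    printf("finished %d vertex + %d pixel jobs in %d cycles\n",
           done[VERTEX], done[PIXEL], cycles);
    return 0;
}
[/code]
Run it with any split you like and the pool drains at full rate every cycle; dedicated vertex-only units would go idle the moment their own queue emptied.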
So although the two take different approaches, you end up with very close overall capability, and that is in large part down to ATI's influence on the memory and the brand-new tech in the GPU.
|