Oh god, here we go again. This entire thread is full of half-truths and outright lies.
Game_boy said:
To take advantage of this performance, developers would always have to find nine largely independent simultaneous tasks, eight of them primarily floating-point. This is very hard in games, because everything a game does is usually interlinked and branching (if statements), and to add to the difficulty a programmer must deal with asymmetric cores, i.e. SPE instructions are different from general-purpose ones so they can't shuffle tasks automatically between cores and expect similar results. Rather than spending potentially months optimising and separating the code for speed, many big-budget studios opt to just use the single general-purpose core for most of the game. This cuts out all of the advantages the CBEA in the PS3 had because it also stresses the small amount of RAM in the PS3 and forces graphics work to be offloaded to the GPU.
Wrong. Developers have to think about dividing the work the game already does into the 8 available threads. These don't have to be separate tasks; they can be chunks of one larger task. Say you need to rotate every object on the screen. Traditionally, a developer sends that job to one loop on one thread and does something else on another thread. (In practice this is sometimes done with a single OpenGL call that changes the state of the environment before inserting more objects, but that kind of work can be offloaded and finished before the video card ever gets the data.) That is the "old" thought process you are describing. The bandwidth between the Cell's cores is sufficient to split those rotations across all of them and process up to 7 times more objects than a single core could. The Cell excels at exactly this kind of thing, since it is built for floating-point work and vector translation is a heavily floating-point-intensive calculation.
Also, not everything in gaming is "interlinked and branching (if statements)." That's a very simplistic view of programming in general. In modern code, and especially in well-written multi-threaded applications, most operations are organized into "states". The tasks assigned to the cores are predetermined and handed out by the PPE. Generally, today's programmers (or more notably those who can't seem to wrap their grubby claws around the idea of multiprocessing in the first place and lean on mutex locking and the like to make threading easier) don't "get" SMP programming and write it off as too hard.
Game_boy said:
The PS3 GPU is an older core design than the Xbox 360: it does not have unified shaders or many improvements modern PC GPUs have because it was not expected to do most of the floating-point work. Also, CPU-GPU bandwidth is lower than expected because the PS3 is designed for fast communication between processor cores: the CBEA's internal data transfer rate is very high. All of these factors make the PS3 function like a "normal" console, which is easy for the developers but ignores all of the PS3's potential.
Unified shaders are mainly "helpful" in DirectX applications, where shaders are usually handled through a few select functions. OpenGL shader programming is quite advanced, and specialized shaders can actually make the scene look much more realistic.
And you're completely wrong about the GPU doing most of the floating-point operations. That's exactly what GPUs are designed for. It's why they now ship with GDDR3 and rely on high-bandwidth memory and multiple processing units. The bandwidth between the Cell and the RSX is almost twice that of a standard PC today. Not to mention that the Cell's system memory runs "at clock speed," meaning it can read or write in a single clock cycle instead of stalling on memory waits like today's processors. What does this buy the Cell architecture? Processing small chunks of data extremely fast. Remember what I said about doing a transform for every object on the screen at once? That is a tiny operation, and it performs VERY well on the Cell without tying up resources on memory waits.
Game_boy said:
In contrast, though Xbox 360 uses the POWER architecture too, it's "Xenon" CPU has three general-purpose symmetric cores. It is easier for programmers to think of three general purpose tasks and so a lot more of the CPU's potential is used. Due to the general-purpose design of "Xenon", Microsoft expected all graphics work to go on the GPU and used a very advanced design which was only even available for PCs in April this year. It has high memory bandwidth and unified shaders which make it a very powerful GPU. Finally, because the Xbox 360 was the only seventh-generation console on the market for a year, developers have had much more exclusive time with it to understand how best to use it.
Yes, and no. The "typical" structure of the 360 is more generic. It's good at taking a poor programmer's code and making it work. Microsoft has been doing this for a long time with Visual Studio and even now with .NET. Hell, a certain monkey-like high-ranking official even touted making developers' lives easier at a recent conference. It may sound great, but it's bad for many, many more reasons than it's good. Developers don't even think about the processor anymore. Why is that bad? Because bad code doesn't show up in these situations. Coders who don't understand what the processor is doing will blindly write logic that makes it perform far more work than a task needs. Ask your local developer what the fastest way is to multiply a number by 16. You'll get two answers. One is fast, and the other is lazy (or uneducated.)
The "very advanced design" GPU had to be implemented for pretty much the same reason I gave about CPU programmers: Microsoft likes making the developer's work easy. As a developer I both like and dislike this, for many reasons; one is bad code, another is cheap developers. And the "memory bandwidth" difference you speak of is minimal. The only significant bandwidth goes to after-effects, including a few processes like anti-aliasing on the GPU, in a very finite amount (10MB) of high-performance memory. But by its nature that memory is very restricted in what it can do (and dealing with the post-rendered image is about it...)
Game_boy said:
What all of this means is that while in theory the PS3's power exceeds the Xbox 360, the Xbox 360's layout was easier for developers and to use, it is suited more to game-like tasks, and they are more familiar with it due to historical context and a year's exclusive attention. The PS3 is not lacking potential, and we can expect great games for it in the future, but for now a lot of games on the Xbox 360 will appear to be faster, smoother and more graphically impressive than their PS3 counterparts.
I pretty much already explained why this is above, but since you restated it, I felt the need to restate it and point it out. It comes down to "singular process" logic and developers too lazy (or uneducated) to think about the work they perform in terms of the processor and its architecture.
Game_boy said:
A warning: Do not judge ANY console by its "numbers" alone. The 3.2GHz number means very little when comparing consoles to consoles or PCs. The Wii is more powerful than its 729Mhz number suggests. 256MB of memory can be a lot or a little in different contexts. These numbers imply operating speed or operating capacity, not both or the speeds which connect the components together. I can have one transistor running at 100GHz and it'll be less powerful than a "3.2GHz" CPU.
A warning: Do not judge any console by the rantings of an individual on a gaming forum who claims a 729MHz part will be able to compete with a 3.2GHz core. Especially if he proclaims in his long post that 256MB can be a lot for his favorite console but a hindrance for the console he doesn't favor (ignoring the fact that that same console can read and write twice as much data or more in the same amount of time). There are certain facts that people should not be proclaiming from the rooftops.
Ishy said:
the thing with the cell is that it wasn't originally designed with videogames in mind so it isn't well adapted for it's purpose meaning it's harder for developers to unlock it's high levels of power...
Wrong. The Cell was designed for multimedia processing, and a game is a multimedia program in every regard. The Cell is meant as a branch into multi-processing and a break from the "normal" singular thought process behind programming. That has everything to do with why developers are "having a hard time" with it. While multiple processors and multi-core machines have been around for a long time, they haven't been in gaming. IBM saw this as an opportunity to show what multimedia could be, and the Cell is actually a very good processor at what it does and is supposed to do. Unfortunately, as I stated earlier, it's not what developers are used to right now, and many are having a hard time learning this new side of development (which they were not trained for coming out of school.) Multiple general-purpose processors are easier to pick up because they all behave like a single core. However, they will hit a performance wall sooner than a specialized core.
fazz said:
Oh, and you forgot that the Cell has 512KB L2 Cache and Xenos has 1024KB.
That means almost nothing when the system memory runs at full clock speed and each SPE has a dedicated 256K block of local memory to do its calculations in. If anything, it tells me that the Cell's PPE has more dedicated cache than one core of the Xenon, which has to divide its cache among all three cores (1024/3 ≈ 341K per core vs. 512K dedicated).
My real question is this though: If you proclaim that numbers don't mean anything, why did you even start this thread?