It's time to go back to playing games, guys :)


| HappySqurriel said: Sorry XoJ, what you said is mostly false or misleading ... Most of the real-time ray-tracing demos for the PS3 involve multiple PS3s networked together and get worse performance than most single- or multi-GPU real-time ray-tracers that have been produced since the PS3 was released. Sony did originally plan to have 3 Cell processors in the PS3, but they scrapped that when they realized that a (comparatively) cheap off-the-shelf GPU would provide better performance than 2 Cell processors dedicated to graphics. Game developers do focus on middleware rather than implement large pieces of code, but in most cases the middleware gives you access to the source code, allowing you to make huge modifications if you decide to. On top of this, companies like Epic, id and Crytek would port their engines to the Cell; and it is likely that Sony would port popular APIs (like OpenGL) to run on the Cell processor in order to make porting easy. Thirdly, all major game engines are highly optimized code bases that were developed by the most experienced (and some would say best) game programmers in the world. Certainly, it is possible for really good developers to develop a higher-performance engine than Epic, id, Crytek or Valve if they were to focus on developing it specifically for the PS3; but you're talking about a 5% to 10% boost in performance at a cost of tens of millions of dollars after you have implemented the engine and all of the tools (like importers and level editors), which are the primary reason people moved over to licensed engines.
|
I just showed the video so people could see what was possible using the Cell Broadband Engine; they do use a cluster, but they explain that in the video.
For the rest of the paragraph, you are practically agreeing with me that a GPU would be better at graphics than 2 Cells; even if Sony ported the engine, it's clear it would run better on a GPU than on a CPU.
The Cell is as much a GPU as a weekend golfer is a professional. When talking about the ability to play, you can tick the box for both. However, when talking about who's going to compete in the U.S. Masters, there is only one answer.
Tease.
Alright, I'll bite.
The fact that the OP doesn't understand enough about computing to really analyze the differences between the Cell and the Xenon has already been made clear, so...
The Cell has 8 cores, yes. Actually, most PS3 Cells have 9 cores, but one is disabled at the factory. The PS3 Cell needs to have 1 PPE core and 7 SPE cores to qualify as functional. This was done because early Cell yields were so low (with all 8 SPEs working) that the PS3 "version" of the Cell had to require one fewer SPE to be affordable at the time. Within a year, yields were excellent and this problem would have gone away... but the PS3 spec is set in stone by the limitations of some of those early machines. No game could ever use the 9th core, even if it were enabled, because then there would be PS3s out there that couldn't run said game.
One of the cores on the PS3 Cell is called the "PPE", and except for the way it does threading, it is basically identical to any one of the Xenon's cores (see the threading note below).
The other seven cores of the PS3 Cell are "SPEs" -- fully functional processors which have some unusual "external" architectural limitations, such that they can better serve their "purpose". The "purpose" being that they are as fast as possible, and as cheap to build as possible, at the same time (obviously you need to balance these goals).
To accomplish the goal of being relatively cheap to manufacture, none of the logic-intensive, programmer-convenience stuff, like branch prediction and out-of-order execution, is on the SPE cores. The PPE and the Xenon cores also have this "flaw" (meaning much reduced cost, and a small performance loss too), although they do have very simple (almost worthless) branch predictors.
Here's the crux of the performance issue, outside of the number of cores. The three cores of the Xenon share something called a "cache". Caches serve to speed up computing by keeping the stuff the processor is working on very nearby (in cache memory), so that it can be read from and written to very quickly. This is the MAIN bottleneck for performance on modern computers, where the clock speed of the processor has gotten very fast, but memory latency (how many of those 3.2 billion cycles per second it takes to access data from main memory) has not improved at nearly the same rate.
Not having a cache would be akin to you (being the processor), doing math homework in the following way: Rather than having your notepad in front of you, at your desk (i.e. having a cache), you instead have your notebook in the basement, stuffed in some boxes which you never unpacked from the last time you moved. Each time you figure out a math problem from a worksheet, instead of writing the answer down on the pad of paper in front of you, you need to run downstairs, dig through your moving boxes, write in the notebook, repack it, and then run back upstairs to read the next problem.
The cache mirrors main memory to accomplish this speediness -- in other words, the notepad pages copy *themselves* onto a much larger notepad you have downstairs whenever you fill up a page. You also have a second notepad that shows you a few of the next problems from a textbook stored downstairs (this is called the "instruction cache"), and each time you move to a new set of problems, the cache notepad does the legwork of running downstairs, copying the next few problems, and bringing them back to you.
The speedy part comes in, in that the cache can do this running back and forth WHILE you are working out the problems!
Some 20-40% of a processor core's time is typically spent waiting on the cache to retrieve stuff from main memory (in a console game)... but that's a lot better than the ~95% (or more) of the time it would take without a cache.
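If you want to see the basement trip in actual code, here's a minimal C sketch -- purely illustrative, nothing to do with any real game engine. Both functions add up the same array; the second just walks memory in huge strides, so nearly every read is a miss, and on typical hardware it runs several times slower.

```c
/* Purely illustrative sketch: both functions add up the same 4 MB grid,
 * but the second walks it in 4 KB strides, so almost every access is a
 * cache miss -- the "run to the basement" case. */
#include <stdio.h>

#define ROWS 1024
#define COLS 1024

static float grid[ROWS][COLS];

static float sum_row_major(void)      /* cache-friendly: sequential lines  */
{
    float total = 0.0f;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

static float sum_column_major(void)   /* cache-hostile: 4 KB between reads */
{
    float total = 0.0f;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}

int main(void)
{
    printf("%f %f\n", sum_row_major(), sum_column_major());
    return 0;
}
```

Same math, same data; the only difference is whether the lines the cache already fetched ever get reused.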
The PPU has its own cache, all to itself (the "PPE" is the PPU + cache + AltiVec math unit, etc). The SPEs have something called "localstore", which is as fast as "level 2" cache memory but works more like an independent memory system for that SPE only. The cache differences are absolutely critical to understanding the true difference between the Cell and the Xenon, when it comes to processing power.
Here's the rub. The Xenon's three cores share a cache. The Cell's PPU has the cache all to itself. The Cell's SPEs all, effectively, have their own caches as well, but those "caches" must be operated manually (which is the REAL hurdle when it comes to writing programs on the Cell). Manual management of this pseudo-cache memory is a tad difficult, but like a manual-transmission sports car, when done right, it's much more efficient.
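For the curious, here's roughly what that "manual cache" looks like on an SPE. This is just a sketch assuming the IBM Cell SDK's spu_mfcio.h interface (the chunk size, tag number and the XOR "work" are made up): you explicitly DMA data from main memory into the 256 KB local store, wait for the transfer, crunch it, and DMA the result back out -- all things a hardware cache normally does behind your back.

```c
/* Sketch only: assumes the IBM Cell SDK's spu_mfcio.h interface; chunk
 * size, tag number and the XOR "work" are made up for illustration. */
#include <spu_mfcio.h>

#define CHUNK 4096   /* bytes per DMA transfer   */
#define TAG   1      /* DMA tag group to wait on */

static char buffer[CHUNK] __attribute__((aligned(128)));

void process_chunk(unsigned long long main_mem_addr)
{
    /* DMA the chunk from main memory (effective address) into local store. */
    mfc_get(buffer, main_mem_addr, CHUNK, TAG, 0, 0);

    /* Block until every transfer in this tag group has completed. */
    mfc_write_tag_mask(1 << TAG);
    mfc_read_tag_status_all();

    /* Stand-in for real work on the local-store copy. */
    for (int i = 0; i < CHUNK; i++)
        buffer[i] ^= 0xFF;

    /* DMA the result back out to main memory and wait again. */
    mfc_put(buffer, main_mem_addr, CHUNK, TAG, 0, 0);
    mfc_write_tag_mask(1 << TAG);
    mfc_read_tag_status_all();
}
```

The usual trick is double-buffering: start the DMA for the next chunk while you're still crunching the current one, so the SPE almost never waits -- that's the manual-transmission payoff.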
Even more important is the sharing issue, which the Xenon is plagued with, but the Cell has no issues with. The Xenon has 3 cores (we won't even go into the 2-threads-per-core thing), which, if accessing memory in a similar pattern, basically clobber each other's work and slow each other down. It's like the notepads in the above example running into one another, and each time this happens, all but one notepad has to return to your desk, without any new info, and start the trip all over again.
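One concrete (and entirely hypothetical -- the numbers and names are mine) way this collision shows up is "false sharing": two cores hammering values that happen to sit on the same cache line, so the line ping-pongs between them even though neither thread ever touches the other's data. A quick C sketch with pthreads:

```c
/* Hypothetical example of false sharing: two counters on the same cache
 * line, each written by a different thread.  Un-commenting the padding
 * puts each counter on its own line. */
#include <pthread.h>
#include <stdio.h>

struct counters {
    volatile long a;        /* written only by thread 1 */
    /* char pad[120]; */    /* uncomment: pushes 'b' onto the next 128-byte line */
    volatile long b;        /* written only by thread 2 */
};

static struct counters shared;

static void *bump_a(void *arg)
{
    (void)arg;
    for (long i = 0; i < 100000000L; i++)
        shared.a++;
    return NULL;
}

static void *bump_b(void *arg)
{
    (void)arg;
    for (long i = 0; i < 100000000L; i++)
        shared.b++;
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump_a, NULL);
    pthread_create(&t2, NULL, bump_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%ld %ld\n", shared.a, shared.b);
    return 0;
}
```

With the padding uncommented, each counter gets its own line, the ping-ponging stops, and the same loop typically runs dramatically faster.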
If the 3 Xenon cores are doing things which are, more or less, dissimilar, then the collision problem becomes much less of a big deal. Unfortunately, that typically means that really optimized X360 games are running one "heavy" thread on each core at any one time, and one lighter thread. Heavy threads might include: game logic, animation, physics, AI. Light threads might include: sound processing, streaming, input processing, OS work, etc. The trouble is that the heavy threads often depend on running in order during a single game frame, and thus they cannot run at the same time... first AI, then game logic, then physics, and lastly animation, for example. There's no point in running them on multiple cores, since each depends on another's results (though many games will delay stuff by a frame or two, to run some things in parallel). Thus the Xenon basically fails in the parallelism department. One main thread and 4-5 lightweight threads is about it, and outside of the core running the main thread, the other cores aren't very well utilized, in many cases.
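Here's a bare-bones sketch of that ordering problem -- all the function names are hypothetical stand-ins, nothing from a real engine. Each heavy stage consumes the previous stage's output, so within one frame the chain is strictly serial, no matter how many cores you throw at it:

```c
/* Hypothetical, stripped-down frame loop: each heavy stage consumes the
 * previous stage's output, so the per-frame chain cannot be parallelized. */
#include <stdio.h>

static int run_ai(int world)          { return world + 1; }  /* stand-ins for */
static int run_game_logic(int ai)     { return ai * 2;    }  /* the real      */
static int run_physics(int logic)     { return logic + 3; }  /* heavyweight   */
static int run_animation(int physics) { return physics;   }  /* systems       */

int main(void)
{
    int world = 0;
    for (int frame = 0; frame < 3; frame++) {
        int ai      = run_ai(world);          /* must finish before logic   */
        int logic   = run_game_logic(ai);     /* must finish before physics */
        int physics = run_physics(logic);     /* must finish before anim    */
        world       = run_animation(physics); /* feeds the next frame       */
        printf("frame %d -> world state %d\n", frame, world);
    }
    return 0;
}
```

You can pipeline across frames (start frame N+1's AI while frame N's animation finishes), which is the "delay stuff by a frame or two" trick above, but within a single frame the chain is what it is.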
The Cell, on the other hand, cleans house when it comes to parallelism. Say you have 200 characters to animate in a single scene. They are not co-dependent... guess what makes for a highly math-intensive, parallel task? How about vertex-skinning those 200 animated characters? Sure, you can waste some of your flexible, parallel GPU pipelines doing this... but that hurts your pixel pipelines afterwards, doesn't it? How about culling objects out of the scene? Physics raycasts? All of them easily made parallel en masse, if your processor can do parallel work easily. All of them hideously expensive math ops, too.
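To show how naturally that kind of work splits up, here's a toy C sketch -- again hypothetical, with the "skinning" standing in for the real vertex math. 200 characters, none depending on any other, carved into chunks and handed to however many workers you have, be they cores or SPEs:

```c
/* Toy sketch: 200 independent characters split into chunks, one chunk per
 * worker thread.  The per-character "skinning" is a trivial stand-in. */
#include <pthread.h>
#include <stdio.h>

#define NUM_CHARACTERS 200
#define NUM_WORKERS    4

static float pose[NUM_CHARACTERS];   /* stand-in for per-character output */

struct job { int first, count; };

static void *skin_range(void *arg)
{
    struct job *j = arg;
    for (int i = j->first; i < j->first + j->count; i++)
        pose[i] = (float)i * 0.5f;   /* pretend this is the vertex math */
    return NULL;
}

int main(void)
{
    pthread_t  workers[NUM_WORKERS];
    struct job jobs[NUM_WORKERS];
    int per = NUM_CHARACTERS / NUM_WORKERS;

    for (int w = 0; w < NUM_WORKERS; w++) {
        jobs[w].first = w * per;
        jobs[w].count = (w == NUM_WORKERS - 1) ? NUM_CHARACTERS - w * per : per;
        pthread_create(&workers[w], NULL, skin_range, &jobs[w]);
    }
    for (int w = 0; w < NUM_WORKERS; w++)
        pthread_join(workers[w], NULL);

    printf("skinned %d characters in %d independent chunks\n",
           NUM_CHARACTERS, NUM_WORKERS);
    return 0;
}
```

Culling and physics raycasts partition exactly the same way, which is why they map so well onto a pile of SPEs.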
Potential End Result: Cell kicks ass, for price (now that the yield is high, which was the only reason it was ever expensive to begin with), and for performance. Also, you have to understand processors pretty damn well to utilize it properly... typically the guys who understand that... cost a ton of money.
Actual End Result: Cell is hard for game devs to fully grasp/understand (at least early in the generation), and X360 is easy, plus the 360 GPU is roX0rz mega tech, for the time, and takes up a lot of the slack that the Xenon leaves behind.
Eventually, all high-performance processors will be Cell-like, because parallelism is king, when it comes to performance, and sharing resources (like the cache), just doesn't cut it for many applications -- games included.
And yes, I respect the Cell for its awesomeness, and the Xenon sucks rocks by comparison. Then again, I am obviously versed in the details of processing, so its awesomeness isn't lost on me.
The X360 GPU (the Xenos), on the other hand, and much like the Cell, started a revolution, and it's a shame that the X360 has to be burdened with the, IMO, cruddy Xenon.
Future console CPUs: Will be like the Cell, or the Wii.
Future console GPUs: Will be like the Xenos, or the Wii.
Being Wii-like is looking very attractive right now, I'll wager.
^ Great post by Procrastinato. In fact, I'm probably bookmarking it. Sadly, this thread has moved on to jeorc's personal issues/agenda/grammar.
@Lazy man
I don't think that the Cell is going to start any revolutions. I suspect that if there's any move to asynchronous computing, it will be a CPU + GPU model; there's no point in duplicating the same compute models on both the CPU and the GPU.
I also noted how you didn't include memexport or any of the other useful Xbox 360 functionality.
Tease.
I don't know why I get into these threads when I can't understand anything xD
| pastro243 said: I don't know why I get into these threads when I can't understand anything xD |
Sorry, I tried to simplify it with the notebook explanation, but I should have done that with the whole post, in terms of parallelism, as well.
In short, the Xenon is 3 (to 6) pens, all having to share the same notebook. The notebook fills up pretty fast when doing problems of the same type (actually it's always full, and you're just replacing stuff, line by line, and the real problem is other pens trying to write on a line that *looks* like yours, but isn't... okay, that was unnecessary detail, just ignore that), and thus it has to run to the basement a lot, and you often end up waiting on it.
The Cell has 8 or 9 pens writing into 8 (albeit smaller) notebooks (well, actually 7 or 8, into 7, in practice), and on top of that, all but one of those notebooks can be used in such a way that you never, ever have to wait on it returning from your basement with new info. The PPE processor can, and often does, have two threads running, but usually they aren't similar in terms of cache coherency, so they don't clobber the shared notebook much.
Most games have large portions which are inherently parallel... hence the innate advantage of the Cell in those games, in theory. The theory, sadly, assumes that skill and development time are relatively unconstrained -- and that's the PS3's biggest downfall. Until developer experience makes the skill/time problem a lesser issue, the business of making games on time will tend to favor the 360.
Hence the typical crossplat 360 > PS3 issues, and because experience lessens the difference, from a business perspective, the gap in crossplat performance and quality has also decreased over time. The fundamental differences between the two consoles basically mean that crossplats will often tend to be similar or higher quality on the 360, and exclusives will almost always be better on the PS3. 360 fans hate to hear "PS3 exclusives are better", and PS3 fans hate to hear "360 crossplats are better", but the fact of the matter is that, at the heart of the issue, is hardware that makes both of those statements tend to be true in the end. The differences are not significant enough to really matter, IMO, though. Both consoles have a crippling member in their CPU/GPU pair, which basically brings them to the same level: in one case, the awesome CPU must support the GPU, and in the other, the awesome GPU must support the CPU.
@Squilliam: You realize that GPUs are basically Cell-esque processors, right, only without any sort of cache whatsoever? If anything, GPUs will evolve toward the Cell, if they are to accomplish anything serious outside of the embarrassingly parallel task of rendering. All GPUs really are is a load of specialized CISC instructions, in a large number of pipelines that are not co-dependent (as processor pipes usually have to be). Imagining that the Cell can evolve to have dozens of SPEs, and do the work of a GPU, is reasonable. Imagining that GPUs can accomplish the work of the Cell, without a large caching mechanism, or a dependent ordering in their instruction pipeline, is folly. GPUs are specialized hardware. Removing that specialization only turns them into a processor much more like the Cell.
| Procrastinato said: @Squilliam: You realize that GPUs are basically Cell-esque processors, right, only without any sort of cache whatsoever? If anything, GPUs will evolve toward the Cell, if they are to accomplish anything serious outside of the embarrassingly parallel task of rendering. All GPUs really are is a load of specialized CISC instructions, in a large number of pipelines that are not co-dependent (as processor pipes usually have to be). Imagining that the Cell can evolve to have dozens of SPEs, and do the work of a GPU, is reasonable. Imagining that GPUs can accomplish the work of the Cell, without a large caching mechanism, or a dependent ordering in their instruction pipeline, is folly. GPUs are specialized hardware. Removing that specialization only turns them into a processor much more like the Cell. |
Actually, IIRC, the level of cache in a modern GPU is measured in the hundreds of kilobytes at least. The real difference is in cache coherency, of which there isn't much. A GPU is divided into processing units which are again further divided: X number of ALUs -> X number of texturing units -> X number of raster units, along with the required level of cache, equals one processing unit. Each processing unit can work on one thread. I believe that AMD groups their ALUs into 8s with one left over for redundancy.
I really do doubt that, long term, the Cell architecture is going to receive nearly the same level of investment as the GPU architectures from Nvidia, AMD and maybe Intel. This applies both to hardware R+D and to software development, and the GPU does pose a serious threat to many of the markets in which the Cell processor excels. Once every computer has a CPU with a programmable DX11+ GPU attached, it's likely game over, and at present it's another x86 situation in the marketplace. Also, it's doubtful that the Cell can achieve nearly the same level of performance per mm^2 or per watt as a GPU in GPU-specialised tasks in the future.
Tease.