
Xenon vs Cell: Which one really is better?

NNN2004 said:
But I wonder... why do people talk about Cell power more than Xenon power?

 

It's a traditional marketing practice of the electronics industry at which Sony excels.

It works like this:

a) come up with a catchy name for product features/components (Cell)

b) spend marketing dollars and attach an emotional value to the buzzword (Power, Hope, Potential)

c) at this point a label with the buzzword on a product becomes a selling point, and consumers use it as an argument in itself (Yo dude, this has teh Cell in it, it's so powerful they can't harness its raw power!).

Examples: MegaBass, BrightEra, ClearVID, etc. Do you know what those words really mean, what's behind them? Not many do, and even fewer care. But to many consumers, they are a selling point.





Current-gen game collection uploaded on the profile, full of win and good games; also most of my PC games. Lucasfilm Games/LucasArts 1982-2008 (Requiescat In Pace).

forevercloud3000 said:
The only real answer is the Cell, by a sizable margin. The Cell is used in the world's strongest supercomputer; Xenon is not. Enough said. It doesn't depend on anything other than the effort put into both, yet the Cell still goes farther.

The reason the Cell is used in those ways is because Sony does not fully own the Cell... while Microsoft fully owns the Xenon, which was made to power the Xbox 360.



 



@Everyone who is talking about SPEs not liking branchy code

The issue is data parallelism (SIMD), not instruction parallelism (threads).

The reason people have difficulty getting decent performance out of the CBE is that compilers usually just give up when they are presented with a branch in an inner loop. E.g., the compiler has no problem vectorizing this

float a[N], b[N], c[N];

for (int i = 0; i < N; i++)
   a[i] = b[i] + c[i];

into something like this (going to use SSE intrinsics because I know most here who are actually programmers are more familiar with x86, but altivec works the same way)

#include <xmmintrin.h> // SSE intrinsics

__m128 *av, *bv, *cv;
av = (__m128*)a; // assume 16-byte aligned; just allocate with __declspec(align(16)) or an aligned malloc
bv = (__m128*)b;
cv = (__m128*)c;
for (int i = 0; i < N/4; i++) // assumes N is a multiple of 4
   av[i] = _mm_add_ps(bv[i], cv[i]);


But every compiler I know of will just give up on this:


for (int i = 0; i < N; i++)
   if (a[i] > 0)
      a[i] = b[i] / c[i];

However, if the programmer knows what he is doing, he can eliminate the branch via logical operations:

__m128 *av = (__m128*)a;
__m128 *bv = (__m128*)b;
__m128 *cv = (__m128*)c;
__m128 zeros = _mm_setzero_ps();
for (int i = 0; i < N/4; i++)
{
   __m128 x = _mm_div_ps(bv[i], cv[i]);   // b/c for all four lanes
   __m128 g = _mm_cmpgt_ps(av[i], zeros); // mask: all-ones in lanes where a > 0
   __m128 y = _mm_andnot_ps(g, av[i]);    // keep the old a where the mask is off
   __m128 z = _mm_and_ps(g, x);           // take b/c where the mask is on
   av[i] = _mm_or_ps(y, z);               // merge: a branchless select
}

Now, on most Intel machines you are highly constrained by register pressure, because the machine only has 8 SIMD registers, making vectorization impractical for large loops without heavy amounts of loop fission, which may not always be possible (a rough sketch of what fission looks like is below). The SPEs, on the other hand, have 128 SIMD registers per core; for the applications I've developed, register pressure was a non-factor for traditional loop vectorization techniques. And the stuff I've been working on at the moment is quite branchy.
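For anyone wondering what loop fission actually looks like: you split one loop that keeps too many values live at once into two passes, spilling the intermediate to a scratch array. A rough sketch (function and array names made up for illustration; this toy body wouldn't actually run out of registers, imagine many more temporaries, and note the extra trip through memory for tmp is the price you pay):

#include <xmmintrin.h>

// Before fission: one loop with several values live at the same time.
void fat_loop(__m128 *av, __m128 *bv, __m128 *cv, __m128 *dv, int n4)
{
   for (int i = 0; i < n4; i++)
   {
      __m128 t0 = _mm_mul_ps(av[i], bv[i]);
      __m128 t1 = _mm_mul_ps(cv[i], dv[i]);
      __m128 t2 = _mm_add_ps(t0, t1);
      av[i] = _mm_sub_ps(_mm_div_ps(t2, bv[i]), cv[i]);
   }
}

// After fission: each pass keeps fewer values live, at the cost of
// storing and reloading the intermediate through the scratch array tmp.
void fissioned(__m128 *av, __m128 *bv, __m128 *cv, __m128 *dv,
               __m128 *tmp, int n4)
{
   for (int i = 0; i < n4; i++)
      tmp[i] = _mm_add_ps(_mm_mul_ps(av[i], bv[i]),
                          _mm_mul_ps(cv[i], dv[i]));
   for (int i = 0; i < n4; i++)
      av[i] = _mm_sub_ps(_mm_div_ps(tmp[i], bv[i]), cv[i]);
}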

(PS there may be some errors as I did this quickly, but the general principle is correct)
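If anyone wants to check the branchless version for themselves, here's a quick standalone test (again a sketch done quickly, assuming _mm_malloc from the SSE headers for alignment and N divisible by 4) that compares it against the plain scalar loop:

#include <stdio.h>
#include <xmmintrin.h>

#define N 16

int main(void)
{
   // _mm_malloc gives the 16-byte alignment the __m128 casts require
   float *a = _mm_malloc(N * sizeof(float), 16);
   float *b = _mm_malloc(N * sizeof(float), 16);
   float *c = _mm_malloc(N * sizeof(float), 16);
   float ref[N];

   for (int i = 0; i < N; i++)
   {
      a[i] = (float)(i - N/2); // mix of negative, zero, and positive
      b[i] = (float)(i + 1);
      c[i] = 2.0f;
      ref[i] = a[i];
   }

   // scalar reference: the original branchy loop
   for (int i = 0; i < N; i++)
      if (ref[i] > 0)
         ref[i] = b[i] / c[i];

   // branchless SIMD version from above
   __m128 *av = (__m128*)a, *bv = (__m128*)b, *cv = (__m128*)c;
   __m128 zeros = _mm_setzero_ps();
   for (int i = 0; i < N/4; i++)
   {
      __m128 x = _mm_div_ps(bv[i], cv[i]);
      __m128 g = _mm_cmpgt_ps(av[i], zeros);
      av[i] = _mm_or_ps(_mm_andnot_ps(g, av[i]), _mm_and_ps(g, x));
   }

   for (int i = 0; i < N; i++)
      printf("%2d: scalar=%g simd=%g%s\n", i, ref[i], a[i],
             ref[i] == a[i] ? "" : "  <-- MISMATCH");

   _mm_free(a); _mm_free(b); _mm_free(c);
   return 0;
}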

 



Broadway.



@alephnull, I've heard of that exact frustration with the PS3's compiler before. Is that still the case?

Anyway, so are you going to spill the beans on the differences/similarities between the VMX units in the Xbox 360 and the SPUs in the Cell?



Tease.

alephnull said:

@Everyone who is talking about SPEs not liking branchy code

*snip: full post with code examples quoted above*

 

So which is better?

And what does it mean that the Cell has a PPE?

And since it's said that the Xenon has three PPEs on a single die which are modified versions of the one in the Cell... what would the difference be?

And does the PS3's Cell have a different PPE?

And who the hell knows what you're talking about, besides whoever knows what you're talking about?



 



@Squilliam I assume you are referring to VMX's altivec-lite ISA versus the SPU SIMD ISA. I have no experience with VMX, but I do have a decent amount of experience with altivec.

It is true that the SPU ISA does not support branching operations on SIMD registers, thus requiring data to be transferred to a normal register before evaluating a conditional. But this isn't really that big of a deal. Just because an ISA makes an operation available doesn't mean you should use it. A VMX programmer would at best suffer only slightly less from a bad misprediction, and possibly worse if the jump target was close enough to already sit in an SPE's local store (LS). Remember, CISC instructions are really running a large number of microcode instructions, which are going to be fairly similar to the RISC instructions presented to the SPU programmer.
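For the x86 folks, the same round-trip exists on SSE: you can't branch on an __m128 directly either; _mm_movemask_ps pulls the compare mask into an integer register first, and then you branch on that. A rough sketch (hypothetical function, same arrays as my earlier example) of when that can still pay off, e.g. skipping a whole vector's divide when no lane needs it:

#include <xmmintrin.h>

// Branch on a SIMD compare by moving the mask into a scalar (integer)
// register with _mm_movemask_ps, the same kind of cross-domain transfer
// described above. Only worth it when whole vectors usually take the
// same path.
void conditional_div(__m128 *av, __m128 *bv, __m128 *cv, int n4)
{
   __m128 zeros = _mm_setzero_ps();
   for (int i = 0; i < n4; i++)
   {
      __m128 g = _mm_cmpgt_ps(av[i], zeros);
      if (_mm_movemask_ps(g) == 0)  // no lane has a > 0: skip the divide
         continue;
      __m128 x = _mm_div_ps(bv[i], cv[i]);
      av[i] = _mm_or_ps(_mm_andnot_ps(g, av[i]), _mm_and_ps(g, x));
   }
}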



Shinlock said:
Apples to oranges, imo. The CPUs are so different. Xenon is easier to work with and fits into the programming models adopted on the PC and Wii. Cell has potentially more computational power (very powerful single-precision floating-point computation) and can be given some of the work that the GPU would otherwise do (bone-weighted animation blending, which one might use vertex shaders for).

Couldn't explain it better. It is a matter of what you want the processor to do. Of course, in a video game or general media sense the Cell is stronger, but in a general processing environment (a normal desktop PC's OS) the Xenon would destroy the Cell.
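For the curious, the "bone weighted animation blending" Shinlock mentions is just a per-vertex weighted sum, which is why it maps so well onto either vertex shaders or SPUs. A toy sketch in the same SSE style as alephnull's post (made-up names, purely illustrative; real SPU code would use the spu_* intrinsics and DMA the vertex data into local store):

#include <xmmintrin.h>

// Two-bone blend, four floats at a time: out = w0*p0 + w1*p1, where
// p0/p1 are vertex positions already transformed by each bone's matrix
// and w0/w1 are the corresponding blend weights.
void blend_positions(const __m128 *p0, const __m128 *p1,
                     const __m128 *w0, const __m128 *w1,
                     __m128 *out, int n4)
{
   for (int i = 0; i < n4; i++)
      out[i] = _mm_add_ps(_mm_mul_ps(w0[i], p0[i]),
                          _mm_mul_ps(w1[i], p1[i]));
}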

PC gaming is better than console gaming. Always.     We are Anonymous, We are Legion    Kick-ass interview   Great Flash Series Here    Anime Ratings     Make and Play Please
Amazing discussion about being wrong
Official VGChartz Folding@Home Team #109453
 
alephnull said:

@Everyone who is talking about SPEs not liking branchy code

*snip: full post with code examples quoted above*

 


omfg, I can't believe I could actually follow that somewhat. I haven't programmed in ages, and I only ever got as far as basic C++.

PC gaming is better than console gaming. Always.     We are Anonymous, We are Legion    Kick-ass interview   Great Flash Series Here    Anime Ratings     Make and Play Please
Amazing discussion about being wrong
Official VGChartz Folding@Home Team #109453
 

I think the world may never know.