Linux: PS3's Cell is faster than i7 965 XE

dahuman on 27 May 2009

nen-suer said:
@dahuman

Cell is real, while Saiyans aren't XD

I still pwned your imageZ!!

kars on 27 May 2009

Carl2291 said:

Was that a serious reply to me saying that the Cell is Skynet?

No. It was a serious reaction to your claim that it would be damage control. If you really want to see damage control, simply look at the PS3. The Cell itself was developed for a quite different game machine, but Sony was unable to develop the companion chip. The SPUs and the GPU work against each other, so Sony's original programming model no longer made any sense.

kars on 27 May 2009

Jo21 said:
Cell has a PPE too.

That does the syncing with all the other SPUs.

Or to describe it more precisely: the PPE/PPU is the real processor, optimized for random access, while the SPUs are streaming processors. They depend on a single data stream that delivers everything to their local memories. If you can't work this way (because you depend on information from several sources), you are in deep trouble! Whether or not you use the PPE to collect this data doesn't matter. The SPU can't do anything useful until its stream is ready. On the Xbox 360 this doesn't happen, because the architecture can do its own load balancing like any other multi-core.

Jo21 on 27 May 2009

kars said:

Jo21 said:
Cell has a PPE too.

That does the syncing with all the other SPUs.

They are not coprocessors; they can stand on their own, but they depend on DMA calls to get data flowing.

The PPE only syncs, but an SPU can be working in the background, preparing to process data or helping the other SPUs.

SPUs have less cache than the PPE; that's the main difference, the other being that they require DMA calls to move data.

Deneidez on 27 May 2009

alephnull said:

Here is a repost of what I posted earlier. You may actually find some use for this in your coding (I know you said you develop programs for older machines, but there aren't any instructions here that aren't SSE1 -- I think -- so they should run on a P3; if not, you can still convert this to MMX).

The issue is data parallelism (SIMD), not instruction parallelism (threads).

The reason people have difficulty getting decent performance out of the CBE is that compilers usually just give up when they are presented with a branch in an inner loop. E.g. the compiler has no problem vectorizing this:

float a[N], b[N], c[N];
for (i = 0; i < N; i++)
   a[i] = b[i] + c[i];

Hmm...

a[i] = b[i] + c[i]; // also you didn't declare i :)

But every compiler I know of will just give up on this:

for (i = 0; i < N; i++)
   if (a[i] > 0)
     a[i] = b[i] / c[i];

However, if the programmer knows what he is doing, he can eliminate the branch via logical operations:

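#include <xmmintrin.h>  /* SSE1 intrinsics */

__m128 *av = (__m128*)a;   /* a, b and c must be 16-byte aligned */
__m128 *bv = (__m128*)b;
__m128 *cv = (__m128*)c;
__m128 zeros = _mm_setzero_ps();
for (i = 0; i < N/4; i++)  /* assumes N is a multiple of 4 */
{
   __m128 x = _mm_div_ps(bv[i], cv[i]);    /* b/c in all four lanes, needed or not */
   __m128 g = _mm_cmpgt_ps(av[i], zeros);  /* all-ones mask in lanes where a > 0 */
   __m128 y = _mm_andnot_ps(g, av[i]);     /* (~g) & a: old a where the test failed */
   __m128 z = _mm_and_ps(g, x);            /* g & x: b/c where a > 0 */
   av[i] = _mm_or_ps(y, z);                /* merge the two selections */
}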

Now, on most Intel machines you are highly constrained by register pressure, because the machines only have 8 SIMD registers (in 32-bit mode), making vectorization impractical for large loops without heavy amounts of loop fission, which may not be possible. The SPEs, on the other hand, have 128 SIMD registers per core, so for the applications I've developed register pressure was a non-factor for traditional loop vectorization techniques. And the stuff I've been working on atm is quite branchy.

(PS there may be some errors as I did this quickly, but the general principle is correct)
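
The mask trick in the quoted snippet can be modeled one lane at a time in plain C. This is only a minimal sketch, assuming 32-bit IEEE floats; the function names (`div_if_positive_branchy`, `select_f32`, `div_if_positive_masked`) are made up for illustration and are not from the post. Whether a given compiler actually turns the masked loop into SIMD code is not guaranteed; the point is just that both loops produce the same array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference version: branch inside the inner loop. */
void div_if_positive_branchy(float *a, const float *b, const float *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        if (a[i] > 0.0f)
            a[i] = b[i] / c[i];
}

/* Returns x where mask is all-ones and y where mask is zero,
 * using only AND / AND-NOT / OR on the raw bit patterns. */
static float select_f32(uint32_t mask, float x, float y)
{
    uint32_t xb, yb, rb;
    float r;
    memcpy(&xb, &x, sizeof xb);
    memcpy(&yb, &y, sizeof yb);
    rb = (mask & xb) | (~mask & yb);
    memcpy(&r, &rb, sizeof r);
    return r;
}

/* Masked version: the division is always done, then selected or discarded. */
void div_if_positive_masked(float *a, const float *b, const float *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++) {
        float x = b[i] / c[i];                        /* computed for every element */
        uint32_t g = 0u - (uint32_t)(a[i] > 0.0f);    /* 0xFFFFFFFF if a > 0, else 0 */
        a[i] = select_f32(g, x, a[i]);
    }
}
```

Note that the masked version divides even where the result is thrown away, so a zero in c produces an infinity or NaN that is simply discarded rather than skipped.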

That looks a bit inefficient... You always do the division no matter what, even if every element in the array is smaller than zero. However, I know that the CELL just about always has flops to spare vs. instructions (apart from memory use, in this case the variables in arrays b and c). I am wondering where the limit is at which the a>0 check becomes more efficient. How about this one?

float a[N],b[N],c[N];

for(int i = 0;i<N;i++)
{
  if(a[i]<0)
    for(int j = i;j<N;j++)
      a[i]+=b[j]/c[j];
}
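
One way to probe the question of when skipping the division pays off is to test each group of four elements once and skip the whole group when nothing in it is positive. The sketch below models that in portable C; on SSE the per-group test would typically be `_mm_movemask_ps` applied to the comparison mask, which this loop only imitates. The function name `div_if_positive_blocked`, the `LANES` constant and the returned block count are assumptions for illustration, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* one SSE or SPU vector holds four floats */

/* Applies "if (a > 0) a = b / c" in groups of four elements.
 * Returns how many groups actually needed the division. */
size_t div_if_positive_blocked(float *a, const float *b, const float *c, size_t n)
{
    size_t divided_blocks = 0;

    assert(a && b && c);
    assert(n % LANES == 0);  /* sketch only: no tail handling */

    for (size_t i = 0; i < n; i += LANES) {
        /* One test per group: is any lane positive? */
        int any = 0;
        for (size_t k = 0; k < LANES; k++)
            any |= (a[i + k] > 0.0f);
        if (!any)
            continue;  /* whole group left untouched, division skipped */

        divided_blocks++;
        for (size_t k = 0; k < LANES; k++) {
            float x = b[i + k] / c[i + k];
            a[i + k] = (a[i + k] > 0.0f) ? x : a[i + k];  /* per-lane select */
        }
    }
    return divided_blocks;
}
```

Comparing the returned count with n / LANES on real data gives a rough idea of how often the extra test saves a division; whether that beats the always-divide loop depends on the hardware and would have to be measured.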

nen-suer on 27 May 2009

dahuman said:

nen-suer said:
@dahuman

Cell is real, while Saiyans aren't XD

I still pwned your imageZ!!

Okay....let's see what happened after that.....

SubiyaCryolite on 27 May 2009

It's over 9000!!

kars on 27 May 2009

Jo21 said:

The PPE only syncs, but an SPU can be working in the background, preparing to process data or helping the other SPUs.

SPUs have less cache than the PPE; that's the main difference, the other being that they require DMA calls to move data.

Not quite. The SPUs do not have a cache; they depend on their local memory, which has to hold both the program code and the data, and the programmer alone is responsible for managing this memory. The important thing about these units is that they can simultaneously send their old results, receive new data (via their own Memory Flow Controller) and calculate on the current data. In theory all units could work continuously, but in such a situation 3 SPEs (SPU+MFC) could block the bus (if they do not form a chain). Additionally, the PPE can execute two instructions at the same time while the SPUs can only execute one, but every SPU has an AltiVec 128 engine, whereas only one of the execution pipelines of the PPE has such a unit. This is one of the biggest differences from the Xenon, which has two AltiVec 128 units for each core (one per pipe).
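
The overlap described above (fetch the next data while computing on the current data and sending back the old results) is usually called double buffering. Here is a rough sketch of the buffer rotation in plain C, with `memcpy` standing in for the DMA transfers. The names `fake_dma_get`, `fake_dma_put`, `scale_streamed` and `CHUNK` are invented for illustration; this is not Cell SDK code, and because the copies are synchronous it shows only the ordering of the steps, not real overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* floats per transfer; an arbitrary size for this sketch */

/* Stand-ins for DMA transfers between main memory and local store.
 * On a real SPE these would be asynchronous requests to the MFC. */
static void fake_dma_get(float *local, const float *main_mem, size_t count)
{
    memcpy(local, main_mem, count * sizeof *local);
}

static void fake_dma_put(float *main_mem, const float *local, size_t count)
{
    memcpy(main_mem, local, count * sizeof *local);
}

/* Multiplies every element of data by factor, one chunk at a time,
 * alternating between two local buffers. */
void scale_streamed(float *data, size_t n, float factor)
{
    float local[2][CHUNK];  /* the two buffers held in "local store" */
    size_t chunks = n / CHUNK;

    assert(data != NULL);
    assert(n % CHUNK == 0);  /* sketch only: no tail handling */
    if (chunks == 0)
        return;

    fake_dma_get(local[0], data, CHUNK);  /* prime the first buffer */
    for (size_t k = 0; k < chunks; k++) {
        int cur = (int)(k & 1);
        int nxt = cur ^ 1;

        /* Request the next chunk before working on the current one.
         * With real asynchronous DMA, the earlier put from this buffer
         * would have to be waited on first. */
        if (k + 1 < chunks)
            fake_dma_get(local[nxt], data + (k + 1) * CHUNK, CHUNK);

        for (size_t i = 0; i < CHUNK; i++)
            local[cur][i] *= factor;  /* compute on the current chunk */

        fake_dma_put(data + k * CHUNK, local[cur], CHUNK);  /* send results back */
    }
}
```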

Deneidez on 27 May 2009

kars said:

Jo21 said:

The PPE only syncs, but an SPU can be working in the background, preparing to process data or helping the other SPUs.

SPUs have less cache than the PPE; that's the main difference, the other being that they require DMA calls to move data.

Not quite. The SPUs do not have a cache; they depend on their local memory, which has to hold both the program code and the data, and the programmer alone is responsible for managing this memory. The important thing about these units is that they can simultaneously send their old results, receive new data (via their own Memory Flow Controller) and calculate on the current data. In theory all units could work continuously, but in such a situation 3 SPEs (SPU+MFC) could block the bus (if they do not form a chain). Additionally, the PPE can execute two instructions at the same time while the SPUs can only execute one, but every SPU has an AltiVec 128 engine, whereas only one of the execution pipelines of the PPE has such a unit. This is one of the biggest differences from the Xenon, which has two AltiVec 128 units for each core (one per pipe).

They do have cache. Very small, but it's there. :)

It appears fairly simple: each SPU had 512 bytes of cache (yes, contrary to what you might have heard, SPUs do have a tiny bit of cache).

http://forum.beyond3d.com/showthread.php?t=41508

dahuman on 27 May 2009

nen-suer said:

dahuman said:

nen-suer said:
@dahuman

Cell is real, while Saiyans aren't XD

I still pwned your imageZ!!

Okay....let's see what happened after that.....

we both know how it ends so what's the point? =P
