Linux: PS3s Cell is faster than i7 965 XE

hmmm for some reason my brain just decided to filter out all array indexing, lol.

EDIT: OK, for some reason the indexes are showing up when I hit edit, but not when I view the post... *sigh*... too tired to fool with it.

1) I basically just took some code from a project I was working on a month ago. You don't need Cell; this will give you a speedup on a P4.

2) It looks inefficient, but there are a few things to consider:

A) You are doing 4 divides in one instruction, with at most a quarter as many loads of array a, a quarter as many increments of i, a quarter as many compare instructions, etc.

B) x86 processors can run the compare and the division simultaneously, since neither one depends on the other's result.

C) There's no branching to give you a bajillion-cycle stall, turning your expensive 21st-century computing wonder into a NOP machine.

D) Remember all those divides are pipelined. There's no dependency, so it shouldn't stall.

Here's what that should look like, btw... (a quadword, i.e. 128-bit, memory load occurs with every dereference, and it is just as quick as a word load, as long as your arrays are 16-byte aligned)

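#include <xmmintrin.h>

/* a, b, c are float arrays, 16-byte aligned */
__m128 *av = (__m128*)a;
__m128 *bv = (__m128*)b;
__m128 *cv = (__m128*)c;
__m128 zeros = _mm_setzero_ps();
for (i = 0; i < N/4; i++)
{
   __m128 x = _mm_div_ps(bv[i], cv[i]);     /* four divides at once */
   __m128 g = _mm_cmplt_ps(av[i], zeros);   /* mask: all ones where a < 0 */
   __m128 y = _mm_andnot_ps(g, av[i]);      /* keep a where a >= 0 */
   __m128 z = _mm_and_ps(g, x);             /* take b/c where a < 0 */
   av[i] = _mm_or_ps(y, z);                 /* merge the two halves */
}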
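
For anyone who wants to check the mask trick against a plain loop, here's a rough sketch. I'm assuming the scalar loop being replaced is "if (a[i] < 0) a[i] = b[i] / c[i];", which is what the and/andnot/or select above works out to. The function names select_div_scalar and select_div_sse are made up for illustration, and this version uses unaligned loads and stores so it doesn't care how the arrays were allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Scalar reference (assumed original loop): one branch per element. */
void select_div_scalar(float *a, const float *b, const float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] < 0.0f)
            a[i] = b[i] / c[i];
    }
}

/* Branch-free SSE version: four floats per iteration.
   n must be a multiple of 4; unaligned loads/stores accept any pointer. */
void select_div_sse(float *a, const float *b, const float *c, size_t n)
{
    assert(n % 4 == 0);
    const __m128 zeros = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4) {
        __m128 av = _mm_loadu_ps(a + i);
        __m128 x  = _mm_div_ps(_mm_loadu_ps(b + i), _mm_loadu_ps(c + i));
        __m128 g  = _mm_cmplt_ps(av, zeros);  /* all-ones lanes where a < 0 */
        __m128 y  = _mm_andnot_ps(g, av);     /* keep a where a >= 0 */
        __m128 z  = _mm_and_ps(g, x);         /* take b/c where a < 0 */
        _mm_storeu_ps(a + i, _mm_or_ps(y, z));
    }
}
```

Note the SSE version divides in every lane, even the ones whose result gets masked away, so a zero in c produces an inf or NaN that is simply thrown out (with floating-point exceptions masked, which is the default).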

I will do your challenge in the morning... need to sleep; I've been working on my thesis non-stop for the last 48 hours.
