
Forums - Gaming - Intel Larrabee finally hits 1TFLOPS - 2.7x nVidia GT200



During the recently held SC09 conference in Portland, Oregon, Intel finally managed to reach its original performance goal for Larrabee. Back in 2006, when the first details about Larrabee emerged, the performance goal was "1TFLOPS @ 16 cores, 2.0 GHz clock, 150W TDP". During Justin Rattner's keynote, Intel demonstrated the performance of LRB as it stands today.

In the SGEMM performance test [4K by 4K matrix multiply, QCD], Intel achieved 417 GFLOPS using half the cores on the prototype card, and reached 825 GFLOPS by enabling all the cores. Looking at the numbers alone, one might think these scores fall below the level of the ATI Radeon 4850 and nVidia GeForce GTX 280/GTX 285. Of course, there is a "but" coming - unlike theoretical numbers that are usually disclosed by ATI and nVidia, this was an actual SGEMM benchmark calculation used in the HPC community.
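For reference, SGEMM throughput is conventionally computed as 2·N³ floating-point operations (N³ multiplies plus N³ adds) divided by wall-clock time. A minimal sketch of that measurement using NumPy on the CPU - this is obviously not the benchmark Intel ran, just an illustration of how the GFLOPS figure is derived:

```python
import time
import numpy as np

def measure_sgemm_gflops(n: int) -> float:
    """Time a single-precision n x n matrix multiply and return GFLOPS.

    SGEMM performs roughly 2*n^3 floating-point operations
    (n^3 multiplies + n^3 adds).
    """
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    start = time.perf_counter()
    c = a @ b                      # the single-precision matrix multiply
    elapsed = time.perf_counter() - start
    flops = 2.0 * n ** 3
    return flops / elapsed / 1e9

# The SC09 run used a 4K x 4K multiply; a smaller size finishes quickly:
print(f"{measure_sgemm_gflops(1024):.1f} GFLOPS")
```

The same 2·N³ convention is what makes numbers from different vendors comparable, which is why a measured SGEMM score is more telling than a quoted theoretical peak.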


Intel Larrabee reaches 1TFLOPS in SGEMM BLAS test, 4Kx4K matrix

The keynote continued while the engineers scrambled at the back to try to beat the 1TFLOPS barrier. A couple of minutes before the end of the keynote, Justin added the infamous "And one more thing…" Initial overclocked performance was 913 GFLOPS; it moved slowly past 919 GFLOPS, bounced up to 997 GFLOPS and ultimately passed the 1TFLOPS barrier with 1006 GFLOPS. Now, we can debate the numbers all we want, but the fact of the matter is that the nVidia Tesla C1060 delivers only 370 GFLOPS in an identical SGEMM 4Kx4K calculation. Thus, Larrabee today delivers 2.7x the math performance of the GT200 chip.


In comparison, GT200-based Tesla card reaches 370 GFLOPS...

One might point out that AMD's GPU line-up is more efficient than nVidia's, but unfortunately the situation is rather complex due to the interesting state of AMD's GPGPU development. AMD's architecture is very strong in theoretical performance and in real-world gaming. In the GPGPU world, however, AMD ditched everything else to focus on OpenCL development, and the results will only come in 2010. But those efforts cannot compensate for architectural limitations. As we disclosed on numerous occasions, AMD introduced the 1 Fat + 4 Thin concept with the ATI Radeon 2900XT, in which a core cluster consists of one unit for transcendental operations and four units for Multiply-Add/Add/Integer-Add/Dot operations. Thus, the Radeon 4800 family effectively comes with 160 cores, compared to nVidia's 30 clusters of 8 fully-featured cores each, i.e. ATI's 160 vs. nVidia's 240 cores.

Long story short, the real-world SGEMM performance of AMD's FireStream 9270 board [Radeon 4870] is 300 GFLOPS, weaker than GT200. We don't have information about SGEMM performance of Evergreen GPUs [5700, 5800, 5900 series] but as soon as we learn the numbers - we'll let you know. The same thing goes for nVidia's long-delayed NV100-based family of products.

But as of SC09, the top five performing products for SGEMM 4K x 4K are as follows [do note that multi-GPU products are excluded as they don't run SGEMM]:
1.  Intel Larrabee [LRB, 45nm] - 1006 GFLOPS
2.  EVGA GeForce GTX 285 FTW - 425 GFLOPS
3.  nVidia Tesla C1060 [GT200, 65nm] - 370 GFLOPS
4.  AMD FireStream 9270 [RV770, 55nm] - 300 GFLOPS
5.  IBM PowerXCell 8i [Cell, 65nm] - 164 GFLOPS

If you're wondering where products such as Intel's Harpertown-based Core 2 Quad or Nehalem-based Core i7 stand, the answer is quite simple - an i7 XE 975 at 3.33 GHz will give you 101 GFLOPS, while a Core 2 Extreme QX9770 at 3.2 GHz gives out 91 GFLOPS. Regardless of how hard we tried, we weren't able to find SGEMM performance figures for AMD CPUs on a 4K by 4K matrix.

As you can see for yourself, Larrabee is finally starting to produce some positive results. Even though the company has had silicon for over a year and a half, the performance simply wasn't there, and naturally, whenever development hits a snag you either give up or give it all you've got. After hearing that the "champions of Intel" moved from CPU development onto the Larrabee project, we can now say that Intel will deliver Larrabee at whatever price the company is ready to pay. The fact that the design cost for Larrabee is probably as high as the combined GPU R&D cost of nVidia and AMD over the past three years doesn't exactly play a role here. Intel has enough cash to deliver the part and not worry about TSMC's hiccup, which only accelerated AMD's plans to move GPU production away from TSMC [to GlobalFoundries] in 2011, leaving nVidia as the only major client.

There are several questions yet to be answered, such as the efficiency of the Tesla C2050/C2070 GPGPU cards. If nVidia raises efficiency from the current 40% to an expected 80-90%, the Tesla chips should give out more than 1TFLOPS, but neither Larrabee nor NV100 is out the door yet.
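The 40% figure follows from dividing measured SGEMM throughput by the card's theoretical peak. A rough sketch of that arithmetic - the C1060 figures are its published specs, while the Fermi peak is pure speculation on our part (assuming 512 shaders at a 1.2 GHz shader clock):

```python
def peak_sp_gflops(shaders: int, shader_clock_ghz: float,
                   flops_per_clock: int) -> float:
    """Theoretical peak single-precision GFLOPS: shader count x shader
    clock x FLOPs issued per shader per clock."""
    return shaders * shader_clock_ghz * flops_per_clock

# Tesla C1060: 240 shaders at 1.296 GHz; counting the co-issued MUL
# alongside the MAD gives 3 FLOPs/clock -> ~933 GFLOPS peak.
c1060_peak = peak_sp_gflops(240, 1.296, 3)
efficiency = 370.0 / c1060_peak        # measured SGEMM / theoretical peak
print(f"{efficiency:.0%}")             # ~40%, matching the figure above

# Hypothetical Fermi peak (512 shaders, 1.2 GHz, FMA = 2 FLOPs/clock):
fermi_peak = peak_sp_gflops(512, 1.2, 2)
print(fermi_peak * 0.85)               # 80-90% efficiency would clear 1 TFLOPS
```

Under those assumptions, 80-90% efficiency on a ~1.2 TFLOPS peak is what puts Fermi past the 1TFLOPS mark.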

Also, we wonder what the restructured memory infrastructure means for the GPGPU version of AMD's Evergreen architecture. With roughly 2x the compute power, the Radeon 5870 / FireStream 9370 should give out around 600 GFLOPS in the SGEMM benchmark, but we don't know if that number is correct.
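The rough estimate works out as follows - the 2x factor is the scaling used above, and the peak figures are the commonly cited launch specs, so treat this as back-of-envelope arithmetic rather than a prediction:

```python
# FireStream 9270 (RV770) measured 300 GFLOPS in the SGEMM benchmark.
measured_9270 = 300.0

# Rough scaling: Cypress has ~2x the compute resources of RV770.
estimate_by_factor = measured_9270 * 2.0              # 600 GFLOPS

# Alternative: scale by the ratio of theoretical peaks
# (RV770 ~1200 GFLOPS, Cypress ~2720 GFLOPS), assuming equal efficiency.
estimate_by_peaks = measured_9270 * (2720.0 / 1200.0)

print(estimate_by_factor, round(estimate_by_peaks))   # 600.0 680
```

Either way the projection lands well short of Larrabee's demonstrated 1006 GFLOPS, unless the new memory subsystem improves SGEMM efficiency.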


http://brightsideofnews.com/news/2009/12/2/intel-larrabee-finally-hits-1tflops---27x-faster-than-nvidia-gt200!.aspx




how cheap will it be?



So this is good? I have no clue about stuff like this.




It might be good, it really depends on how it is used and how much it costs. They compare it to a last generation 285 (the 385 should be out shortly with much more competitive numbers much like the 5870 blew away all the last gen processors when it came out) and it does a bit better, yes, but not really that much better.

I expect that the Larrabee will be matched roughly by the high end of Nvidia's next gen GPUs. Additionally they are comparing a theoretical pull out all the stops prototype with an actual consumer product that has price constraints (in the case of the 285 around 300 dollars). I'd like to see how a production 300 dollar Larrabee model does on benchmarks, I bet it would be nowhere near as high.




 PSN ID: ChosenOne feel free to add me

wow that's crazy.



Impulsivity said:
It might be good, it really depends on how it is used and how much it costs. They compare it to a last generation 285 (the 385 should be out shortly with much more competitive numbers much like the 5870 blew away all the last gen processors when it came out) and it does a bit better, yes, but not really that much better.

I expect that the Larrabee will be matched roughly by the high end of Nvidia's next gen GPUs. Additionally they are comparing a theoretical pull out all the stops prototype with an actual consumer product that has price constraints (in the case of the 285 around 300 dollars). I'd like to see how a production 300 dollar Larrabee model does on benchmarks, I bet it would be nowhere near as high.

here is the big point:

unlike theoretical numbers that are usually disclosed by ATI and nVidia - this was an actual SGEMM benchmark calculation used in the HPC community.

even if it was a prototype

The keynote continued while the engineers scrambled at the back to try to beat the 1TFLOPS barrier. A couple of minutes before the end of the keynote, Justin added the infamous "And one more thing…" Initial overclocked performance was 913 GFLOPS; it moved slowly past 919 GFLOPS, bounced up to 997 GFLOPS and ultimately passed the 1TFLOPS barrier with 1006 GFLOPS. Now, we can debate the numbers all we want, but the fact of the matter is that the nVidia Tesla C1060 delivers only 370 GFLOPS in an identical SGEMM 4Kx4K calculation. Thus, Larrabee today delivers 2.7x the math performance of the GT200 chip.

 

The test shown was overclocked - the fact that the chip could handle that overclock is notable in itself. What the stable, direct benchmark numbers will be once the chip gets released would be neat to see down the road.

And those look to be pretty darn impressive results, even if they end up reduced.



I AM BOLO

100% lover "nothing else matter's" after that...

ps:

Proud psOne/2/3/p owner.  I survived Aplcalyps3 and all I got was this lousy Signature.

It's good for GPGPU, but that is a tiny market. In terms of gaming performance, which is what this board cares about, it will be a lot lower per mm^2 than an RV770 or GT200, since much of the die is used for x86 and because Intel's graphics engineers haven't had 10 years in the high-end market, so they couldn't overtake them in one go. But the next iteration of Larrabee (this is Larrabee version 2 by the way, so I mean v3) could rival AMD/Nvidia.

Also, it depends when it's coming out. Its current competition is the Evergreen (5xxx) and GT200 parts; when it comes out that could be Fermi and Evergreen, or if mid-2010 then Fermi and Evergreen+1.

Note they did not compare it to the HD5xxx, which would be at about 2.7TFlops. Their architecture is different, yes, so it would probably come second in that ranking. But again, no indication of game performance.

Tl;dr - Performance in games, on retail parts and compared to launch-time competition, is what you should wait for.



Impulsivity said:
It might be good, it really depends on how it is used and how much it costs. They compare it to a last generation 285 (the 385 should be out shortly with much more competitive numbers much like the 5870 blew away all the last gen processors when it came out) and it does a bit better, yes, but not really that much better.

If shortly is April-May, then yes. That's what Nvidia told their board partners at the last conference.

Performance is not good. Early indications show a projected core clock speed of about 600MHz on the A3 revision (the current A2 revision is 500MHz with unprofitably low yields). Assuming no per-clock performance increases, and the confirmed 512 shaders, that's about 7% faster than the GTX 295. Let's be optimistic and say a 10% boost in IPC, even though all of the architecture changes shown will only help GPGPU. So 15-20% faster than the 5870, yet a die size bigger than a GTX 285's and nearly double a 5870's. It will have to launch at $400+ to compete once higher 40nm costs are factored in, and won't threaten the 5870 on price or the 5970 on performance. Add to that TSMC's 40nm issues and Nvidia's extreme lateness in getting large-die 40nm parts yielding properly (AMD's 4770 was ready in May; the similarly sized GT240 launched in November), and it's not looking good.



Soleron said:
It's good for GPGPU, but that is a tiny market. In terms of gaming performance, which is what this board cares about, it will be a lot lower per mm^2 than an RV770 or GT200, since much of the die is used for x86 and because Intel's graphics engineers haven't had 10 years in the high-end market, so they couldn't overtake them in one go. But the next iteration of Larrabee (this is Larrabee version 2 by the way, so I mean v3) could rival AMD/Nvidia.

Also, it depends when it's coming out. Its current competition is the Evergreen (5xxx) and GT200 parts; when it comes out that could be Fermi and Evergreen, or if mid-2010 then Fermi and Evergreen+1.

Note they did not compare it to the HD5xxx, which would be at about 2.7TFlops. Their architecture is different, yes, so it would probably come second in that ranking. But again, no indication of game performance.

Tl;dr - Performance in games, on retail parts and compared to launch-time competition, is what you should wait for.

I do not think it's about a direct replacement of GPUs; it's about helping the dedicated GPU.

The way I see it, it's a way to help the GPU get better results, by offloading the processes that are better suited to the CPU and letting the GPU do what it does best, and that is DRAW.

Yes, the chip could indeed be used in embedded systems to lower the overall cost, but I think the main goal is for it to overcome some of the memory wall problems that plague current designs.




joeorc said:
Soleron said:
...

I do not think it's about a direct replacement of GPUs; it's about helping the dedicated GPU.

The way I see it, it's a way to help the GPU get better results, by offloading the processes that are better suited to the CPU and letting the GPU do what it does best, and that is DRAW.

Yes, the chip could indeed be used in embedded systems to lower the overall cost, but I think the main goal is for it to overcome some of the memory wall problems that plague current designs.

Intel seems to be pushing it as a GPU replacement. Not a CPU replacement or a third processor. They're signing up board partners like AMD/Nvidia have. They were even targeting next-gen consoles, though delays and missing performance targets stopped that.

The eventual goal, of both AMD and Intel, is to put the GPU on the CPU die and then use each one for the tasks it's suited for. Hence AMD's Fusion, and Intel's GPU die on the CPU package.

Larrabee is certainly a high-end product, considering it needs its own card and will draw something like 300W of power. For low-power and embedded uses, Intel is putting their G4x-type graphics on the CPU package: Clarkdale and Arrandale (Nehalem dual-core + GPU) and Pine Trail (Atom + GPU).