Pemalite said:
torok said:

You are not understanding the problems with GPU-intensive processing and the bottleneck problem. Currently the single massive bottleneck is passing data from the CPU to the GPU, and all the current uses of GPU calculation are the ones that aren't affected by this problem. As you said, "Some datasets aren't terribly bandwidth or latency sensitive, some are". That sums up the whole point of the discussion.


You're missing the point completely. There is no "bottleneck"; it's a fallacy dreamed up by console gamers who believe the advertising and hype from their respective companies.
There is a reason why PCs use DDR3 as system RAM while GPUs use GDDR5.

System RAM typically has around 20% lower latency (in absolute time, once the clock rates are accounted for) than GDDR5 memory, which helps massively when there is a stall on the CPU.
GPUs, however, want bandwidth above all else, latency be damned; GDDR5 is perfect for that.

Grab some DDR3-1600 memory: that's an 800MHz I/O clock with a typical CAS latency of 8, which works out to a latency of 10ns.
Grab some DDR2-800 memory: that's a 400MHz I/O clock with a typical CAS latency of 4, which is also 10ns.

Now with GDDR5 the data rate is 4x the I/O clock instead of 2x, i.e. 5GHz (effective) GDDR5 is 1.25GHz x4 and would have a CAS latency of around 15.
15/(1.25 GHz) = 12 ns

So the latency of GDDR5 is about 20% higher than DDR3. That's a big difference when the CPU doesn't have the data it requires in its caches and the prefetchers weren't able to pull in the data it needs ahead of time; we are talking millions or billions of compute cycles essentially going to waste.
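
To make the arithmetic explicit, here's a minimal sketch of the numbers above (plain Python, nothing more than latency in nanoseconds = CAS cycles divided by the I/O clock):

    # CAS latency in ns = CAS cycles / I/O clock in GHz
    def cas_latency_ns(cas_cycles, io_clock_ghz):
        return cas_cycles / io_clock_ghz

    print(cas_latency_ns(8, 0.8))    # DDR3-1600: CAS 8 at 0.8GHz I/O -> 10.0 ns
    print(cas_latency_ns(4, 0.4))    # DDR2-800:  CAS 4 at 0.4GHz I/O -> 10.0 ns
    print(cas_latency_ns(15, 1.25))  # 5GHz-effective GDDR5: CAS 15 at 1.25GHz -> 12.0 ns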

torok said:

Don't assume that all massively parallel operations are easy to run on a GPU. Conditional statements or recursive algorithms destroy GPU calculation performance and it's not easy to remove these problems. So we usually deal with more complex algorithms on the GPU, and still having to worry about distributing your data set is far from a nice experience. That even accounts for physics, SPH being a good example of the problems with CPU-GPU data transfer (http://chihara.naist.jp/people/2003/takasi-a/research/short_paper.pdf, but newer results from NVidia are actually looking good now).

That was done on a Geforce FX.
For one, those cards weren't PCI-E to begin with (There is a big difference between those interconnects and not just in relation to the total GB/s either!).
Secondly, they only used a single-core processor, and a crap one at that.
Even the Geforce 6800 and 7900 used a PCI-E to AGP bridge chip to enable PCI-E compatibility, with the downside of only maxing out at AGP 8x speeds, which is far slower than what we have today.
GPUs are also far more flexible today; the Geforce FX, 6 and 7 series weren't even designed with dedicated compute tasks in mind. I should know, I helped write some shaders for Oblivion and Fallout in order to achieve better performance on the FX, Geforce 3 and 4 cards.


torok said:

Don't believe in the memory wall problem if you prefer, even though it is basically accepted as fact across the parallel/massively parallel computing community. And that's what we have with 8 or 16 cores. This link: http://storagemojo.com/2008/12/08/many-cores-hit-the-memory-wall/ is pretty great and makes some nice points, even showing cases of 16-core processors losing to 8-core ones in some operations. There is a paper from John von Neumann there pointing out the problem, and that was in 1945. Is von Neumann wrong about it? Not likely. You point to use on traditional desktops, where the CPU normally isn't being heavily taxed. And when it is, the answer is normally Turbo Boost, which disables cores to raise the clock of others, which avoids the memory wall problem. I'm talking here about games using all of the cores to do intensive operations, and that will hit the bottleneck faster than anything.


You keep parroting that, but that's not what I am seeing on my end on a consumer processor. (And I reiterate, it's essentially the fastest consumer-grade CPU money can buy.)
I can tax all my cores with something like Folding@Home, Bitcoin mining or SETI, and I see significant gains when enabling more cores; there is no wall.
I also don't have turbo enabled; all 6 cores and 12 threads run at 4.8GHz as the nominal clock, with allowances to drop lower when not being utilised to conserve power.


torok said:


And of course a PS4 can't do "Battlefield 4 in Eyefinity at 7680x1440 with everything on Ultra and achieve 60fps", since it doesn't have the required raw power to rasterize all those pixels. More GPUs? Good luck splitting work between them without passing data. But the PS4 will excel at physics calculation, using both the GPU and CPU to share the workload. And even at 1080p, it will look way better. And forget BF4 for now, since it's an unoptimized launch game and probably just a port of the PC version to grab money from people.

Confirmed: PS4 is inferior to my PC.
One of the GPUs in my PC is faster than two PS4 GPUs and a single 8-core Jaguar CPU combined, and I have three of those GPUs.
I could dedicate two of those GPUs to physics if a developer allowed me to do so; that's almost 9 teraflops right there, and the PS4 would literally scream "I'm a teapot!" in an attempt to process that much data.
And next month my rebuild will be done and I will have four Radeon R9 290Xs under water, which would put me at about 12x the power of a PS4 with just my GPUs alone, twice the amount of GDDR5 memory (which will also be faster) and 8x the system memory.
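
As a back-of-the-envelope check on that "12x" figure (a rough sketch, using the published shader counts and clocks: 2816 shaders at roughly 1GHz for the R9 290X, 1152 shaders at 0.8GHz for the PS4):

    # Peak single-precision TFLOPS = shaders * 2 FLOPs per cycle (FMA) * clock in GHz / 1000
    def tflops(shaders, clock_ghz):
        return shaders * 2 * clock_ghz / 1000

    r9_290x = tflops(2816, 1.0)      # ~5.6 TFLOPS per card
    ps4     = tflops(1152, 0.8)      # ~1.84 TFLOPS
    print(r9_290x / ps4)             # ~3x per card
    print(4 * r9_290x / ps4)         # ~12x for four cards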

I also don't have to split work up.
AMD's drivers are actually incredibly good at handling that task all by themselves, even when I give them a generic compute job.


torok said:

 


No, there isn't any good, real alternative to CUDA. Even CUDA currently sucks. We don't need more alternatives, we need a unified one that runs well on all GPUs and has good developer tools. All the decent ones are NVidia-only. AMD needs to up their game here. As for Mantle, it's largely PR talk. Coming from AMD, which has a terrible track record in software tools, it is even worse. All the GPUs out there are totally different beasts; it's not easy to optimize code for them. Of course it will bring some improvement, but it will be far from a game changer. If it was that easy, NVidia would already have it. In the research world, AMD basically never, and I mean never, brings anything new to the table. NVidia has brought a lot of massive techs over the years: CUDA is currently the king in GPU computing and OptiX is almost bringing us real-time raytracing. That last one is THE game changer for the next decade of graphics processing.

Of course a developer would not bother wasting their time optimising their game for every single GPU on the market; it would be asinine to even suggest such an endeavour. However...
The reality is that nVidia and AMD have very similar feature sets when abstracted; they go about implementing specific features very differently, but the end result is the same. AMD and nVidia then build an entire range of GPUs from that architectural feature set, which is identical across the board minus varying amounts of memory, compute engines and other such things.

Then you have the API. There are several types of API, high-level and low-level, and it's the same for consoles too.
Low-level APIs are closer to the metal and are incredibly efficient; however, they are also harder to build a game for.
A high-level API is very easy to make a game for, but you (obviously) sacrifice speed.
Both interface with a driver and the driver interfaces with the hardware.

Historically the PC has only had high-level APIs since the days of 3dfx's GLIDE; consoles have a choice of both.
Battlefield 4, for instance, uses a high-level API on the PlayStation 4, hence why it does not run at a full 1080p with Ultra settings.

On the PC, however, AMD has reintroduced the low-level API in the form of Mantle, which initially is only going to be for its Graphics Core Next architecture; of course it's open, so nVidia can adapt its drivers to it too.

To put it in perspective, nVidia's and AMD's drivers have more lines of code than even the Windows kernel; they're incredibly complex pieces of software in all respects. This is in order to squeeze out maximum performance and image quality whilst retaining complete backwards compatibility with decades' worth of software and games.

Mantle, however, will also reduce draw call overhead; AMD stated that even with an AMD FX underclocked to 2GHz, the Radeon R9 290X is still GPU-bound.
Draw calls account for a stupidly massive amount of a game's current CPU usage.
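
To illustrate why that matters at 60fps (a purely hypothetical sketch; the per-call costs and call count below are assumed round numbers for illustration, not measured figures):

    # Illustrative only: per-draw-call CPU costs and scene complexity are assumed, not measured
    frame_budget_ms = 1000 / 60            # ~16.7 ms CPU budget per frame at 60fps
    draw_calls_per_frame = 5000            # assumed scene complexity

    thick_api_cost_us = 30                 # assumed CPU cost per call on a high-level API
    thin_api_cost_us = 3                   # assumed CPU cost per call on a low-level API like Mantle

    print(frame_budget_ms)                                    # ~16.7 ms available
    print(draw_calls_per_frame * thick_api_cost_us / 1000)    # 150.0 ms -- blows the budget on the CPU side
    print(draw_calls_per_frame * thin_api_cost_us / 1000)     # 15.0 ms  -- fits, so the GPU becomes the limit again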

Also, real-time ray tracing isn't going to be here for a long, long time, maybe in 3-4 console generations; heck, movies like Lord of the Rings and Finding Nemo used a scanline renderer with photon mapping, not ray tracing.

A mix of technologies is the best way to go about it.

Why do you even continue to argue with someone who doesn't even know what they're talking about? The only part of a physics simulation that is limited by or runs better on a CPU is collision detection (mostly because the algorithms for it are biased towards sequential processing), and he doesn't even get full credit for mentioning that.