Pemalite said:


You're missing the point completely. There is no "bottleneck"; it's a fallacy dreamed up by console gamers who believe the advertising and hype from their respective companies.
There is a reason why PCs use DDR3 RAM as system RAM and GPUs use GDDR5.

System RAM typically has 20% or so lower latency (in absolute time, not just clock cycles) than GDDR5 memory; this helps massively when there is a stall on the CPU.
GPUs, however, want bandwidth above all else, latency be damned; GDDR5 is perfect for this.

Grab some DDR3-1600 memory: that's an 800MHz IO clock with a typical CAS latency of 8, which means it has a latency of 10ns.
Grab some DDR2-800 memory: that's a 400MHz IO clock with a typical CAS latency of 4, which is also 10ns.

Now, with GDDR5 the data rate is 4x the IO clock instead of 2x, i.e. 5GHz (effective) GDDR5 runs a 1.25GHz IO clock and would have a CAS latency of around 15.
15/(1.25 GHz) = 12 ns

So the latency of GDDR5 is about 20% higher than DDR3. That's a big difference when the CPU doesn't have the data it requires in its caches and the predictors weren't able to fetch the data required for processing ahead of time; we are talking millions of compute cycles essentially going to waste.
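To sanity-check that arithmetic, here's a throwaway sketch; the CAS latencies and IO clocks are just the typical figures quoted above, not measurements from any particular module:

```c
#include <stdio.h>

/* CAS latency (cycles) divided by the IO clock gives the access latency in ns.
   Figures below are the typical values from the post, not measured modules. */
static double cas_latency_ns(double cas_cycles, double io_clock_mhz)
{
    return cas_cycles / io_clock_mhz * 1000.0; /* cycles / MHz -> ns */
}

int main(void)
{
    double ddr3  = cas_latency_ns(8.0,  800.0);  /* DDR3-1600: 800MHz IO, CL8    */
    double ddr2  = cas_latency_ns(4.0,  400.0);  /* DDR2-800:  400MHz IO, CL4    */
    double gddr5 = cas_latency_ns(15.0, 1250.0); /* 5GHz GDDR5: 1.25GHz IO, CL15 */

    printf("DDR3-1600:  %.1f ns\n", ddr3);  /* 10.0 ns */
    printf("DDR2-800:   %.1f ns\n", ddr2);  /* 10.0 ns */
    printf("GDDR5-5000: %.1f ns\n", gddr5); /* 12.0 ns, i.e. ~20% higher than DDR3 */
    return 0;
}
```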

That was done on a Geforce FX.

For one, those cards weren't PCI-E to begin with (there is a big difference between those interconnects, and not just in the total GB/s either!).
Secondly... they only used a single-core processor, and a crap one at that.
Even the Geforce 6800 and 7900 used a PCI-E to AGP bridge chip to enable PCI-E compatibility, with the downside of only maxing out at AGP 8x speeds, which is orders of magnitude slower than what we have today.
GPUs are also far more flexible today. The Geforce FX, 6 and 7 series weren't even designed with dedicated compute tasks in mind; I should know, I helped write some shaders for Oblivion and Fallout in order to achieve better performance on the FX, Geforce 3 and 4 cards.


You keep parroting that, but that's not what I am seeing on my end on a consumer processor. (And I reiterate, it's essentially the fastest consumer-grade CPU money can buy.)
I can tax all my cores with something like Folding@Home, Bitcoin mining or SETI; I see significant gains when enabling more cores, and there is no wall.
I also don't have turbo enabled; all 6 cores and 12 threads run at 4.8GHz as the nominal clock, with allowances to clock lower when not being utilised, to conserve power.


Confirmed: PS4 is inferior to my PC.

One of the GPUs in my PC is faster than two PS4 GPUs plus a single 8-core Jaguar CPU, and I have three of those GPUs.
I could dedicate two of those GPUs to physics if a developer allowed me to do so; that's almost 9 teraflops right there. The PS4 would literally scream "I'm a teapot!" in an attempt to process that much data.
And next month my rebuild will be done and I will have four Radeon R9 290Xs under water, which would put me at about 12x the power of a PS4 with just my GPUs alone, twice the amount of GDDR5 memory (which will also be faster), and 8x the system memory.

I also don't have to split work up.
AMD's drivers are actually incredibly good at handling that task all by themselves, even when I give them a generic compute job.


Of course a developer would not bother wasting their time optimising their game for every single GPU on the market; it would be asinine to even suggest such an endeavour. However...

The reality is that nVidia and AMD have very similar feature sets when abstracted; they go about implementing specific features very differently, but the end result is the same. AMD and nVidia then build an entire range of GPUs from that architectural feature set, which is identical across the board minus varying amounts of memory, compute engines and other such things.

Then you have the API. There are several types of APIs, such as high-level and low-level APIs; it's the same for consoles too.
Low-level APIs are closer to the metal and incredibly efficient; however, they are also harder to build a game for.
A high-level API is very easy to make a game for, but you (obviously) sacrifice speed.
Both interface with a driver and the driver interfaces with the hardware.

Historically the PC has only had high-level APIs since 3dfx's GLIDE; consoles have a choice of both.
Battlefield 4, for instance, uses a high-level API on the PlayStation 4, which is why it does not run at full 1080p with Ultra settings.

On the PC, however, AMD has reintroduced the low-level API in the form of Mantle, which is initially only going to be for its Graphics Core Next architecture; of course it's open source, so nVidia can adapt its drivers to it too.

To put it in perspective though, nVidia's and AMD's drivers have more lines of code than even the Windows kernel. They're incredibly complex pieces of software in all respects; this is in order to squeeze out maximum performance and image quality whilst retaining complete backwards compatibility with decades' worth of software and games.

Mantle, however, will also reduce draw call overhead; AMD stated that even with an AMD FX underclocked to 2GHz, the Radeon R9 290X is still GPU bound.
Draw calls account for a stupidly massive amount of a game's current CPU usage.
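For a rough feel of why that is, here's a back-of-the-envelope sketch; the 10µs-per-draw-call CPU cost is purely an illustrative assumption (real figures vary wildly by API, driver and scene), not a number from AMD or from this thread:

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative assumption only: a ballpark per-call CPU cost for a
       high-level API (driver work, validation, state translation). */
    const double cpu_cost_per_call_us = 10.0;
    const double frame_budget_us      = 1e6 / 60.0; /* 60 fps -> ~16,667 us per frame */

    double max_calls = frame_budget_us / cpu_cost_per_call_us;

    printf("Frame budget: %.0f us\n", frame_budget_us);
    printf("Draw calls that fit if the CPU did nothing else: ~%.0f\n", max_calls);
    /* ~1,667 calls per frame - and the CPU still has AI, physics and game logic
       to run, which is why cutting per-call overhead (Mantle's pitch) matters. */
    return 0;
}
```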

Also, real-time ray tracing isn't going to be here for a long, long time, maybe in 3-4 console generations. Heck, movies like Lord of the Rings and Finding Nemo used a scanline renderer with photon mapping, not ray tracing.

A mix of technologies is the best way to go about it.

 

GDDR5 latency is 20% higher, yes, that's a performance hit. But try this: using a 50/50 split on bandwidth between CPU and GPU on GDDR5, you get almost 90 GB/s of maximum GPU memory read performance. You read and you use. On PC, you have to read from DDR3, pass it over PCI-E and write it to GDDR5. Note that I'm actually using memory on both sides, and that is what I have to do. Now you are dealing with DDR3 latency + GDDR5 latency, and my bottleneck is 5 GB/s on PCI-E. Man, the PS4 is almost 20x faster at this! But come on, 20% is really too much? What I'm actually saying is that a PS4 is better than your PC, and better than every other home PC out there, simply because it was architected to have a lot of memory bandwidth, planned around living in a world where bandwidth is limited.
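Putting rough numbers on that: the 176 GB/s figure below is the PS4's published peak GDDR5 bandwidth, and the 5 GB/s PCI-E figure is the one quoted above.

```c
#include <stdio.h>

int main(void)
{
    /* PS4's published peak GDDR5 bandwidth; the PCI-E figure is the one quoted above. */
    const double ps4_gddr5_gbs = 176.0;
    const double gpu_share_gbs = ps4_gddr5_gbs * 0.5;  /* 50/50 CPU/GPU split -> ~88 GB/s */
    const double pcie_gbs      = 5.0;                  /* bottleneck on the PC path described */

    printf("GPU share of PS4 bandwidth: %.0f GB/s\n", gpu_share_gbs);
    printf("Ratio vs the 5 GB/s PCI-E path: %.1fx\n", gpu_share_gbs / pcie_gbs); /* ~17.6x */
    return 0;
}
```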

Now about physics. 4 GPUs, that's nice! Now split the work between them. Pass data between them, loading everything into CPU RAM and passing it to the other GPU. 4 GPUs will give you a speedup of, in a good case, 2x, unless you run an optimised and heavily parallel algorithm. All of that just shows that you don't understand GPU programming. First, raw teraflops aren't a good measure; the entire architecture must allow all that power to be used. Take Seti@Home as an example: it has a huge amount of raw floating-point calculation power, but it will lose to much smaller clusters using InfiniBand because of communication overhead. And that is the problem with parallel computing: communication costs a lot. And a GPU can be treated as a single cluster (except for being a SIMD machine, while regular clusters are MIMD).

And please, don't say your GPU is faster than 2 PS4 GPUs and a CPU. You can't compare a CPU and a GPU like that: MIMD vs SIMD. A MIMD CPU can execute different instructions at the same time, while a SIMD GPU only executes the same instruction on all cores with different data. A MIMD machine can do everything a SIMD machine does, while the opposite isn't true. If I run an algorithm with a lot of conditional statements (AI, for example) or a recursive one, your GPU won't beat even the Pentium 4 in the article I sent.
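For anyone following along, here is a toy model of the SIMD point: when lanes in a SIMD group disagree on a branch, the hardware effectively executes both sides and masks off the lanes that didn't take each path, so heavily conditional code throws away a lot of the raw throughput. This is only a conceptual sketch of lane masking, not how any particular GPU is actually programmed:

```c
#include <stdio.h>

#define LANES 32  /* think of one SIMD group / wavefront */

int main(void)
{
    int data[LANES], out[LANES];
    for (int i = 0; i < LANES; i++)
        data[i] = i;

    /* Divergent branch: even lanes want one path, odd lanes the other.
       A SIMD machine runs BOTH paths over all lanes and uses a mask to
       keep only the relevant results, so half the work is thrown away. */
    int executed_lane_ops = 0;

    /* pass 1: "if" side, mask = (data[i] % 2 == 0) */
    for (int i = 0; i < LANES; i++) {
        int result = data[i] * 2;            /* computed for every lane */
        executed_lane_ops++;
        if (data[i] % 2 == 0) out[i] = result;
    }
    /* pass 2: "else" side, mask = (data[i] % 2 != 0) */
    for (int i = 0; i < LANES; i++) {
        int result = data[i] + 100;          /* computed for every lane */
        executed_lane_ops++;
        if (data[i] % 2 != 0) out[i] = result;
    }

    printf("out[0]=%d out[1]=%d\n", out[0], out[1]);
    printf("Useful results: %d, lane-operations executed: %d (2x the work)\n",
           LANES, executed_lane_ops);
    return 0;
}
```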

You still refuse to see the memory wall. I sent you a link to a paper from a computing conference. I sent you a link to a paper written by John von Neumann himself. If he can't convince you, no one can. I will say it again: do you really think that your non-scientific observations of your own computer usage are a better source of info than research from von Neumann? I'm not talking about running simple tasks. I'm assuming that a group of developers will sit down and say "let's use every single core, every drop of power of the GPU, offload tasks where possible, optimise access to minimise cache misses, use the bandwidth well, and extract every single drop of power here". That's what gives you power: balancing load to find the point where you maximise the usage of everything in that environment. And yes, it means even using the cache correctly to avoid misses. And for that, you need to know exactly what is under the hood. Every spec. That's why supercomputers are homogeneous clusters: knowing exactly the balance of power between nodes allows you to make the right decisions.
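On the "minimise cache misses" point, the classic illustration is walking the same array in memory order versus striding across it. The matrix size below is arbitrary and the exact timings depend entirely on the machine, but the in-order walk is normally several times faster because it reuses cache lines that were already fetched:

```c
#include <stdio.h>
#include <time.h>

#define N 4096  /* arbitrary size, large enough to spill out of the caches */

static double now_sec(void) { return (double)clock() / CLOCKS_PER_SEC; }

int main(void)
{
    static int m[N][N];
    long sum = 0;
    double t;

    /* Row-major walk: consecutive addresses, each cache line fully reused. */
    t = now_sec();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i][j];
    printf("row-major:    %.3f s\n", now_sec() - t);

    /* Column-major walk: same work, but every access jumps N*sizeof(int)
       bytes, so almost every access is a cache miss. */
    t = now_sec();
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += m[i][j];
    printf("column-major: %.3f s\n", now_sec() - t);

    printf("checksum: %ld\n", sum); /* keep the loops from being optimised away */
    return 0;
}
```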

Just to end, real-time ray tracing is planned by NVidia in around 5 years, after 2 new iterations of GPUs. 3 or 4 console generations is 20 years; that's crazy. They have a double effort running in parallel: their GPUs and tools (OptiX) improving, and newer, more optimised algorithms for faster ray tracing (using more optimised tree data structures, etc.). Actually, when Sony and MS asked developers what they wanted from a new console, they were asked not to use a ray-tracing-capable GPU, because all engines would have to be redone from scratch to use it. Of course, that was a bit of an exaggeration since not even a Titan could do it in real time, but current GPUs are pretty close to doing that, so I think they just wanted to be sure of it.