| Pemalite said:
For one, those cards weren't PCI-E to begin with (there is a big difference between those interconnects, and not just in total GB/s either!). You keep parroting that, but that's not what I am seeing on my end on a consumer processor (and I reiterate, it's essentially the fastest consumer-grade CPU money can buy). One of the GPUs in my PC is faster than two PS4 GPUs plus a single 8-core Jaguar CPU, and I have three of those GPUs. The reality is that nVidia and AMD have very similar feature sets when abstracted; they go about implementing specific features very differently, but the end result is the same. AMD and nVidia then build an entire range of GPUs from that architectural feature set, which is identical across the board apart from varying amounts of memory, compute engines and other such things. To put it in perspective, nVidia's and AMD's drivers have more lines of code than even the Windows kernel; they're incredibly complex pieces of software in all respects, in order to squeeze out maximum performance and image quality whilst retaining complete backwards compatibility with decades' worth of software and games. |
GDDR5 latency is about 20% higher than DDR3's. Yes, that's a performance hit. But look at the trade-off: with a 50/50 bandwidth split between CPU and GPU on GDDR5, you still get almost 90 GB/s of peak GPU memory read bandwidth. You read the data and you use it. On a PC you have to read from DDR3, push the data across PCI-E, and write it into GDDR5. Note that I'm actually touching memory on both sides, because that's what the architecture forces me to do. Now I'm paying DDR3 latency plus GDDR5 latency, and my bottleneck is about 5 GB/s on PCI-E. Man, the PS4 is almost 20x faster on this path! So come on, is 20% really too much to pay for that? What I'm actually saying is that, for this kind of workload, the PS4 is better than your PC and better than every other home PC out there, simply because it was architected to have a lot of memory bandwidth, planned for a world where bandwidth is the limiting factor.
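Just to make that arithmetic concrete, here is a rough back-of-the-envelope in C. The numbers (176 GB/s unified GDDR5 split 50/50, ~5 GB/s effective PCI-E staging throughput) are the same assumptions I used above, not benchmarks:

```c
/* Back-of-the-envelope comparison of the two data paths discussed above.
 * All figures are assumptions from the post itself, not measured values. */
#include <stdio.h>

int main(void)
{
    double ps4_gddr5_total   = 176.0;                 /* GB/s, unified GDDR5 pool            */
    double ps4_gpu_share     = ps4_gddr5_total * 0.5; /* 50/50 CPU/GPU bandwidth split       */

    double pc_pcie_effective = 5.0;                   /* GB/s, assumed effective throughput
                                                         of the DDR3 -> PCI-E -> GDDR5 path  */

    printf("PS4 GPU read share : %.0f GB/s\n", ps4_gpu_share);
    printf("PC staging path    : %.0f GB/s\n", pc_pcie_effective);
    printf("Ratio              : %.1fx\n", ps4_gpu_share / pc_pcie_effective);
    return 0;
}
```

88 divided by 5 is roughly 17.6x, which is where the "almost 20x" comes from.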
Now, about physics. Four GPUs, that's nice! Now split the work between them. Pass data between them, loading everything into CPU RAM and shipping it over to the other GPU. Four GPUs will give you a speedup of maybe 2x in a good case, unless you run an optimized, heavily parallel algorithm. All of that just shows that you don't understand GPU programming. First, raw teraflops aren't a good measure; the entire architecture has to allow all that power to actually be used. Take SETI@home as an example: it has an enormous amount of raw floating-point power, but it loses to much smaller clusters built on InfiniBand because of communication overhead. And that is the problem with parallel computing: communication costs a lot. A GPU can be treated as a single cluster (except for being a SIMD machine, while regular clusters are MIMD). And please, don't say your GPU is faster than two PS4 GPUs and a CPU; you can't compare a CPU and a GPU like that. It's MIMD vs. SIMD. A MIMD CPU can execute different instructions at the same time, while a SIMD GPU only executes the same instruction on all cores, with different data. A MIMD machine can do everything a SIMD machine does; the opposite isn't true. If I run an algorithm with a lot of conditional statements (AI, for example), or a recursive one, your GPU won't even beat the Pentium 4 from the article I linked.
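Here is a toy model of that branch-divergence point, again in C. The lane count and instruction costs are made-up numbers chosen just to show the mechanism, not measurements from any real GPU:

```c
/* Toy model of SIMD lockstep execution vs. MIMD. The costs below are
 * hypothetical instruction counts; the lane count mirrors a typical
 * GPU warp but is otherwise an assumption. */
#include <stdio.h>

#define LANES     32
#define COST_THEN 10   /* hypothetical cost of the 'if' path   */
#define COST_ELSE 40   /* hypothetical cost of the 'else' path */

int main(void)
{
    int data[LANES], i, any_then = 0, any_else = 0;
    int simd_cost = 0, mimd_worst = 0;

    /* Mixed data: half the lanes take each side of the branch. */
    for (i = 0; i < LANES; i++) {
        data[i] = i % 2;
        if (data[i]) any_then = 1; else any_else = 1;
    }

    /* SIMD (lockstep): if any lane takes a path, the whole warp
     * steps through it, so divergent branches serialize.        */
    if (any_then) simd_cost += COST_THEN;
    if (any_else) simd_cost += COST_ELSE;

    /* MIMD: each core runs only its own path; wall-clock time is
     * set by the slowest core.                                   */
    mimd_worst = (COST_ELSE > COST_THEN) ? COST_ELSE : COST_THEN;

    printf("SIMD warp cost  : %d\n", simd_cost);
    printf("MIMD worst core : %d\n", mimd_worst);
    return 0;
}
```

On mixed data the SIMD warp pays for both sides of the branch, while the MIMD machine only pays for the slower path on the cores that actually take it; pile up enough branches or recursion and the GPU's raw-teraflop advantage evaporates.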
You still refuse to see the memory wall. I sent you a link to a paper from a computing conference. I sent you a link to a paper written by John von Neumann himself. If he can't convince you, no one can. I will say it again: do you really think your non-scientific observations of your own computer usage are a better source of information than research from von Neumann? I'm not talking about running simple tasks. I'm assuming that a group of developers will sit down and say "let's use every single core, every drop of power of the GPU, offload tasks where possible, optimize memory access to minimize cache misses, use the bandwidth well, and extract every single drop of power here". That's what gives you power: balancing the load to find the point where you maximize the usage of everything in that environment. And yes, it means using the cache correctly to avoid misses. And for that, you need to know exactly what is under the hood. Every spec. That's why supercomputers are homogeneous clusters: knowing exactly the balance of power between the nodes allows you to make the right decisions.
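To show what "using the cache correctly" means in practice, here's a minimal C sketch. The matrix size is an arbitrary assumption; you would time each loop yourself to see the gap on your own hardware:

```c
/* The two loops do identical arithmetic, but the second walks memory
 * with a large stride and misses the cache far more often. */
#include <stdio.h>

#define N 4096

int main(void)
{
    static float m[N][N];   /* zero-initialized, ~64 MB */
    double sum = 0.0;
    int i, j;

    /* Cache-friendly: consecutive addresses, hardware prefetch helps. */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            sum += m[i][j];

    /* Cache-hostile: stride of N floats (16 KB), most accesses miss. */
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            sum += m[i][j];

    printf("%f\n", sum);
    return 0;
}
```

Same arithmetic, very different memory behaviour, and that is exactly the kind of thing you can only plan for when you know the exact hardware you are targeting.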
Just to end: real-time ray tracing is planned by NVidia in around 5 years, after two more GPU iterations. Three or four console generations would be 20 years, and that's crazy. They have a double effort running in parallel: their GPUs and tools (OptiX) keep improving, and newer, more optimized algorithms do ray tracing faster (using better tree data structures, etc.). Actually, when Sony and MS asked developers what they wanted in a new console, the developers asked them not to use a ray-tracing-capable GPU, because every engine would have had to be redone from scratch to use it. Of course, that was a bit of an exaggeration, since not even a Titan could do it in real time, but current GPUs are pretty close to it, so I think they just wanted to be sure.
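Those "tree data structures" are bounding volume hierarchies, and traversing them boils down to a lot of cheap ray-vs-box tests like the one below. This is a generic slab-test sketch of my own, not code from OptiX, and it assumes the ray direction has no zero components:

```c
/* Ray vs. axis-aligned bounding box via the slab test: intersect the
 * ray with the three pairs of planes and check the intervals overlap. */
#include <stdio.h>

typedef struct { float ox, oy, oz, dx, dy, dz; } Ray;
typedef struct { float min[3], max[3]; } AABB;

static int ray_hits_box(const Ray *r, const AABB *b)
{
    float o[3] = { r->ox, r->oy, r->oz };
    float d[3] = { r->dx, r->dy, r->dz };
    float tmin = 0.0f, tmax = 1e30f;

    for (int a = 0; a < 3; a++) {
        float inv = 1.0f / d[a];               /* assumes d[a] != 0      */
        float t0 = (b->min[a] - o[a]) * inv;
        float t1 = (b->max[a] - o[a]) * inv;
        if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
        if (t0 > tmin) tmin = t0;
        if (t1 < tmax) tmax = t1;
        if (tmax < tmin) return 0;             /* slab intervals disjoint */
    }
    return 1;
}

int main(void)
{
    Ray  r = { 0, 0, 0,  1, 0.1f, 0.05f };
    AABB b = { { 5, -1, -1 }, { 6, 1, 1 } };
    printf("hit: %d\n", ray_hits_box(&r, &b));
    return 0;
}
```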








