Norion said:

Easily conveying a power difference to people is useful, which teraflops can do in the right circumstances, since even with the inaccuracy it still gives a general idea of the power gap between the two. The inaccuracy does still make it a bad metric though, so what metric would you suggest be used instead?

Framerates tend to be a good one.

EpicRandy said:

Tflops is *not* actually a measurement of anything.

Yes it is. It's not some kind of metric obtained with a dice roll when a new GPU enters the market. Each shader core (AMD) and CUDA core (Nvidia), which are responsible for the general floating-point operations on their respective GPUs, can do up to 2 FLOPs per clock for 32-bit floating-point operations. That's the physical limit of those cores. Saying TFLOPS isn't worth anything is akin to saying horsepower means nothing for cars, and wattage means nothing for electric motors. Every single metric, for just about anything simple or complex, is only as good as your ability to contextualize it.

No. It's actually not.

It's a bunch of numbers multiplied together. It's theoretical, not real-world.

Again, no GPU or CPU will ever achieve its "hypothetical teraflops" in the real world.
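To spell out what "a bunch of numbers multiplied together" actually means, here's a rough sketch of where the headline figure comes from; the paper_tflops helper is just illustrative, and the Series S specs (1280 shaders at 1.565GHz) are only there as a concrete example since that console comes up later:

```python
# Rough sketch: where a headline FP32 TFLOPS figure comes from.
# It is just shader cores x 2 FLOPs per clock (one fused multiply-add) x clock speed.
def paper_tflops(shader_cores: int, clock_ghz: float) -> float:
    return shader_cores * 2 * clock_ghz / 1000.0  # GFLOPS -> TFLOPS

# Example: Xbox Series S GPU (20 CUs = 1280 shaders at 1.565 GHz)
print(paper_tflops(1280, 1.565))  # ~4.0 TFLOPS on paper
```

Nothing in that multiplication accounts for memory bandwidth, cache, occupancy, or the architecture itself, which is exactly why the paper number never shows up in real workloads.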

EpicRandy said:

It's a theoretical figure based on a number of hardware attributes, not a measurement of capability. It is a number that is impossible to achieve in the real world.

Yes, like I explained earlier, it is theoretical because you could never task those cores 100%. So it can be viewed either as a theoretical limit or as a physical ceiling you could never exceed, or even attain, at the reference clock, but it still is very much a measurement of capability.

But in that same vein, I could grab a 1 teraflop "rated" GPU and assert it's theoretically capable of "1.2 teraflops" based on any number of factors.

It's meaningless, because it's unachievable in any real-world scenario.

EpicRandy said:

A Radeon HD 5870 is a 2.72-teraflop GPU with 2GB of RAM @ 153GB/s of bandwidth.
A Radeon HD 7850 is a 1.76-teraflop GPU with 2GB of RAM @ 153GB/s of bandwidth.

So the only real difference is almost 1 teraflop of compute, right? It's accurate according to you, right? So the Radeon 5870 should win, right?

Then if it's such an accurate measure of compute, why is the 7850 faster at everything, including compute, where in some single-precision floating-point tasks the 7850 is sometimes more than twice as fast?
(But don't take my word for it)
https://www.anandtech.com/bench/product/1062?vs=1076

There is another very significant difference between the two. When I made my list of what could starve GPU cores (insufficient memory pool, insufficient memory bandwidth, and insufficient power delivery) I did not mean it to be exhaustive. Here it's the flaws of the TeraScale architecture that starve the cores of the high-end 5870. No matter the generation, architecture, or revision, the high-end/enthusiast segments are meant to push limits by sacrificing efficiency to get the last drops of performance, so they should always be viewed with high diminishing returns in mind. The 7850 is a mid-range GPU using the better GCN architecture, which results in significantly less starvation of its cores.

What is starving the 5870 so that it delivers fewer real-world teraflops than the 7850?

Explain it. I'll wait.
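For anyone who wants the arithmetic spelled out, here's the paper maths for both cards (core counts and reference clocks as AMD published them) set against what the benchmarks in that link actually show:

```python
# Paper FP32 throughput: shaders x 2 FLOPs per clock x clock (GHz) / 1000
hd5870 = 1600 * 2 * 0.850 / 1000.0  # ~2.72 TFLOPS (TeraScale 2, high-end)
hd7850 = 1024 * 2 * 0.860 / 1000.0  # ~1.76 TFLOPS (GCN, mid-range)

print(hd5870 / hd7850)  # ~1.55x advantage on paper for the 5870...
# ...yet the AnandTech results linked above show the 7850 ahead across
# the board, sometimes by more than 2x in single-precision compute.
```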

EpicRandy said:

You are just confirming my point, that the number of CUs is not the be-all, end-all.

It's confirming my point too, because I never claimed the opposite, and using TFLOPS in certain scenarios does not mean I view it as the be-all and end-all either. In fact, all my statements point to carefulness when using TFLOPS, and using CUs would only be worse, so I don't know why you try to claim the opposite as my position.

Using CUs alone is just as irrelevant as teraflops. And I would never condone or support such a thing.

Both are bullshit.

EpicRandy said:

Nah. Isolating GPU power consumption doesn't result in higher GPU power consumption.

Remember, binning is actually a thing, and as a process matures you can obtain higher clock speeds without a corresponding increase in power consumption. Sometimes... you can achieve higher clocks -and- lower power consumption as processes mature.

I think you misunderstood what I was trying to say. The default TDP is for the whole APU, so the 5500U having a less power-hungry CPU means the GPU has more available power, hence the 200MHz higher clock frequency. The RDNA 2 architecture (used here just for comparison, as I did not find an equivalent chart for Vega) shows a 25% increase in watts for 1800MHz compared to 1600MHz, which is only a 12.5% increase in performance. No doubt the Vega architecture is not even as good as that, so the ratio may be worse still. Binning is a thing, but the best bins would be reserved for the higher tiers with better profit margins, like the 4800U and 4980U, and it has only a marginal impact, nothing of the sort needed to bridge a 12.5% performance-per-watt gap.

And binning does not always end up being used for power savings. Look at the RX 400 series vs the RX 500 series: the difference was only better bins, but they used them to get higher clocks (about 6%), and since they pushed the architecture to the max it actually resulted in worse performance per watt, with the 580 using 23% more watts than the 480.

Anyway, this whole conversation is weird because none of your points disprove the initial context in which I used the TFLOPS figure. I said that AMD/MS need to attain the 4 TFLOPS envelope of the RDNA 2 architecture at a mobile-like TDP, which I illustrated with another RDNA 2 chip (which I mistakenly wrote as RDNA 3 in my previous post, sorry if that's the source of the debate) with closely matched TFLOPS. I did not claim their performance was equivalent or comparable based on this; it was supposed to mean that AMD has already successfully reduced the TDP envelope of the RDNA 2 architecture with the change from 7nm to 6nm. To add more context, the 680M has a max 50W TDP but retains 80%+ of its performance at 25W; see the benchmark here. This is promising when you consider that the semi-custom design of the Series S is even more efficient, using more cores at lower clock speeds. So it only adds to the plausibility of the video in the OP. RDNA 2 at 4nm or even 3nm should be more than enough to push a 4 TFLOPS RDNA 2 package under 25W, and even have a shot at a 15W APU target.

TFLOPS is also very useful in this context because consoles must keep this metric when doing a die shrink. Look at the PS5: it now uses the 6nm Oberon Plus chip and shaved off 20W to 30W, but kept the same TFLOPS target (the same clock speed and the same number of shader cores), same memory, same bandwidth, same everything, just shrunk down. They have to do it this way to keep changes invisible to developers, and that's basically what I anticipate Xbox will do with the Series consoles, whether or not they want a revised S in a Switch-like format. If MS were to use a different architecture like RDNA 3 or 4, it is unlikely they would use a different TFLOPS target either if they want to keep things invisible to devs (it may not even be possible here, but the more you keep the same, the easier it should be for a dev to create a new build from the S version).
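Just to put the clock-versus-power figures from the quote above into perspective, here's the arithmetic, treating clock speed as a rough stand-in for performance (which is being generous):

```python
# RDNA 2 figures cited above: 1600MHz -> 1800MHz costs ~25% more power
clock_gain = 1800 / 1600         # 1.125 -> +12.5% clock
power_gain = 1.25                # +25% power
print(clock_gain / power_gain)   # ~0.90 -> roughly 10% worse perf/W

# RX 480 -> RX 580 figures cited above: ~6% higher clocks for ~23% more power
print(1.06 / 1.23)               # ~0.86 -> roughly 14% worse perf/W
```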

My argument is that TFLOPS is bullshit when used to determine the capability of a GPU.

Nor are higher clocks always a detriment to power consumption... It's a balancing act, as all CPUs and GPUs have an efficiency curve.

For example... you can buy a CPU, undervolt it... then overclock it... and end up with a CPU that uses less power but offers higher performance due to its higher clock rate.
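Here's a minimal sketch of why that works, using the standard first-order relation for dynamic power (it scales with voltage squared times frequency); the voltage and clock numbers are made up purely for illustration, and this ignores leakage and assumes the chip stays stable at the lower voltage:

```python
# First-order CMOS dynamic power: P ~ C * V^2 * f  (static/leakage power ignored)
def relative_dynamic_power(voltage_ratio: float, clock_ratio: float) -> float:
    return (voltage_ratio ** 2) * clock_ratio

# Illustrative undervolt + overclock: drop the voltage ~9.5%, raise the clock 5%
print(relative_dynamic_power(0.95 / 1.05, 1.05))  # ~0.86 -> ~14% less power, 5% more clock
```

Voltage enters squared while clock only enters linearly, which is why stock voltage settings often leave performance-per-watt on the table.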



--::{PC Gaming Master Race}::--