Pemalite said:
EpicRandy said:

I get what you say, and it's kind of good advice in general, but it's a little more complicated than this. TFLOPS measures the limit of compute power a chip can reach at corresponding clocks. It is actually very precise for determining the capacity of a chip in a nutshell. However, this is only one part of the story; the rest can be summed up by what % of it you can actually use in any given scenario, or in other words, how starved the chip actually is. It can be starved by an insufficient memory pool, insufficient memory bandwidth, or insufficient power delivery.

No. No. No.

Tflops is *not* actually a measurement of anything.

It's a theoretical number based on a number of hardware attributes and not a measurement of capability. - It is a number that is impossible to achieve in the real world.

It is extremely imprecise, not precise.

For example...
A Radeon 5870 is a 2.72 Teraflops GPU with 2GB of RAM @ 153GB/s of bandwidth.
A Radeon 7850 is a 1.76 Teraflops GPU with 2GB of RAM @ 153GB/s of bandwidth.

So the only real difference is almost 1 Teraflops of compute, right? It's accurate according to you right? So the Radeon 5870 should win right?

Then if it's such an accurate measure of compute, why is the 7850 faster in everything, including compute where in some single precision floating point tasks, the 7850 is sometimes more than twice as fast?
(But don't take my word for it)
https://www.anandtech.com/bench/product/1062?vs=1076


Again. Teraflops is absolute bullshit. It literally represents nothing.

EpicRandy said:

Another aspect to consider is the amount and variety of hardware acceleration you have, which may result in software bypassing the CUs in scenarios where it would need to use them on a chip without such acceleration.

In the example you gave, the 2700U is actually very starved by its limited memory pool and bandwidth. The 4500U features 50% more L2 cache and 2x the L3 cache, and supports higher memory frequencies. The CPU side is also more power-hungry on the 2700U than on the 4500U, leaving more leeway within the same 25W TDP for the GPU on the 4500U.

You are just confirming my point, that the number of CU's is not the be-all, end-all.

EpicRandy said:

For the 5500U vs 4700U, the sole difference is that the CPU side is less power-hungry on the 5500U, allowing for higher clocks on the GPU. But make no mistake: if you were to isolate the GPU power consumption and compare them, the 4700U would actually be more efficient per watt. Even the more recent RDNA2 is most efficient at around 1300 to 1400 MHz according to this source. The Vega architecture, however, had a much lower peak-efficiency clock speed. I could not find a source for this, but I remember that at the time of the Vega announcement, AMD was using clocks of ~850MHz in their presentation to portray the efficiency increase compared with the older architecture. This was prior to the reveal of the Vega 56 and 64, however, so it is possible it was tested on engineering samples. This may also have shifted with node shrinkage, but I could not find anything on that; I still really doubt 1800MHz would be more efficient per watt than 1500MHz with the Vega architecture.

Nah. Isolating GPU power consumption doesn't result in higher GPU power consumption.

Remember binning is actually a thing and as a process matures you can obtain higher clockspeeds without a corresponding increase to power consumption, sometimes... You can achieve higher clocks -and- lower power consumption as processes mature.

And you are right, Vega was extremely efficient at lower clocks. - AMD used a lot less dark silicon to insulate parts of the chip to reduce power leakage to obtain higher clockrates.

Ok, to clarify things a bit, your position is:

Teraflops is absolute bullshit. It literally represents nothing.

My position:

TFLOPS should be contextualized before use, as it's only as good as you can realistically task it.

Tflops is *not* actually a measurement of anything.

Yes it is. It's not some kind of metric obtained with a dice roll when a new GPU enters the market. Each shader core (AMD) and CUDA core (Nvidia), which are responsible for general floating-point operations on their respective GPUs, can do up to 2 FLOPs per clock for 32-bit floating-point operations. That's the physical limit of those cores. Saying TFLOPS is not worth anything is akin to saying HP means nothing for cars, or wattage means nothing for electric motors. Every single metric for just about anything, simple or complex, is only as good as your ability to contextualize it.
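Just to make the arithmetic behind that limit explicit, here is a minimal sketch (Python, using the commonly quoted shader counts and reference clocks for the two cards discussed below; treat the numbers as illustrative, not measured):

# Peak FP32 throughput: shader cores x FLOPs per clock (2, via FMA) x clock.
# This is the hard ceiling the TFLOPS figure describes, not an achieved result.
def theoretical_tflops(shader_cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    return shader_cores * flops_per_clock * clock_ghz / 1000.0

print(theoretical_tflops(1600, 0.850))  # Radeon HD 5870 -> ~2.72 TFLOPS
print(theoretical_tflops(1024, 0.860))  # Radeon HD 7850 -> ~1.76 TFLOPS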

It's a theoretical number based on a number of hardware attributes and not a measurement of capability. - It is a number that is impossible to achieve in the real world.

Yes, like I explained earlier, it is theoretical because you could never task those cores 100%. So it can be viewed as both a theoretical limit and a physical barrier that you could never exceed, or even attain, at the reference clock, but it is still very much a measurement of capability.


A Radeon 5870 is a 2.72 Teraflops GPU with 2GB of RAM @ 153GB/s of bandwidth.
A Radeon 7850 is a 1.76 Teraflops GPU with 2GB of RAM @ 153GB/s of bandwidth.

So the only real difference is almost 1 Teraflops of compute, right? It's accurate according to you right? So the Radeon 5870 should win right?

Then if it's such an accurate measure of compute, why is the 7850 faster in everything, including compute where in some single precision floating point tasks, the 7850 is sometimes more than twice as fast?
(But don't take my word for it)
https://www.anandtech.com/bench/product/1062?vs=1076

There is another very significant difference between the two. When I made my list of what could starve GPU cores ('insufficient memory pool, insufficient memory bandwidth, and insufficient power delivery'), I did not mean it to be exhaustive. Here it's the flaws of the TeraScale architecture that starve the cores of the high-end 5870. No matter the generation, architecture, or revision, the high-end/enthusiast segments are meant to push limits by sacrificing efficiency to get the last drops of performance, so they should always be viewed with heavy diminishing returns in mind. The 7850 is a mid-range GPU using the better GCN architecture, which results in significantly less starvation of its cores.

You are just confirming my point, that the number of CU's is not the be-all, end-all.

It confirms my point too, because I never claimed the opposite, and using TFLOPS in certain scenarios does not mean I view it as a be-all, end-all either. In fact, my whole statement points to carefulness when using TFLOPS, and using CU counts would only be worse, so I don't know why you try to claim the opposite as my position.

Nah. Isolating GPU power consumption doesn't result in higher GPU power consumption.

Remember binning is actually a thing and as a process matures you can obtain higher clockspeeds without a corresponding increase to power consumption, sometimes... You can achieve higher clocks -and- lower power consumption as processes mature.

I think you misunderstood what I was trying to say. The default TDP is for the whole APU, so the 5500U having a less power-hungry CPU means the GPU has more available power, hence the 200MHz higher clock frequency. The RDNA2 architecture (just for comparison, as I did not find an equivalent chart for Vega) shows a 25% increase in watts for 1800MHz compared to 1600MHz, which is only a 12.5% increase in performance. No doubt the Vega architecture is not even that good, so the ratio may be worse. Binning is a thing, but the best bins would be reserved for the higher tiers with better profit margins, like the 4800U and 4980U, and it has only a marginal impact, nothing of the sort needed to bridge a 12.5% performance/watt gap.
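To put rough numbers on that perf/watt gap, here is a back-of-the-envelope sketch using the figures cited above (assumptions taken from that chart, not my own measurements):

# ~25% more power for ~12.5% higher clock means worse performance per watt.
power_ratio = 1.25           # 1800MHz draws ~25% more W than 1600MHz
perf_ratio = 1800 / 1600     # ~1.125, assuming performance scales with clock
print(perf_ratio / power_ratio)  # ~0.90 -> roughly 10% worse perf/watt at 1800MHz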

And binning does not always end up being used for power savings. Look at the RX 400 series vs the RX 500 series: the only difference was better bins, but they were used to get higher clocks (about 6%). Since that pushed the architecture to the max, it actually resulted in worse performance/watt, with the 580 using 23% more watts than the 480.

Anyway, this whole conversation is weird because none of your points disprove the initial context in which I used the TFLOPS figure. I said that AMD/MS needs to attain the 4 TFLOPS envelope of the RDNA2 architecture with a mobile-like TDP, and I pointed to another RDNA2 chip (which I mistakenly wrote as RDNA3 in my previous post, sorry if that's the source of the debate) with closely matched TFLOPS. I did not claim their performance was equivalent or comparable based on this; it was supposed to mean that AMD has already successfully reduced the TDP envelope of the RDNA2 architecture with the change from 7nm to 6nm. To contextualize things more, the 680M has a max 50W TDP but boasts 80%+ of its performance at 25W, see benchmark here. This is promising when you consider the semi-custom design of the Series S is even more efficient by using more cores at lower clock speeds. So it only adds to the plausibility of the video in the OP. RDNA2 at 4nm or even 3nm should be more than enough to push a 4 TFLOPS RDNA2 package under 25W, and even have a shot at a 15W APU target.
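Here is the same efficiency point as quick arithmetic (a sketch based on the 50W/25W figures and the ~80% performance retention cited above, which are assumptions from that benchmark rather than my own measurements):

# Halving the power budget while keeping ~80% of the performance
# means the 25W operating point is far more efficient per watt.
perf_per_watt_50w = 1.00 / 50    # full performance at the 50W ceiling
perf_per_watt_25w = 0.80 / 25    # ~80% of that performance at 25W
print(perf_per_watt_25w / perf_per_watt_50w)  # ~1.6x better perf/watt at 25W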

TFLOPS is also very useful in this context because consoles must keep this metric when doing a die shrink. Look at the PS5: it now uses the 6nm Oberon Plus chip and shaved off 20W to 30W but kept the same TFLOPS target (the same clock speed and the same number of shader cores), same memory, same bandwidth, same everything, just shrunk down. They have to do it this way to keep changes invisible to developers, and that's basically what I anticipate Xbox will do with the Series consoles, whether or not they want a revised S in a Switch-like format. If MS were to use a different architecture like RDNA 3 or 4, it is unlikely they would use a different TFLOPS target either if they want to keep things invisible to devs (it may not even be possible here, but the more you keep the same, the easier it should be for a dev to create a new build from the S version).
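As a final sketch of why the TFLOPS target survives a die shrink (PS5 shader count and clock below are the commonly quoted specs; the point is that the formula's inputs stay the same, only the wattage changes):

# 36 CUs x 64 shaders = 2304 shader cores at ~2.23 GHz on both revisions.
shaders, clock_ghz = 2304, 2.23
tflops_7nm_oberon = shaders * 2 * clock_ghz / 1000       # original 7nm chip
tflops_6nm_oberon_plus = shaders * 2 * clock_ghz / 1000  # 6nm Oberon Plus
print(tflops_7nm_oberon, tflops_6nm_oberon_plus)         # both ~10.28 TFLOPS; only power drops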