EpicRandy said:

Benchmarks have a strong tendency to portray the performance of a GPU in the context of their own test, and lose accuracy when used to portray anything else.

Correct.

EpicRandy said:

Tflops are not meant to be used to evaluate the FPS performance of a GPU, so it's disingenuous to use the figure solely in this context and say it's a bullshit figure. It always accurately depicts the performance capacity of the stream processors themselves, but not the whole GPU. For that, you have to take everything into account.

No. It doesn't accurately depict the performance capacity of the stream processors.
Teraflops measures single-precision floating point only.

I have already provided evidence that the advertised Teraflops figure doesn't correspond with real-world floating point performance in actual tasks.
See here, where the GeForce RTX 2060 @ 5.2 Teraflops doubles the floating point performance of the Radeon RX 580 @ 5.8 Teraflops.
https://www.anandtech.com/bench/GPU19/2703
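For context, the headline figure is nothing more than shader count * clock * FLOPs issued per cycle. A minimal Python sketch of that arithmetic; the core counts and clocks are assumptions that roughly reproduce the 5.2 / 5.8 TFLOPS figures above, which the benchmark link shows real-world results don't follow:

```python
# Hedged sketch of how the headline FP32 "Teraflops" number is derived.
# The shader counts and clocks below are assumptions for illustration only;
# they roughly reproduce the advertised ~5.2 and ~5.8 TFLOPS figures.

def fp32_tflops(shaders: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical FP32 throughput: shaders * clock * FLOPs per cycle (FMA = 2)."""
    return shaders * clock_ghz * flops_per_cycle / 1000.0

print(fp32_tflops(1920, 1.365))  # GeForce RTX 2060 (assumed base clock) -> ~5.2 TFLOPS
print(fp32_tflops(2304, 1.257))  # Radeon RX 580   (assumed base clock) -> ~5.8 TFLOPS
```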

It doesn't tell us the stream processors' INT16 performance.
It doesn't tell us the stream processors' INT8 performance.
It doesn't tell us the stream processors' INT4 performance.
It doesn't tell us the stream processors' FP8 performance.
It doesn't tell us the stream processors' FP16 performance.
It doesn't tell us the stream processors' FP64 performance.

You do realise the stream processors do more than just single-precision FP32, right? Right?
Things like Rapid Packed Math exist as well.
https://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/4
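To make the Rapid Packed Math point concrete, here is a minimal sketch of what packing two FP16 operations into each FP32 lane does to the theoretical rate. The shader count and clock are illustrative assumptions, roughly Vega 64-class, not exact specs:

```python
# Hedged sketch: packed FP16 (Rapid Packed Math) doubles the theoretical rate
# by issuing two FP16 operations per FP32 lane per cycle.
# Shader count and clock are illustrative assumptions (roughly Vega 64-class).

def theoretical_tflops(shaders: int, clock_ghz: float, ops_per_lane: int) -> float:
    return shaders * clock_ghz * 2 * ops_per_lane / 1000.0  # 2 = FMA counts as two FLOPs

shaders, clock_ghz = 4096, 1.5
print(theoretical_tflops(shaders, clock_ghz, ops_per_lane=1))  # FP32        -> ~12.3 TFLOPS
print(theoretical_tflops(shaders, clock_ghz, ops_per_lane=2))  # packed FP16 -> ~24.6 TFLOPS
```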

You need to stop arguing against the evidence.

EpicRandy said:

Benchmarks cannot be used when designing new GPUs/architectures; designers have to rely on metrics and set targets for each of them, and tFlops is one such metric, a Time Spy score isn't.

That isn't how CPUs and GPUs are designed.

They are designed with "projected" performance targets for different benchmarks.

AMD, Nvidia and Intel also take historical performance uplift trends in current benchmarks and project the performance of new hardware from them, to see how it will compete.
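A minimal sketch of that kind of projection, purely illustrative; the benchmark scores below are made-up placeholders, not real results from any vendor:

```python
import math

# Hedged sketch: project a next-generation benchmark score from the average
# generational uplift seen in past results. All scores are hypothetical.

past_scores = [6000, 7800, 10200, 13500]  # placeholder scores for gen N-3 .. gen N

uplifts = [newer / older for older, newer in zip(past_scores, past_scores[1:])]
avg_uplift = math.prod(uplifts) ** (1 / len(uplifts))  # geometric mean of uplifts

projected = past_scores[-1] * avg_uplift
print(f"average generational uplift: {avg_uplift:.2f}x")
print(f"projected next-gen score:    {projected:.0f}")
```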

EpicRandy said:

Tflops will depict things accurately as long as you run workloads that have no bottlenecks, or are designed to avoid them where possible. That's why supercomputers use this figure predominantly.

A 750 hp car beat a 1500 hp one.

No, supercomputers use the figure as an advertisement tool.

Teraflops, aka single-precision floating point, aka FP32, would not be used at all in a supercomputer that is only doing INT4 or INT8 AI inference calculations... And this is actually a growing and common thing, where a supercomputer doesn't need any FP32 capability, making Teraflops a useless metric.

https://developer.nvidia.com/blog/int4-for-ai-inference/
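As a rough illustration of why the FP32 figure says nothing here: an inference accelerator is rated in INT8/INT4 TOPS, which is a completely separate calculation. The unit counts and per-cycle rates below are hypothetical, chosen only to show these are independent figures:

```python
# Hedged sketch: INT8/INT4 inference throughput (TOPS) is a separate figure that
# the FP32 "Teraflops" number does not capture. All values below are hypothetical.

def tops(units: int, clock_ghz: float, ops_per_unit_per_cycle: int) -> float:
    return units * clock_ghz * ops_per_unit_per_cycle / 1000.0

units, clock_ghz = 512, 1.3          # hypothetical matrix/tensor unit count and clock
print(tops(units, clock_ghz, 256))   # assumed INT8 ops per unit per cycle -> ~170 TOPS
print(tops(units, clock_ghz, 512))   # INT4 often rated at twice the INT8 rate (assumed)
```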

EpicRandy said:

Yes, you would, if the DDR4 starved the 1030 while the GDDR5 allowed for more consistent utilization of the stream processors.

I think you just admitted that Teraflops alone is bullshit, because you are starting to recognize other aspects.

Took a while, but we are getting there.

EpicRandy said:

They are generally 2 FLOPs per cycle. They can be used for other operations, and the performance of those will also be listed alongside the tFlops figure. Combined operations are also listed with different tFlops figures, like FP16 or FP64. Other optimizations can be done ahead of time by the compiler when the software is built, so the GPU would be agnostic of these.

Except the advertised Teraflops figure doesn't account for FP16 or FP64. When Teraflops is used by itself, it means FP32/single precision.
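A small sketch of why those rates can't be read off the headline number: FP16 and FP64 are separate ratios that differ per architecture. The ratios below (2:1, 1:1, 1:16, 1:32) are common examples used purely as assumptions for illustration:

```python
# Hedged sketch: two hypothetical "10 TFLOPS" GPUs with very different FP16/FP64
# capabilities. The ratios are assumptions; real values depend on the architecture.

def derived_rates(fp32_tflops: float, fp16_ratio: float, fp64_ratio: float) -> dict:
    return {
        "FP32": fp32_tflops,
        "FP16": fp32_tflops * fp16_ratio,
        "FP64": fp32_tflops * fp64_ratio,
    }

print(derived_rates(10.0, fp16_ratio=2.0, fp64_ratio=1 / 16))  # packed-math style part
print(derived_rates(10.0, fp16_ratio=1.0, fp64_ratio=1 / 32))  # a different design
```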

EpicRandy said:

A GPU designed with stream processors that do 2 FLOPs/cycle is not doing 2 generic operations; it is doing 2 operations on floats. When they process something else, like doubles, they will use many cycles for the task. Some stream processors are designed so that they can combine two 32-bit units to process a single double (FP64); those will be listed with half performance on doubles. Others are not, and can take up to 16 cycles to do the same operations; that's dependent on the architecture. Some stream processors are limited to multiplication for one of their 32-bit operations and addition/subtraction for the other.

Floating point operations are still operations.

Here is the thing: packing math together -only- works if the operations are identical, so half-precision and double-precision are -never- going to scale linearly up or down in the real world, due to those inherent inefficiencies.

Again, Teraflops doesn't account for any of that, which is why it's bullshit.

EpicRandy said:

No, you cannot. If you design your stream processors with different ALUs that can do 4 FLOPs/cycle like RDNA 3, or even 8 like some have done in the past, this is already taken into consideration in the tFlops figure. Take the 7900 XTX: its tFlops is shader cores * clock * 4 instead of * 2, so you won't be able to exceed this value. You could offload some computing to other hardware-accelerated parts, but the tFlops figure is not meant to measure those, only the stream processors.

A little bit more complex than that, I am afraid.

The 7900 XTX has dual-issue ALUs, each of which can do 2 operations, so the figure is shader cores * 2 (just like with VLIW) * clock * 2 operations per cycle.

Those ALUs can do integer operations as well. Teraflops doesn't represent any of that; it tells us *nothing* except one type of operation a GPU can do... and only theoretically.
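The dual-issue arithmetic, as a minimal sketch. The shader count and boost clock are commonly cited 7900 XTX numbers used here as assumptions; they land close to the advertised ~61 TFLOPS:

```python
# Hedged sketch of the dual-issue FP32 arithmetic described above.
# 6144 shaders and ~2.5 GHz boost are assumed, commonly cited 7900 XTX figures.

def dual_issue_fp32_tflops(shaders: int, clock_ghz: float) -> float:
    # FMA counts as 2 FLOPs; dual-issue doubles it again -> 4 FLOPs per shader per cycle
    return shaders * clock_ghz * 2 * 2 / 1000.0

print(dual_issue_fp32_tflops(6144, 2.5))  # ~61 TFLOPS theoretical FP32
# The same ALUs can also issue integer work, which this figure does not capture at all.
```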

EpicRandy said:

That's not because tFlops is bullshit; that's because the 7900 XTX's utilization of its stream processors is bullshit under video games' typical workloads. Again, tFlops is not meant to measure the performance of the whole GPU, only the stream processors' max throughput.

No. It pretty much tells us Teraflops is bullshit.

EpicRandy said:

I know all that, but it does not really address the point. Since binning is already done by the manufacturer, and dies have been sorted into the SKUs that fit their respective tolerances, the leeway you end up having as a customer to undervolt and overclock within the same TDP is marginal. Also, because the efficiency curve makes TDP increase exponentially with clocks past a certain clock speed, it does not really matter if you can get 100-200 MHz past the most efficient target because you got lucky in the binning lottery. SKUs that target consoles can't rely on the top few % of dies, or else yields would be terrible.

This is core clock rate and power scaling on the Radeon 6700 XT.

It pretty much shows that increasing core clocks follows an efficiency curve.



Now if you increase the clock but decrease the voltage by 500mV, you will come out ahead on power consumption, or stay about the same.
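A minimal sketch of that trade-off using the common dynamic-power approximation P ≈ C · V² · f. The baseline voltage, clock and the size of the undervolt below are hypothetical values for illustration, not measurements from a 6700 XT:

```python
# Hedged sketch: dynamic power scales roughly with V^2 * f, so a modest undervolt
# can offset a clock increase. All voltages and clocks below are hypothetical.

def relative_power(voltage_scale: float, clock_scale: float) -> float:
    """Power relative to baseline when voltage and clock are both scaled."""
    return (voltage_scale ** 2) * clock_scale

baseline_v, baseline_clock = 1.150, 2.4   # hypothetical volts / GHz
new_v, new_clock = 1.100, 2.5             # illustrative undervolt + overclock

scale = relative_power(new_v / baseline_v, new_clock / baseline_clock)
print(f"relative power draw: {scale:.2f}x")  # ~0.95x here -> net gain despite higher clock
```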



--::{PC Gaming Master Race}::--