Pemalite said:
No. It doesn't accurately depict the performance capacity of the stream processors. |
You did not show me evidence that TFLOPS is bullshit; you've only shown me results that TFLOPS isn't even meant to represent.
Teraflops is single precision floating point.
No. FP32 TFLOPS is for single precision, FP16 TFLOPS is for half, and FP64 TFLOPS is for double. They are all listed independently for every GPU and should be matched to the workload at hand.
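A minimal sketch of the point above: per-precision throughput is a separate listed figure, usually expressed as the fp32 number times an architecture-specific ratio. The GPU names and ratios below are hypothetical placeholders, not real specs:

```python
# Illustrative only: each GPU lists fp32/fp16/fp64 throughput separately.
# Entries and ratios here are made-up examples (ratios vary per architecture).
gpus = {
    # name: (fp32 TFLOPS, fp16:fp32 ratio, fp64:fp32 ratio)
    "ExampleGPU-A": (6.45, 2.0, 1 / 32),   # e.g. a consumer card with 2:1 half rate
    "ExampleGPU-B": (10.5, 2.0, 1 / 16),
}

def throughput(name, precision):
    """Return the listed TFLOPS for the requested precision."""
    fp32, half_ratio, double_ratio = gpus[name]
    return {"fp32": fp32,
            "fp16": fp32 * half_ratio,
            "fp64": fp32 * double_ratio}[precision]

print(throughput("ExampleGPU-A", "fp16"))  # 12.9
```

The point being: picking the fp32 figure for an fp64 workload is a user error, not a flaw in the metric.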
See here, with the GeForce RTX 2060 @ 5.2 teraflops doubling the floating-point performance of the Radeon RX 580 @ 5.8 teraflops.
https://www.anandtech.com/bench/GPU19/2703
https://foldingathome.org/2013/03/06/fah-bench-fah-coreopenmm-based-benchmark-for-your-gpu/?lng=en-US "It measures the compute performance of GPUs for Folding@Home". So it's a test whose sole purpose is to measure performance in the specific workloads of the Folding@Home software.
However, using the same link you provided and just changing the benchmark to Geekbench Level Set Segmentation 256:
| GPU | Geekbench score | fp32 TFLOPS |
| --- | --- | --- |
| AMD Radeon RX 460 4GB | 3.1 | 2.15 |
| NVIDIA GeForce GTX 1050 Ti | 3.4 | 2.138 |
| NVIDIA GeForce GTX 960 | 3.79 | 2.413 |
| NVIDIA GeForce GTX 1650 | 3.8 | 2.984 |
| AMD Radeon R9 380 | 4.7 | 3.476 |
| NVIDIA GeForce GTX 1060 3GB | 5.65 | 3.935 |
| NVIDIA GeForce GTX 1650 Super | 5.7 | 4.416 |
| NVIDIA GeForce GTX 1060 6GB | 6.15 | 4.375 |
| NVIDIA GeForce GTX 1660 | 6.19 | 5.027 |
| AMD Radeon RX 5500 XT 8GB | 6.7 | 5.196 |
| NVIDIA GeForce GTX 1660 Super | 6.72 | 5.027 |
| EVGA GTX 1660 Super SC Ultra | 6.8 | 5.153 |
| NVIDIA GeForce GTX 980 | 7 | 4.981 |
| AMD Radeon RX 570 | 7 | 5.095 |
| NVIDIA GeForce GTX 1660 Ti | 7.06 | 5.437 |
| AMD Radeon RX 580 | 7.2 | 6.175 |
| AMD Radeon R9 390X | 8.1 | 5.914 |
| NVIDIA GeForce RTX 2060 | 8.48 | 6.451 |
| AMD Radeon RX 590 | 9.1 | 7.119 |
| NVIDIA GeForce GTX 1070 | 9.2 | 6.463 |
| AMD Radeon RX 5600 XT | 9.8 | 7.188 |
| NVIDIA GeForce RTX 2060 Super | 9.9 | 7.181 |
| NVIDIA GeForce RTX 2070 | 10.1 | 7.465 |
| AMD Radeon RX 5700 | 10.7 | 7.949 |
| Sapphire Pulse 5600 XT | 10.8 | 8.063 |
| AMD Radeon RX Vega 56 | 11.3 | 10.5 |
| NVIDIA GeForce GTX 1080 | 11.5 | 8.873 |
| NVIDIA GeForce RTX 2070 Super | 11.7 | 9.062 |
| AMD Radeon RX 5700 XT | 12.6 | 9.754 |
| NVIDIA GeForce RTX 2080 | 12.8 | 10.07 |
| AMD Radeon RX Vega 64 | 13.1 | 12.7 |
| NVIDIA GeForce RTX 2080 Super | 13.9 | 11.15 |
| AMD Radeon VII | 17.2 | 13.44 |
| NVIDIA GeForce RTX 2080 Ti | 18.4 | 13.45 |
That's an incredibly strong relationship (R² = 0.96), with only two outliers in the Vega 56 and Vega 64 (the same architecture and revision). If we remove those just to see what we get (right), the graph shows an even stronger relationship of R² = 0.989. If the metric were truly bullshit like you claim, this should not be possible at all.
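Anyone can sanity-check a correlation like that from the table itself. A minimal sketch using only a subset of the rows above (plain Python, no libraries; the exact R² will differ slightly from the full-table fit):

```python
# Pearson correlation between Geekbench score and fp32 TFLOPS,
# using ten rows taken from the table above.
scores = [3.1, 3.8, 5.65, 6.7, 7.2, 8.48, 9.9, 10.7, 12.8, 18.4]
tflops = [2.15, 2.984, 3.935, 5.196, 6.175, 6.451, 7.181, 7.949, 10.07, 13.45]

n = len(scores)
mean_s = sum(scores) / n
mean_t = sum(tflops) / n
cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(scores, tflops))
var_s = sum((s - mean_s) ** 2 for s in scores)
var_t = sum((t - mean_t) ** 2 for t in tflops)
r = cov / (var_s * var_t) ** 0.5
r_squared = r * r
print(f"R^2 = {r_squared:.3f}")
```

Even on this subset the fit comes out very strong, which is the whole argument: within comparable workloads, fp32 TFLOPS tracks measured performance closely.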
You do realise the Stream processors do more than just single precision FP32, right? right?
You do realize that even if stream processors can do int operations, they were created, designed, and optimized for large numbers of float operations, right? That's the reason they even exist. They generally handle ints with added inefficiency compared to how a CPU handles them. GPUs exist because video games needed a better way to process the large swaths of float operations required by graphics rendering.
fp32 TFLOPS is the most relevant figure when measuring stream-processor capacity because that's what they are designed to compute predominantly, but the figure cannot account for bottlenecks in the rest of the pipeline.
Pemalite said:
That isn't how CPU's and GPU's are designed. |
Even if they have a benchmark in mind during design, this would only result in specific requirement targets, such as memory pool, bandwidth, and yes, TFLOPS, amongst others.
Pemalite said:
No, supercomputers use the figure as an advertisement tool. |
Of course teraflops won't be used for supercomputers designed to process integers; that would be silly. But to claim TFLOPS is only an advertisement tool for the others is simply wrong.
Pemalite said:
I think you just admitted that Teraflops alone is bullshit, because you are starting to recognize other aspects. |
What? I have claimed since my very first reply to your claim that TFLOPS needs contextualization, and I have repeated this many times since.
If you view HP as a bullshit metric too, that's up to you, but that's not my assessment whatsoever, nor do I think there's much support for it amongst the car-enthusiast community.
Pemalite said:
Except the advertised Teraflops doesn't account for FP16 and FP64. - When Teraflops is used by itself it's FP32/Single Precision. |
Yes, they do account for fp16 and fp64: those are listed for every GPU, and they all use a teraflops figure or a ratio over the fp32 one. How is that not accounting for them?
Pemalite said:
Floats are an operation. |
No, float is a data type; it's the same as single, just with a different name, and it's what you would use in C++ and many other languages.
Again, the performance of the other floating-point data types is listed for every GPU, so it's disingenuous to say fp32 TFLOPS doesn't represent fp16 or fp64 TFLOPS when those are listed as separate figures.
Pemalite said:
A little bit more complex than that I am afraid. |
Again, the "theoretical" aspect of the figure only emphasizes that you should not expect to max it out. It is a measurement of the maximum throughput of the stream processors for the data type it represents.
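That "maximum throughput" is just arithmetic over the published specs: shader units × clock × 2 ops per clock (one fused multiply-add counts as two flops). A quick sketch, using the RTX 2060's public specs (1920 shader units, ~1.68 GHz boost), which lands right on the 6.451 TFLOPS in the table above:

```python
def peak_tflops(shader_units, boost_clock_ghz, ops_per_clock=2):
    """Theoretical peak throughput: units x clock x 2 flops/clock (FMA)."""
    return shader_units * boost_clock_ghz * ops_per_clock / 1000.0

# RTX 2060: 1920 CUDA cores at ~1.68 GHz boost clock
print(peak_tflops(1920, 1.68))  # ~6.45 TFLOPS fp32
```

Real workloads sit below this ceiling because of memory, scheduling, and occupancy limits, which is exactly why the figure is labeled theoretical.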
If stream processors were designed for integers in any relevant capacity, vendors would list that performance separately too, just like:
Pemalite said:
No. It pretty much tells us Teraflops is bullshit. |
And again, it's only "bullshit" because you want to apply the TFLOPS figure to something it isn't meant to represent.
Pemalite said:
This is core clockrate and power scaling on the Radeon 6700XT. |
500 mV would be one hell of an undervolt; I've never heard of such a drastic figure for any GPU. Realistically you can expect 25 mV to 100 mV. In my limited experience, default voltages typically range from 900 to 1200 mV, so 500 mV would be pretty insane. I've heard of more pronounced undervolts like 125 mV and even 150 mV, but it's hard to say whether those are real or whether the users properly ran benchmarks to assess stability.
Anyway, binning is mainly used to create different SKUs and/or to rebrand old SKUs as the process matures and average yields increase. But no one should expect the same gains from it as from a node shrink.
Last edited by EpicRandy - on 24 April 2023