EpicRandy said:
Pemalite said:

Framerates tend to be a good one.

Well yeah, I agree, but there are still many flaws with this.

  1. FPS can only measure cards that are already available and serves no purpose in evaluating future ones.
  2. It still needs contextualization: what game was running, what the target resolution was, what post-processing effects were used, what API was used, and what engine was used.
  3. 2 GPUs may perform almost exactly the same on 1 title yet vastly differently on another.
  4. It is very susceptible to manipulation:
    1. Nvidia, through their partner programs with devs, make certain they use sometimes-unnecessary features, or unnecessary levels of a feature, to impact performance on AMD GPUs, e.g. The Witcher 3's use of 64x tessellation with HairWorks, designed to cripple performance on AMD GPUs with no image fidelity gain (beyond 16x).
    2. anandtech: "Let's start with the obvious. NVIDIA is more aggressive than AMD with trying to get review sites to use certain games and even make certain GPU comparisons."
    3. Not too long ago, Nvidia driver revisions had a tendency to decrease GPU performance over a card's lifetime, while AMD revisions increased it. When new gens were announced, Nvidia compared the latest drivers' performance on their old gen against the new gen, showing skewed comparisons.
    4. AMD was caught using blatantly wrong numbers when displaying gen-over-gen FPS improvements for the RX 7900 XTX.
  5. When gaming, many kinds of workloads are computed all at once, some utilizing the GPU better than others, but the performance shown in FPS will only rise to that of the worst-performing one.
  6. For certain workloads, average FPS would literally be a trash figure while TFLOPS depicts things more accurately.

It's literally why benchmarks actually exist.

Tflops can never depict anything accurately.
I have already showcased how different GPUs can perform worse even with more TFLOPS.
I have already showcased how identical GPUs with the same TFLOPS can perform at half the speed.

So to assert that it can "depict things accurately" when you can double your performance with a part that has identical Tflops is disingenuous.

Not only that, Teraflops only represents single-precision floating point...

So for anything involving Quarter Precision, Half Precision, Double Precision, Integers, Geometry throughput, Texel/Pixel/Texture fillrate... Teraflops has no bearing. That's like a massive chunk of the GPU and stuff you know.
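To put a rough number on that, here's a quick Python sketch. The ratios are typical consumer-GPU figures and purely illustrative; actual precision ratios vary wildly from product to product:

# The headline "TFLOPS" figure only describes FP32 throughput.
# Everything else runs at a different rate entirely.
fp32_tflops = 10.0                       # the advertised, single-precision figure
rates = {
    "FP16 (packed)": fp32_tflops * 2,    # many GPUs double-rate half precision
    "FP32":          fp32_tflops,        # the only thing the box number measures
    "FP64":          fp32_tflops / 16,   # consumer parts often cut double precision to 1/16 or worse
}
for precision, tflops in rates.items():
    print(f"{precision:<14} ~{tflops:.2f} TFLOPS")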

EpicRandy said:
Pemalite said:

No. It's actually not.

It's a bunch of numbers multiplied together. - It's theoretical, not real world.

Again, no GPU or CPU will ever achieve their "hypothetical teraflops" in the real world.

Those multiplications literally represent how the stream processors work; they can perform 2 FLOPs per cycle by design.
That much is not theoretical: processors require a high and low signal, a.k.a. a clock cycle, to operate, and stream processors are mostly designed to run 2 FP32 instructions per clock. That's not a theory, that's how they work. Some workloads, such as scientific simulations, machine learning, and data analytics, have better utilization and sometimes get close to 100%.

Again. It's theoretical, not real world.

Otherwise, if we took the GeForce GT 1030 DDR4 and GDDR5 variants, we wouldn't have one that is almost twice as fast at the same Teraflops.
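The difference between those two cards is memory bandwidth, not compute. Back-of-the-envelope, using the commonly cited GT 1030 memory specs (treat the exact figures as approximate):

# Bandwidth = bus width in bytes * effective transfer rate.
def bandwidth_gbs(bus_bits, gigatransfers_per_s):
    return (bus_bits / 8) * gigatransfers_per_s

gddr5 = bandwidth_gbs(64, 6.0)   # GT 1030 GDDR5: 64-bit bus @ 6 GT/s   -> ~48 GB/s
ddr4  = bandwidth_gbs(64, 2.1)   # GT 1030 DDR4:  64-bit bus @ 2.1 GT/s -> ~17 GB/s
print(f"GDDR5 {gddr5:.0f} GB/s vs DDR4 {ddr4:.0f} GB/s -> {gddr5 / ddr4:.1f}x the bandwidth")

Roughly the same Teraflops on the box, close to three times the memory bandwidth.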

No, they are not 2 flops per cycle.
They can do 2 operations per cycle, which is very different. - Not all operations are the same, as some can be packed together to form one operation... or one operation may be split into many.
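For reference, the "paper" figure is nothing more than spec-sheet arithmetic, and the famous 2-per-clock only exists because a fused multiply-add is one issued operation counted as two FLOPs. A minimal sketch (the shader count and clock below are hypothetical):

# Paper TFLOPS = shaders * FLOPs per clock * clock (GHz) / 1000
# The factor of 2 counts a fused multiply-add (a*b + c) as two floating-point ops,
# even though it is a single issued operation.
def paper_tflops(shaders, boost_ghz, flops_per_clock=2):
    return shaders * flops_per_clock * boost_ghz / 1000.0

print(paper_tflops(2560, 1.8))   # hypothetical 2560-shader part at 1.8 GHz -> ~9.2 TFLOPS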

EpicRandy said:
Pemalite said:

But in the same vein, I could grab a 1 teraflop "rated" GPU and assert it's theoretically capable of "1.2 Teraflops" based on any number of factors.

It's meaningless, because it's unachievable in any real world scenario.

No, I literally wrote, "physical barrier you could never exceed or even attain". So unless you are able to overclock it by 20%, that's not possible.

Some real-world scenarios get really close to 100%, just not gaming in general, though it's already better on consoles due to static hardware and specific optimization.

Except you can exceed the theoretical Teraflop number by combining operations if you make your ALUs fat enough.
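One way to read that: if the ALUs can pack two half-precision operations into each FP32 lane per clock (rapid-packed-math style), then counted that way the chip "exceeds" its own FP32 rating. Rough sketch with made-up figures:

# Packed FP16: two half-precision ops per FP32 lane per clock.
shaders, boost_ghz = 2304, 1.5
fp32_tflops = shaders * 2 * boost_ghz / 1000     # ~6.9 TFLOPS, the advertised figure
fp16_packed = fp32_tflops * 2                    # ~13.8 "TFLOPS" if you count packed FP16
print(fp32_tflops, fp16_packed)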

EpicRandy said:
Pemalite said:

What is starving the 5870 to have less real-world teraflops than the 7850?

Explain it. I'll wait.

Sure, here's a very good read on the utilization of the TeraScale architecture: 

Utilization remains a big concern though, for both the SPUs and the SPs within them: not only must the compiler do its best to identify 5 independent datapoints for each VLIW thread, but so must 64 VLIW threads be packed together within each wavefront. Further, the 64 items in a wavefront should all execute against the same instruction; imagine a scenario wherein one thread executes against an entirely different instruction from the other 63! Opportunities for additional clock cycles & poor utilization thus abound and the compiler must do it’s best to schedule around them.

With 5 SPs in each SPU, attaining 100% utilization necessitates five datapoints per VLIW thread. That’s the best case; in the worst case an entire thread is comprised of just a single datapoint resulting in an abysmal 20% utilization as 4 SPs simply engage in idle chit-chat. Extremities aside, AMD noted an average utilization of 68% or 3.4 SPs per clock cycle. A diagram from AnandTech’s GCN preview article depicts this scenario, and it’s a good time to borrow it here:

The HD 6900 series would serve as the last of the flagship TeraScale GPUs, even as TeraScale based cards continued to release until October of 2013. As compute applications began to take center-stage for GPU acceleration, games too evolved. The next generation of graphics API’s such as DirectX 10 brought along complex shaders that made the VLIW-centric design of TeraScale ever more inefficient and impractically difficult to schedule for. The Radeon HD 7000 series would accordingly usher in the GCN architecture, TeraScale’s inevitable successor that would abandon VLIW and ILP entirely and in doing so cement AMD’s focus on GPU compute going forward.

Did you really just link to a blog? Either way, I have a very low-level understanding of TeraScale and its derivatives.

And ironically, AMD has brought VLIW-like ideas into RDNA3 by introducing dual-issue ALUs... It's a cheap way to increase throughput.

Which is partially why going from the Radeon RX 6950 @ 19.3 Teraflops to the RX 7900 XTX @ 46 Teraflops hasn't resulted in more than double the performance, because Teraflops is bullshit.
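Putting rough numbers on the utilization figure that article quotes (commonly cited specs, so approximate):

# HD 5870 (TeraScale / VLIW5) vs HD 7850 (GCN): paper TFLOPS vs a rough effective figure.
hd5870_paper = 1600 * 2 * 0.850 / 1000    # ~2.72 TFLOPS on the spec sheet
hd5870_rough = hd5870_paper * 0.68        # ~1.85 TFLOPS at the ~68% average VLIW utilization AMD cited
hd7850_paper = 1024 * 2 * 0.860 / 1000    # ~1.76 TFLOPS, with no VLIW packing problem
print(hd5870_paper, hd5870_rough, hd7850_paper)

The card with the far bigger number on the box lands in roughly the same ballpark once utilization enters the picture, which is exactly why quoting the box number tells you nothing.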

EpicRandy said:

    Pemalite said:

    My argument is that TFLOPS is bullshit when used to determine the capability of a GPU.

    Nor are higher clocks always a detriment to power consumption... It's a balancing act, as all CPUs and GPUs have an efficiency curve.

    For example... You can buy a CPU, undervolt it... then overclock it... and end up with a CPU that uses less power but offers higher performance due to its higher clockrate.

    Yes, I know all that, but the efficiency curve gets exponentially worse past a certain clock speed, and GPUs' default clocks are always already past that point; binning only has marginal impacts here. You're able to undervolt some GPUs only because their configurations are designed to cater to the worst binning offender of a particular SKU, with some headroom to spare.

    Binning has massive impacts, depending on the process.

    Some chips cannot handle higher currents... Otherwise they suffer from an issue known as "electromigration" which will destroy the silicon.

    Hence why, through binning, Polaris went from the RX 480 to the RX 580; yes, power draw increased with that jump, but only because they could get away with it.

    Consequently, AMD and Intel use binning on all their CPUs... For example, the Ryzen 5500 and 5600X are fundamentally the same chip, but through binning hit different performance/power targets. - And parts of the damaged L3 got lasered off.

    Heck, we could even go back to Phenom, where AMD would use the same chip design for its entire lineup from dual-cores right up to quad-cores. Some of the chips with a damaged core would have that core disabled; thankfully you could re-enable it by setting Automatic Core Calibration to Auto, though sometimes you needed to pump more volts or lower the clocks.
    That's binning.
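    On the efficiency-curve point above: dynamic power scales roughly with voltage squared times frequency, which is why shaving a little voltage buys back more power than a modest clock bump costs. A rough sketch, with made-up voltage and clock figures:

# Dynamic power ~ C * V^2 * f; ignore the constant C and compare ratios only.
def relative_power(voltage, clock_ghz):
    return voltage ** 2 * clock_ghz

stock   = relative_power(1.10, 1.80)   # stock voltage and clock (made up)
tweaked = relative_power(1.00, 1.90)   # undervolted ~9%, overclocked ~5%
print(f"power vs stock: {tweaked / stock:.2f}x")   # ~0.87x -> less power, higher clock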

    DonFerrari said:

    I would say your analogy with cars already gets the idea across.

    Of course, one car having 1000 hp and another having 900 hp doesn't mean much when there are several other design elements that will impact the performance of that car, from a simple 0-100 km/h (0-60 mph) sprint to lap times (which can even be affected by the driver: on the exact same car and conditions, two different drivers, or even the same driver across multiple laps, will post different times).

    So would that mean measuring HP is totally useless? Absolutely not =p

    Nah. It's not like horsepower at all.

    It's more like the CCs on an engine... a.k.a. the displacement.

    You can get lower-CC engines that outperform higher-CC engines, based on a number of design factors.



    --::{PC Gaming Master Race}::--