EpicRandy said:
Pemalite said:

Framerates tend to be a good one.

Well yeah, I agree, but there are still many flaws with this.

  1. FPS can only measure already-available cards and serves no purpose in evaluating future ones.
  2. It still needs contextualization: what game was running, what the target resolution was, what post-processing effects were enabled, what API was used, and what engine was used.
  3. Two GPUs may perform almost identically on one title yet vastly differently on another.
  4. It is very susceptible to manipulation:
    1. Nvidia, through its partner programs with developers, makes sure they use features, or levels of a feature, that are sometimes unnecessary in order to hurt performance on AMD GPUs, e.g. The Witcher 3's use of 64x tessellation for HairWorks, designed to cripple performance on AMD GPUs with no image-fidelity gain beyond 16x.
    2. anandtech: "Let's start with the obvious. NVIDIA is more aggressive than AMD with trying to get review sites to use certain games and even make certain GPU comparisons."
    3. Not too long ago, Nvidia driver revisions tended to decrease GPU performance over a card's lifetime, while AMD revisions increased it. When new gens were announced, Nvidia compared its old gen on the latest drivers against the new gen, showing skewed comparisons.
    4. AMD was caught using blatantly wrong numbers when showing gen-over-gen FPS improvements for the RX 7900 XTX.
  5. When gaming, many kinds of workloads are computed at once, some with better GPU utilization than others, but the FPS you see only rises to the level of the worst-performing one (see the sketch after this list).
  6. For certain workloads, average FPS is literally a trash figure, while TFLOPS depicts things more accurately.
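
To illustrate point 5, here's a minimal sketch in Python with made-up per-workload timings (not measurements of any real game or GPU): no matter how well the other stages use the hardware, the frame rate you observe is dominated by the slowest one.

```python
# Made-up per-frame workload timings (milliseconds); the point is that the FPS
# you see is capped by the slowest workload, even if the other stages run close
# to the GPU's theoretical peak.
workloads_ms = {
    "geometry / vertex work": 2.0,   # high utilization, fast
    "compute post-processing": 3.5,  # good utilization
    "divergent shading pass": 9.0,   # poor utilization -> the bottleneck
}

frame_time_ms = sum(workloads_ms.values())  # stages run back to back each frame
fps = 1000.0 / frame_time_ms

bottleneck = max(workloads_ms, key=workloads_ms.get)
print(f"frame time: {frame_time_ms:.1f} ms -> {fps:.0f} FPS")
print(f"dominated by: {bottleneck} ({workloads_ms[bottleneck]:.1f} ms)")
```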
Pemalite said:

No. It's actually not.

It's a bunch of numbers multiplied together. - It's theoretical, not real world.

Again, no GPU or CPU will ever achieve their "hypothetical teraflops" in the real world.

That multiplication literally represents how stream processors work: they can perform 2 FLOPs per cycle by design.
That much is not theoretical; processors require a high and low signal, i.e. a clock cycle, to operate, and stream processors are mostly designed to execute 2 FP32 operations per clock. That's not a theory, that's how they work. Some workloads, such as scientific simulations, machine learning, and data analytics, achieve better utilization and sometimes get close to 100%.
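
For reference, this is the multiplication in question. A minimal sketch, using commonly listed specs for the two cards that come up below (1600 shader ALUs at 850 MHz for the HD 5870, 1024 at 860 MHz for the HD 7850; figures from memory, not from this thread):

```python
# Theoretical peak = shader ALUs x FP32 ops per clock x clock speed (GHz).
# Stream processors are designed to retire 2 FP32 operations per cycle
# (a fused multiply-add counting as two), hence the factor of 2.

def theoretical_tflops(shader_alus: int, clock_ghz: float, fp32_ops_per_clock: int = 2) -> float:
    """Peak FP32 throughput in TFLOPS."""
    return shader_alus * fp32_ops_per_clock * clock_ghz / 1000.0

print(theoretical_tflops(1600, 0.850))  # Radeon HD 5870 -> ~2.72 TFLOPS
print(theoretical_tflops(1024, 0.860))  # Radeon HD 7850 -> ~1.76 TFLOPS
```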

Pemalite said:

But by that same vein, I could grab a 1 teraflop "rated" GPU and assert it's theoretically capable of "1.2 Teraflops" based on any number of factors.

It's meaningless, because it's unachievable in any real world scenario.

No, I literally wrote, "physical barrier you could never exceed or even attain". So unless you are able to overclock it by 20%, that's not possible.

Some real-world scenarios get really close to 100%, just not gaming in general; it's already better on consoles thanks to static hardware and platform-specific optimization.
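
If anyone wants to see how that utilization figure is obtained, here's a minimal sketch: time a workload with a known FLOP count and divide by the rated peak. It runs on the CPU with NumPy purely for illustration (the 500 GFLOPS peak is a placeholder); the same arithmetic applies against a GPU's rated TFLOPS.

```python
import time

import numpy as np

# A dense N x N matrix multiply performs roughly 2 * N^3 floating-point operations.
N = 2048
a = np.random.rand(N, N).astype(np.float32)
b = np.random.rand(N, N).astype(np.float32)

a @ b  # warm-up so the timed run isn't paying one-off setup costs

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

achieved_gflops = (2 * N ** 3) / elapsed / 1e9
rated_peak_gflops = 500.0  # placeholder: substitute your hardware's rated FP32 peak

print(f"achieved: {achieved_gflops:.1f} GFLOPS")
print(f"utilization vs rated peak: {achieved_gflops / rated_peak_gflops:.0%}")
```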

Pemalite said:

What is starving the 5870 to have less real-world teraflops than the 7850?

Explain it. I'll wait.

Sure, here's a very good read on the utilization of the TeraScale architecture: 

Utilization remains a big concern though, for both the SPUs and the SPs within them: not only must the compiler do its best to identify 5 independent datapoints for each VLIW thread, but so must 64 VLIW threads be packed together within each wavefront. Further, the 64 items in a wavefront should all execute against the same instruction; imagine a scenario wherein one thread executes against an entirely different instruction from the other 63! Opportunities for additional clock cycles & poor utilization thus abound and the compiler must do its best to schedule around them.

With 5 SPs in each SPU, attaining 100% utilization necessitates five datapoints per VLIW thread. That’s the best case; in the worst case an entire thread is comprised of just a single datapoint resulting in an abysmal 20% utilization as 4 SPs simply engage in idle chit-chat. Extremities aside, AMD noted an average utilization of 68% or 3.4 SPs per clock cycle. A diagram from AnandTech’s GCN preview article depicts this scenario, and it’s a good time to borrow it here:

The HD 6900 series would serve as the last of the flagship TeraScale GPUs, even as TeraScale based cards continued to release until October of 2013. As compute applications began to take center-stage for GPU acceleration, games too evolved. The next generation of graphics API’s such as DirectX 10 brought along complex shaders that made the VLIW-centric design of TeraScale ever more inefficient and impractically difficult to schedule for. The Radeon HD 7000 series would accordingly usher in the GCN architecture, TeraScale’s inevitable successor that would abandon VLIW and ILP entirely and in doing so cement AMD’s focus on GPU compute going forward.
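
Putting rough numbers to that passage (card specs are assumptions from memory; the ~68% average slot utilization is from the quoted article): once you account for VLIW packing alone, the 5870's rated peak already shrinks toward the 7850's, before the wavefront divergence and scheduling bubbles the article also describes.

```python
def tflops(shader_alus: int, clock_ghz: float, utilization: float = 1.0,
           ops_per_clock: int = 2) -> float:
    """FP32 throughput in TFLOPS, optionally scaled by an average ALU utilization."""
    return shader_alus * ops_per_clock * clock_ghz / 1000.0 * utilization

# HD 5870 (TeraScale, VLIW5) -- 1600 ALUs at 850 MHz, specs assumed from memory:
rated = tflops(1600, 0.850)              # ~2.72 TFLOPS on the box
vliw_packed = tflops(1600, 0.850, 0.68)  # ~1.85 TFLOPS when only ~3.4 of 5 slots fill

print(f"rated: {rated:.2f} TFLOPS, after VLIW packing: {vliw_packed:.2f} TFLOPS")
# Wavefront divergence and scheduling gaps (also described above) cut into this
# further, which is how the 5870's real-world throughput can land below the
# HD 7850's ~1.76 rated TFLOPS despite the bigger number on paper.
```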

Pemalite said:

Using CU's alone is just as irrelevant as Teraflops. And I would never condone or support such a thing.

Both are bullshit.

CUs are just a name for a complex of cores, controllers, L1 caches, etc. Those truly don't mean anything unless you specify the architecture and revision, as they are built differently from one architecture to another and from one revision to another. Teraflops represent the same thing regardless of architecture/revision.
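
To make that concrete, converting a CU count into anything comparable already requires architecture-specific details (ALUs per CU, ops per clock, clock speed). A sketch with illustrative per-architecture figures (assumed, not from this thread):

```python
# ALUs per compute unit (or SIMD engine, as TeraScale called them) differ by
# architecture; the figures below are illustrative, not a spec sheet.
ALUS_PER_CU = {
    "TeraScale (VLIW5 SIMD engine)": 80,  # 16 VLIW5 units x 5 ALUs each
    "GCN (CU)": 64,
}

def cu_tflops(arch: str, cu_count: int, clock_ghz: float, ops_per_clock: int = 2) -> float:
    """Same unit count, very different throughput depending on the architecture."""
    return ALUS_PER_CU[arch] * cu_count * ops_per_clock * clock_ghz / 1000.0

# 20 units at the same 1.0 GHz clock means very different things on each design:
for arch in ALUS_PER_CU:
    print(arch, round(cu_tflops(arch, cu_count=20, clock_ghz=1.0), 2))
```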

Pemalite said:

My argument is that TFLOPS is bullshit in using it to determine the capability of a GPU.

Nor are higher clocks always a detriment to power consumption... It's a balancing act as all CPU's and GPU's have an efficiency curve.

For example... You can buy a CPU, undervolt it... Then overclock it... And result in a CPU that uses less power, but offers higher performance due to its higher clockrate.

Yes, I know all that, but the efficiency curve gets exponentially worse past a certain clock speed, and GPUs' default clocks are always already past that point; binning only has a marginal impact here. You're only able to undervolt some GPUs because their stock configurations are designed to cater to the worst-binned chips of a particular SKU, with some headroom to spare.
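
That's just the usual dynamic-power relation, roughly P ∝ C·V²·f: since higher clocks also demand higher voltage, power grows much faster than performance near the top of the curve. A sketch with an assumed, purely illustrative voltage/frequency curve (not measurements of any real GPU):

```python
# Dynamic power scales roughly with C * V^2 * f; higher clocks also require
# higher voltage, so power climbs much faster than the clock near the top of
# the curve. Voltage/frequency points below are illustrative only.
points = [
    # (clock in GHz, voltage in V)
    (1.5, 0.80),
    (1.8, 0.90),
    (2.1, 1.00),
    (2.4, 1.15),  # stock-ish territory: already past the efficient region
    (2.6, 1.30),
]

base_clock, base_volt = points[0]
base_power = base_volt ** 2 * base_clock

for clock, volt in points:
    rel_perf = clock / base_clock
    rel_power = (volt ** 2 * clock) / base_power
    print(f"{clock:.1f} GHz: {rel_perf:.2f}x perf for {rel_power:.2f}x power")
```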

I would say your analogy with cars already gets the idea across.

Of course, one car having 1000 hp and another having 900 hp doesn't mean much when several other design elements impact that car's performance, from a simple 0-100 km/h (0-60 mph) run to lap time (which can even be affected by the driver on the exact same car and conditions: put two drivers on a lap, or even the same driver on multiple laps, and the times will differ).

So would that mean measuring HP is totally useless? Absolutely not =p


