EpicRandy said:
Well yeah, I agree, but there are still many flaws with this. |
It's literally why benchmarks actually exist.
TFLOPS alone can never depict performance accurately.
I have already showcased how different GPUs can perform worse even with more TFLOPS.
I have already showcased how otherwise identical GPUs with the same TFLOPS figure can perform at half the speed.
So to assert that it can "depict things accurately" when you can double your performance with a part that has identical TFLOPS is disingenuous.
Not only that, teraflops only represent single-precision floating point...
So for anything involving quarter precision, half precision, double precision, integers, geometry throughput, or texel/pixel/texture fillrate, teraflops have no bearing. That's a massive chunk of what a GPU actually does.
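To make the single-precision point concrete, the marketing TFLOPS number is just shader count × clock × 2, because one fused multiply-add (FMA) is counted as two floating-point operations. A minimal sketch, using the published shader counts and boost clocks of two real cards purely as an illustration:

```python
def fp32_tflops(shaders: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 TFLOPS.

    Assumes each shader retires one FMA per cycle, and an FMA is
    counted as 2 floating-point operations -- hence the x2.
    """
    return shaders * clock_ghz * 2 / 1000

# Published shader counts and nominal boost clocks:
gtx_1080 = fp32_tflops(2560, 1.733)  # ~8.9 TFLOPS
vega_64 = fp32_tflops(4096, 1.546)   # ~12.7 TFLOPS
```

Vega 64 carries roughly 43% more theoretical TFLOPS, yet the two cards traded blows in contemporary game benchmarks, which is exactly the gap between the paper figure and delivered performance.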
EpicRandy said:
Those multiplications literally represent how the stream processors work; they can perform 2 FLOPs per cycle by design. |
Again: it's theoretical, not real-world.
Otherwise, if we took the GeForce GT 1030's DDR4 and GDDR5 variants, we wouldn't have one that is almost twice as fast at the same TFLOPS.
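The GT 1030 case is a memory-bandwidth story: both variants have the same 384 shaders at similar clocks, so near-identical TFLOPS, but very different memory. A rough sketch using the published memory specs (64-bit bus on both, ~6 GT/s GDDR5 vs ~2.1 GT/s DDR4):

```python
def bandwidth_gb_s(bus_width_bits: int, effective_rate_gtps: float) -> float:
    # GB/s = (bus width in bytes) x (effective transfers per second)
    return bus_width_bits / 8 * effective_rate_gtps

gt1030_gddr5 = bandwidth_gb_s(64, 6.0)  # 48.0 GB/s
gt1030_ddr4 = bandwidth_gb_s(64, 2.1)   # 16.8 GB/s
```

Same "teraflops", but one card has roughly a third of the other's bandwidth to feed those shaders, hence the near-halved real-world performance.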
No, they are not 2 FLOPs per cycle.
They can do 2 operations per cycle, which is very different. Not all operations are equal: some can be packed together to form one operation, or one operation may be split into many.
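The "operations, not FLOPs" distinction matters because per-cycle throughput differs by operation type. The ratios below are illustrative only, loosely typical of recent consumer GPUs rather than any specific chip:

```python
# Throughput per ALU lane per cycle, relative to FP32 FMA.
# Illustrative values only -- real rate tables vary per architecture.
ops_per_cycle = {
    "fp32_fma": 1.0,         # the op that marketing TFLOPS counts (as 2 FLOPs)
    "fp16_fma_packed": 2.0,  # two half-precision FMAs packed into one FP32 lane
    "fp64_fma": 1 / 16,      # consumer parts often run FP64 at a fraction of FP32 rate
    "int32_mul": 1 / 4,      # integer multiply is often slower than FMA
}

# Packed FP16 is how a chip can "exceed" its FP32 TFLOPS figure:
fp16_peak_multiplier = ops_per_cycle["fp16_fma_packed"]  # 2x the FP32 peak
```

None of the non-FP32 rows are captured by the headline teraflop number at all.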
EpicRandy said:
No, I literally wrote, "physical barrier you could never exceed or even attain". So unless you are able to overclock it by 20%, that's not possible. Some real-world scenarios get really close to 100%, just not gaming in general, but it's already better with consoles due to static hardware and specific optimization. |
Except you can exceed the theoretical Teraflop number by combining operations if you make your ALU's fat enough.
EpicRandy said:
Sure, here's a very good read on the utilization of the TeraScale architecture: Utilization remains a big concern though, for both the SPUs and the SPs within them: not only must the compiler do its best to identify 5 independent datapoints for each VLIW thread, but so must 64 VLIW threads be packed together within each wavefront. Further, the 64 items in a wavefront should all execute against the same instruction; imagine a scenario wherein one thread executes against an entirely different instruction from the other 63! Opportunities for additional clock cycles & poor utilization thus abound and the compiler must do its best to schedule around them. With 5 SPs in each SPU, attaining 100% utilization necessitates five datapoints per VLIW thread. That's the best case; in the worst case an entire thread is comprised of just a single datapoint resulting in an abysmal 20% utilization as 4 SPs simply engage in idle chit-chat. Extremities aside, AMD noted an average utilization of 68% or 3.4 SPs per clock cycle. A diagram from AnandTech's GCN preview article depicts this scenario, and it's a good time to borrow it here: The HD 6900 series would serve as the last of the flagship TeraScale GPUs, even as TeraScale based cards continued to release until October of 2013. As compute applications began to take center-stage for GPU acceleration, games too evolved. The next generation of graphics APIs such as DirectX 10 brought along complex shaders that made the VLIW-centric design of TeraScale ever more inefficient and impractically difficult to schedule for. The Radeon HD 7000 series would accordingly usher in the GCN architecture, TeraScale's inevitable successor that would abandon VLIW and ILP entirely and in doing so cement AMD's focus on GPU compute going forward. |
Did you really just link to a blog? Either way, I have a very low-level understanding of TeraScale and its derivatives.
And ironically, AMD has reintroduced VLIW-like ideas in RDNA 3 with its dual-issue ALUs... It's a cheap way to increase throughput.
Which is partially why going from the Radeon RX 6950 XT at 19.3 teraflops to the RX 7900 XTX at 46 teraflops hasn't resulted in anywhere near double the performance, because teraflops is bullshit.
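A rough way to model why the on-paper jump doesn't materialize: dual-issue doubles the peak FP32 figure, but only the instructions the compiler can actually pair get the benefit. A hypothetical sketch, where the 30% pairing rate is an assumption for illustration, not a measured figure:

```python
def effective_tflops(paper_tflops: float, pairing_fraction: float) -> float:
    """Toy model of dual-issue throughput.

    The marketing number assumes every instruction dual-issues.
    Single-issue throughput is half of that; only the fraction of
    instructions the compiler can pair gets the 2x benefit.
    """
    single_issue = paper_tflops / 2
    return single_issue * (1 + pairing_fraction)

# Hypothetical: if only ~30% of instructions pair up,
# 46 "paper" TFLOPS behaves more like ~30:
realistic = effective_tflops(46.0, 0.30)
```

It's the same utilization problem the TeraScale quote describes, just with 2-wide issue instead of 5-wide VLIW.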
EpicRandy said:
Yes, I know all that, but the efficiency curve gets exponentially worse past a certain clock speed, and GPUs' default clocks are always already past that point; binning only has marginal impacts here. You're able to undervolt some GPUs only because their configurations are designed to cater to the worst binning offender of a particular SKU, with some headroom to spare. |
Binning has massive impacts, depending on the process.
Some chips cannot handle higher currents; otherwise they suffer from an issue known as "electromigration", which destroys the silicon.
Hence why Polaris, through binning, went from the RX 480 to the RX 580. Yes, power draw increased with that jump, but only because they could get away with it.
Consequently, AMD and Intel use binning on all their CPUs... For example, the Ryzen 5500 and 5600X are fundamentally the same chip, but through binning they hit different performance/power targets, and parts of the damaged L3 got lasered off.
Heck, we could even go back to Phenom, where AMD used the same chip design for its entire lineup, from dual-cores right up to quad-cores. Chips with a damaged core would have that core disabled; thankfully, you could often re-enable it by setting Advanced Clock Calibration (ACC) to Auto, sometimes needing more voltage or lower clocks.
That's binning.
DonFerrari said: I would say your analogy with cars already gets the idea across. Of course, one car having 1000 hp and another having 900 hp doesn't mean much when there are several other design elements that will impact the car's performance, from a simple 0-100 km/h (0-60 mph) run to lap time (which can even be affected by the driver on the exact same car and conditions: put 2 drivers on a lap, or even the same driver on multiple laps, and the times will differ). So would that mean measuring HP is totally useless? Absolutely not =p |
Nah, it's not like horsepower at all.
It's more like an engine's CCs, i.e. its displacement.
You can get lower-CC engines that outperform higher-CC engines based on a number of design factors.
--::{PC Gaming Master Race}::--