EpicRandy said:
Well yeah, I agree, but there are still many flaws with this.
Those multiplications literally represent how the stream processors work: each one can perform 2 FLOPs per cycle by design.
No, I literally wrote, "physical barrier you could never exceed or even attain". So unless you are able to overclock it by 20%, that's not possible. Some real-world scenarios get really close to 100%, just not gaming in general, though it's already better on consoles thanks to static hardware and specific optimization.
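As a rough sketch of that multiplication (the function name and the example numbers are my own assumptions for illustration, not the specs of any real card):

```python
def peak_tflops(stream_processors: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak in TFLOPS: processors x clock x FLOPs per cycle."""
    # Convert GHz to cycles per second, then scale the result to tera (1e12).
    return stream_processors * clock_ghz * 1e9 * flops_per_cycle / 1e12


# Hypothetical GPU: 2048 stream processors running at 1.5 GHz.
example = peak_tflops(2048, 1.5)
print(f"{example:.3f} TFLOPS")
```

The result is a ceiling by construction: it assumes every stream processor issues its 2 FLOPs on every single cycle.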
Sure, here's a very good read on the utilization of the TeraScale architecture:
Utilization remains a big concern though, for both the SPUs and the SPs within them: not only must the compiler do its best to identify 5 independent datapoints for each VLIW thread, but so must 64 VLIW threads be packed together within each wavefront. Further, the 64 items in a wavefront should all execute against the same instruction; imagine a scenario wherein one thread executes against an entirely different instruction from the other 63! Opportunities for additional clock cycles & poor utilization thus abound and the compiler must do its best to schedule around them.
With 5 SPs in each SPU, attaining 100% utilization necessitates five datapoints per VLIW thread. That’s the best case; in the worst case an entire thread is comprised of just a single datapoint resulting in an abysmal 20% utilization as 4 SPs simply engage in idle chit-chat. Extremities aside, AMD noted an average utilization of 68% or 3.4 SPs per clock cycle. A diagram from AnandTech’s GCN preview article depicts this scenario, and it’s a good time to borrow it here:
The HD 6900 series would serve as the last of the flagship TeraScale GPUs, even as TeraScale based cards continued to release until October of 2013. As compute applications began to take center-stage for GPU acceleration, games too evolved. The next generation of graphics APIs such as DirectX 10 brought along complex shaders that made the VLIW-centric design of TeraScale ever more inefficient and impractically difficult to schedule for. The Radeon HD 7000 series would accordingly usher in the GCN architecture, TeraScale’s inevitable successor that would abandon VLIW and ILP entirely and in doing so cement AMD’s focus on GPU compute going forward.
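The utilization figures in that quote are simple slot arithmetic, which can be sketched like this (the helper names and the 2.0 TFLOPS peak are hypothetical, only the 5 SPs per SPU and the 1, 3.4 and 5 filled slots come from the article):

```python
SPS_PER_SPU = 5  # VLIW5: five stream processors per SPU, per the quoted article


def utilization(filled_slots: float, slots: int = SPS_PER_SPU) -> float:
    """Fraction of SPs doing useful work in a cycle."""
    if not 0 <= filled_slots <= slots:
        raise ValueError("filled_slots must be between 0 and slots")
    return filled_slots / slots


def effective_tflops(peak: float, util: float) -> float:
    """Scale a theoretical peak by the achieved utilization."""
    return peak * util


best = utilization(5)       # five independent datapoints per VLIW thread
worst = utilization(1)      # a single datapoint, four SPs idle
average = utilization(3.4)  # the average AMD noted, per the article

print(f"best {best:.0%}, worst {worst:.0%}, average {average:.0%}")

# Hypothetical 2.0 TFLOPS peak, scaled by the quoted average utilization.
print(f"{effective_tflops(2.0, average):.2f} TFLOPS effective")
```

So the peak figure stays the same number either way; what changes between architectures and workloads is how much of it you actually get.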
CUs are just a name for a cluster of cores, controllers, L1 caches, etc. They truly don't mean anything unless you specify the architecture and revision, as they are built differently from one architecture to another and from one revision to another. Teraflops represent the same thing regardless of architecture or revision.
Yes, I know all that, but the efficiency curve gets exponentially worse with clocks past a certain speed, and GPUs' default clocks are always already past that point; binning only has a marginal impact here. You're able to undervolt some GPUs only because their configurations are designed to cater to the worst binning offender of a particular SKU with some headroom to spare.
I would say your analogy with cars already gets the idea across.
Of course, one car having 1000hp and another having 900hp doesn't mean much on its own when several other design elements affect the car's performance, from a simple 0-100 km/h (0-60 mph) run to a lap time (which can even vary with the driver: put two drivers in the exact same car under the same conditions, or have the same driver do multiple laps, and the times will differ).
So would that mean measuring HP is totally useless? Absolutely not =p