Pemalite said:
You are missing the point.
I understood your point fine; it just wasn't really what our conversation was about. We weren't trying to count every possible operation of each data type that might show up in a typical game workload, or to account for all of the fine-grained optimizations that may exist on, and be tailored to, each platform. The whole idea was to nail down a broad, top-level, far-from-precise but directionally correct relationship between single-precision TFLOPs and (measured) effective rasterization performance for each architecture under discussion, while leaving out the finer details that can vary even between GPUs of the same architecture. We got there by aligning measured performance with single-precision throughput on an architecture-by-architecture basis for like chips, and by noticing directional trends across all GPUs of the same architecture.
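To make the approach concrete, here is a minimal sketch of the kind of back-of-the-envelope scaling being described: calibrate a perf-per-TFLOP ratio on a measured GPU, then scale by FP32 TFLOPs to project a different chip of the same architecture. All numbers are made-up placeholders, not real benchmark data, and the function name is just illustrative.

```python
def estimate_fps(reference_fps: float, reference_tflops: float,
                 target_tflops: float) -> float:
    """Scale measured performance linearly with FP32 throughput within
    one architecture -- a deliberately rough, directional model."""
    perf_per_tflop = reference_fps / reference_tflops
    return perf_per_tflop * target_tflops

# Hypothetical: a chip measured at 60 fps with 4.0 FP32 TFLOPs,
# projecting a 2.0 TFLOP chip on the same architecture.
print(estimate_fps(60.0, 4.0, 2.0))  # -> 30.0
```

The linearity assumption is exactly what makes this only directionally correct: memory bandwidth, cache sizes, and clock behavior all bend the curve in practice.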
It didn’t have to be single-precision; the choice is arbitrary. We could have used half-precision, INT8, TF32, or some weighted combination of them all, based on the distribution of data types (or operations) used in a typical engine. We went with single-precision simply because it's the most common data point in published specifications, it's supported by practically every consumer GPU, and it runs on the largest share of each GPU's cores.
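For what the "weighted combination" alternative would look like, here is a hypothetical sketch that blends per-datatype peak throughput by an assumed workload mix. The throughput figures and mix fractions below are illustrative assumptions, not measurements of any real GPU or engine.

```python
def weighted_throughput(peak_by_type: dict, workload_mix: dict) -> float:
    """Weighted sum of per-datatype peak throughput (TFLOPs/TOPS),
    weighted by the fraction of the workload using each data type."""
    return sum(peak_by_type[t] * workload_mix[t] for t in workload_mix)

# Assumed peak throughput per data type (placeholder values):
gpu = {"fp32": 4.0, "fp16": 8.0, "int8": 16.0}
# Assumed distribution of operations in a typical engine (placeholder):
mix = {"fp32": 0.7, "fp16": 0.2, "int8": 0.1}

print(weighted_throughput(gpu, mix))  # 4.0*0.7 + 8.0*0.2 + 16.0*0.1 = 6.0
```

The hard part, of course, is knowing the real mix, which is exactly why plain FP32 ends up being the pragmatic default.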
And yes, such architecture-level comparisons are imprecise and don't tell the whole story, but we're not yet at a point where we know the minutiae of the Switch 2's hardware or how developers will use it, nor does that matter for the broader question we were trying to resolve.
numberwang was skeptical that the handheld Switch 2 and the Steam Deck were in the same ballpark, in theoretical as well as measured performance, and these broad comparisons are enough to answer that question.
Last edited by sc94597 - on 15 January 2025