Pemalite said:
sc94597 said:

In the conversation we were having, it was already established that we are talking about an Ampere chip (T239) and, by convention, single precision. The missing variables were frequencies and core counts, with the architecture and the precision of interest held constant. 

Node size matters for those things, yes, but if we already know the frequency and core counts (as we essentially now do), it is no longer relevant to calculating hypothetical single-precision floating-point performance. 
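For reference, the single-precision figure being discussed follows from a simple formula: shader cores × FLOPs per core per clock × clock speed. The sketch below assumes each core retires one fused multiply-add (counted as 2 FLOPs) per cycle, which is the usual convention for quoted peak FP32 numbers. The Steam Deck inputs (8 RDNA 2 compute units, 512 ALUs, up to 1.6 GHz) match Valve's published GPU configuration; the T239 core count is an assumption based on 12 Ampere SMs, and its clock is a hypothetical placeholder, not a confirmed figure.

```python
def peak_fp32_tflops(shader_cores: int, clock_ghz: float,
                     flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak single-precision throughput in TFLOPS.

    Assumes every shader core retires one fused multiply-add (FMA) per clock,
    conventionally counted as 2 floating-point operations.
    """
    # cores * FLOPs/clock * GHz gives GFLOPS; divide by 1000 for TFLOPS.
    return shader_cores * flops_per_core_per_clock * clock_ghz / 1000.0


# Steam Deck: 8 RDNA 2 compute units x 64 ALUs = 512, up to 1.6 GHz.
steam_deck = peak_fp32_tflops(shader_cores=512, clock_ghz=1.6)

# T239: ASSUMED 12 SMs x 128 CUDA cores = 1536; the clock below is a
# HYPOTHETICAL placeholder, not a confirmed handheld frequency.
HYPOTHETICAL_HANDHELD_CLOCK_GHZ = 0.5
t239_handheld = peak_fp32_tflops(shader_cores=1536,
                                 clock_ghz=HYPOTHETICAL_HANDHELD_CLOCK_GHZ)

print(f"Steam Deck: {steam_deck:.2f} TFLOPS")
print(f"T239 (hypothetical clock): {t239_handheld:.2f} TFLOPS")
```

Once the real handheld clock is known, only the `clock_ghz` argument changes; architecture and precision stay fixed, which is the point being made above.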

You are missing the point.

Just like on the current Switch, many developers won't use pure single-precision floating point... Ergo, using single-precision floating point/teraflops is irrelevant when comparing the Switch 2.0 against its competition.
They will use mixed precision, combining two 16-bit operations into a faux-32-bit one that is done in a single cycle wherever possible.

This is to conserve battery life and to boost throughput.

This isn't going to happen on the Steam Deck, as it relies on PC development/ports.
And it definitely doesn't happen on PlayStation 5 and Series X.
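The packing described in this quote can be illustrated at the bit level: two IEEE 754 half-precision values fill exactly the 32 bits of one single-precision slot, so hardware with packed-FP16 support can apply one instruction to both halves. The sketch below uses Python's `struct` module (format code `e` is half precision) to show that packing. The `packed_fp16_factor` parameter is a hypothetical knob for the throughput side; whether a given chip actually runs FP16 at twice its FP32 rate is an assumption that has to be checked per architecture.

```python
import struct


def pack_two_halves(a: float, b: float) -> int:
    """Pack two floats as IEEE 754 half-precision values into one 32-bit word."""
    raw = struct.pack("<ee", a, b)       # 2 bytes each -> 4 bytes total
    return struct.unpack("<I", raw)[0]   # reinterpret those 4 bytes as a uint32


def unpack_two_halves(word: int) -> tuple[float, float]:
    """Recover the two half-precision values stored in a 32-bit word."""
    return struct.unpack("<ee", struct.pack("<I", word))


def peak_fp16_tflops(fp32_tflops: float, packed_fp16_factor: float = 2.0) -> float:
    """Peak FP16 rate under a HYPOTHETICAL packing factor.

    2.0 models two FP16 operations issued per FP32 lane per cycle; the real
    factor varies by architecture and must be checked for each chip.
    """
    return fp32_tflops * packed_fp16_factor


word = pack_two_halves(1.5, -2.0)
print(hex(word), unpack_two_halves(word))
```

Values that need more range or precision than FP16 offers cannot be packed this way, which is why this is done "wherever possible" rather than everywhere.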

I understood your point fine; it just wasn’t really what our conversation was about. We weren’t trying to count every operation of each data type that might show up in a typical game workload, or to account for all the fine-grained optimizations that might exist on, and be tailored to, each platform. The whole idea was to nail down a broad, top-level relationship between single-precision TFLOPS and (measured) effective rasterization performance for each architecture being discussed. That relationship is far from precise but directionally correct, and it deliberately leaves out the finer details that can vary even between GPUs of the same architecture. We got there by lining up measured performance against single-precision throughput, architecture by architecture, for comparable chips, and noticing that there are directional trends across all GPUs of the same architecture.  
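The approach described above amounts to fitting one scaling factor per architecture: divide measured performance by peak FP32 TFLOPS for several chips of that architecture, average the ratios, and apply the factor to a new chip's theoretical TFLOPS. The sketch below assumes a simple mean of ratios; the function names are hypothetical, and the sample numbers are placeholders for illustration only, not real benchmark results.

```python
from statistics import mean


def arch_factor(samples: list[tuple[float, float]]) -> float:
    """Average measured performance per peak FP32 TFLOP for one architecture.

    Each sample is (measured_performance, peak_fp32_tflops). The performance
    unit can be any benchmark metric, as long as every sample uses the same one.
    """
    return mean(perf / tflops for perf, tflops in samples)


def estimate_performance(peak_fp32_tflops: float, factor: float) -> float:
    """Directional (not precise) estimate for a chip that hasn't been benchmarked."""
    return peak_fp32_tflops * factor


# PLACEHOLDER data: two hypothetical chips of the same architecture.
placeholder_factor = arch_factor([(60.0, 10.0), (33.0, 5.0)])
print(estimate_performance(2.0, placeholder_factor))
```

Because the factor is fitted separately per architecture, it absorbs architecture-wide effects (e.g. how well raw FLOPS translate into rasterization throughput) while ignoring chip-to-chip differences, which matches the "directionally correct" goal.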

It didn’t have to be single precision; the choice is arbitrary. We could’ve used half precision, INT8, TF32, or some weighted combination of them all, based on the distribution of each data type (or operation) used in a typical engine. We went with single precision because it’s the most commonly listed figure in specifications, it is supported by practically every consumer GPU, and it runs on the most cores. 
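If one did want a blended figure instead of single precision alone, one reasonable weighting (an assumption, not something settled in this thread) is by the share of operations of each data type. The time per unit of work is the sum of each share divided by that type's peak rate, so the blended throughput is a weighted harmonic mean of the peak rates. The data-type names and rates passed in are hypothetical inputs.

```python
def blended_tflops(mix: dict[str, float], peak_rates: dict[str, float]) -> float:
    """Effective throughput for a workload mixing several data types.

    mix:        fraction of operations per data type (fractions must sum to 1)
    peak_rates: peak throughput per data type (TFLOPS or TOPS)
    Returns the weighted harmonic mean of the peak rates.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("operation fractions must sum to 1")
    # Time per unit of work = sum over data types of (fraction / rate).
    time_per_op = sum(frac / peak_rates[dtype] for dtype, frac in mix.items())
    return 1.0 / time_per_op


# HYPOTHETICAL mix: 70% FP32 ops, 30% FP16 ops on a chip with these peak rates.
print(blended_tflops({"fp32": 0.7, "fp16": 0.3}, {"fp32": 1.6, "fp16": 3.2}))
```

The harmonic mean is pulled toward the slowest data type in the mix, which is why a plain arithmetic average of peak rates would overstate the blended throughput.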

And yes, such architecture-level comparisons are imprecise and don't tell the whole story, but we don't yet know the minutiae of the Switch 2's hardware or how developers will use it, and those details don't matter for the broader question we were trying to resolve. 

numberwang was skeptical that the handheld Switch 2 and the Steam Deck were in the same ballpark (and about the theoretical performance figures as well), and these broad comparisons are enough to answer that question. 

Last edited by sc94597 - on 15 January 2025