Pemalite said: That is blatantly false.
The piece of my post you quoted was not a general statement meant to apply to every circumstance. It was specifically contrasting the Steam Deck's situation with the Switch 2's: the two have different performance targets and different efficiency curves. I never suggested that optimizing for a higher frequency couldn't be more efficient, just that Nintendo achieved similar performance at better efficiency by going wide at a low-frequency optimum, whereas the Steam Deck pushes past its optimum (which seems to be about 1200 MHz) to gain performance, at the cost of roughly halving its efficiency.
It's irrelevant if it's per module or all modules. Teraflops are identical irrespective of architecture; they are the same single-precision numbers. More goes into rendering a game; it's simply not about the Teraflops... It actually never has been. It's not always as simple as that. There is more to it than that; you also need to include the number of instructions and the precision. The teraflops are the same regardless of whether it's Graphics Core Next or RDNA or Ampere.
Pegging things to TFLOPs is simply a means of forming a heuristic for dimensional analysis from knowns about Ampere (and the architectures we're comparing it to). I don't literally think TFLOPs differ between architectures; rather, how TFLOPs relate to hypothetical rasterization-performance units (a hidden variable we can only infer through observation) can be measured from architecture to architecture, again to form a heuristic. Likewise, there is a measured data point that the "sweet spot" for Ampere GPUs has been roughly 25 GB/s of memory bandwidth per TFLOP. That isn't a rule or an engineering principle, but a rule of thumb derived from experimentation and measurement; it may or may not have figured into the design process, but it is useful for fitting things together because it reflects how Nvidia actually balanced the Ampere line.
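To illustrate the kind of sanity check this heuristic allows, here is a minimal sketch that computes GB/s per FP32 TFLOP for a few Ampere cards. The core counts, boost clocks, and bandwidths are approximate published specs, and the 2-ops-per-core-per-cycle figure is the standard FMA accounting; treat the output as a rough check, not a precise measurement.

```python
# Rough check of the ~25 GB/s-per-TFLOP heuristic against a few Ampere cards.
# Core counts, boost clocks, and bandwidths are approximate published specs.
ampere_cards = {
    #  name:        (CUDA cores, boost clock GHz, bandwidth GB/s)
    "RTX 3060 Ti": (4864, 1.665, 448),
    "RTX 3070":    (5888, 1.725, 448),
    "RTX 3080":    (8704, 1.710, 760),
}

def fp32_tflops(cores: int, clock_ghz: float) -> float:
    # 2 FP32 ops (one fused multiply-add) per core per cycle
    return 2 * cores * clock_ghz / 1000

for name, (cores, clock, bw) in ampere_cards.items():
    tf = fp32_tflops(cores, clock)
    print(f"{name}: {tf:.1f} TFLOPs, {bw / tf:.1f} GB/s per TFLOP")
```

Run as written, the ratios land in the low-to-high twenties across the stack, which is what makes "about 25 GB/s per TFLOP" a usable rule of thumb.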
To generalize: these exercises are examples of constructing a Fermi problem, using knowns to make rough but not totally out-of-place guesses. This is very common in almost every engineering and scientific field when many variables can be summarized by the relationships among a smaller subset (TFLOPs, FPS, synthetic benchmark scores, etc., in this case).
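A Fermi-style chain of estimates can be sketched in a few lines. The inputs below are hypothetical placeholders (not real console specs), and the bandwidth figure reuses the Ampere rule of thumb discussed above purely for illustration.

```python
# Fermi-problem sketch: chain a few knowns into a rough estimate of an unknown.
# All inputs here are hypothetical placeholders, not real hardware specs.
cores = 1536            # hypothetical shader-core count
clock_ghz = 1.0         # hypothetical GPU clock in GHz
gb_per_tflop = 25       # the Ampere bandwidth rule of thumb, used illustratively

# FP32 TFLOPs: 2 ops per core per cycle (fused multiply-add accounting)
est_tflops = 2 * cores * clock_ghz / 1000

# Implied "comfortable" memory bandwidth under the heuristic
est_bandwidth = gb_per_tflop * est_tflops

print(f"~{est_tflops:.2f} TFLOPs, ~{est_bandwidth:.0f} GB/s implied bandwidth")
```

The point is not the specific numbers but the method: each step multiplies a known (or guessed) quantity by a measured relationship, and the errors tend to stay within an order of magnitude.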
Last edited by sc94597 - on 15 January 2025