One of many reasons why TFLOPs shouldn't be compared across micro-architectures.
https://www.tomshardware.com/features/nvidia-ampere-architecture-deep-dive
"With Turing, Nvidia said that in many games (looking at a broad cross section of games), roughly 35% of the CUDA core calculations were integer workloads. Memory pointer lookups are a typical example of this. If that ratio still holds, one third of all GPU calculations in a game will be INT calculations, which potentially occupy more than half of the FP32+INT portion of the SMs."
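To put rough numbers on that quote, here is a back-of-the-envelope sketch in Python. It assumes a toy model of an Ampere-style SM: two equally wide datapaths, one FP32-only and one shared FP32/INT, with perfect scheduling and no other bottlenecks. The function names are made up for illustration, and real games will not behave this cleanly.

```python
def shared_path_int_occupancy(int_share: float) -> float:
    """Fraction of the shared FP32/INT datapath taken up by INT work.

    Toy assumption: two equal datapaths, so the shared one handles at most
    half of all issued operations, and INT can only run on the shared one.
    """
    if not 0.0 <= int_share <= 1.0:
        raise ValueError("int_share must be between 0 and 1")
    return min(1.0, 2.0 * int_share)


def fp32_fraction_of_peak(int_share: float) -> float:
    """Fraction of the advertised FP32 peak left for FP32 work (toy model)."""
    if not 0.0 <= int_share <= 1.0:
        raise ValueError("int_share must be between 0 and 1")
    # Total issue rate in "datapath units" per clock (2.0 = both paths full).
    # Once INT exceeds half the mix, the shared path is the bottleneck.
    total = 2.0 if int_share <= 0.5 else 1.0 / int_share
    fp_ops = (1.0 - int_share) * total
    return fp_ops / 2.0  # advertised peak counts both datapaths as FP32


if __name__ == "__main__":
    for share in (0.0, 1 / 3, 0.35):
        print(
            f"INT share {share:.2f}: "
            f"shared path {shared_path_int_occupancy(share):.0%} INT, "
            f"FP32 at {fp32_fraction_of_peak(share):.0%} of paper peak"
        )
```

Under these assumptions, an INT share of one third fills two thirds of the shared datapath and leaves FP32 at about two thirds of the paper figure, which is in line with the "more than half" in the quote.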
Besides, we know with >99% certainty the max TFLOPs the SW2 currently supports, because we know its max clock rates and core counts; unless the GPU clocks are raised, that figure is set in stone. There is no point speculating about known quantities like that.
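For anyone wondering why clocks and core counts pin the number down: the paper figure is just cores times clock times two, since a fused multiply-add counts as two floating-point operations. A minimal sketch, where the function name is made up and the inputs are placeholders rather than the specs of any real device:

```python
def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPs.

    Assumes every core retires one fused multiply-add (2 FLOPs) per clock,
    which is how paper TFLOPs figures are normally derived.
    """
    flops_per_core_per_clock = 2  # one FMA = one multiply + one add
    flops_per_second = cuda_cores * flops_per_core_per_clock * clock_mhz * 1e6
    return flops_per_second / 1e12


if __name__ == "__main__":
    # Placeholder inputs only: 1000 cores at 1000 MHz.
    print(f"{peak_fp32_tflops(1000, 1000.0):.2f} TFLOPs")
```

Plug in the known core count and max GPU clock and the ceiling falls out directly, which is why it only changes if the clocks do.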







