CGI-Quality said: I'll also explain the MASSIVE jump in CUDA cores. NVIDIA was looking to greatly improve the Ampere SM (streaming multiprocessor) over Turing's. The gain is in FP32 (single-precision floating-point) throughput, which is also what the theoretical peak (the teraflop figure) is based on. Ultimately, when you double the FP32 processing rate per clock (and, as a necessity, double the data paths to feed it), it helps many more things on the card.
It feels more like a cop-out or a bad compromise to me. Turing got it right by having dedicated paths for INT and FP work. Ampere is basically just a cheap way to add FP cores without giving up too much die space to INT cores, which makes the cores less efficient. For example, take the worst case, where every SM constantly has an even mix of work (64 FP32 and 64 INT32 operations per clock): you'd get exactly the same per-cycle performance as Turing. Basically, the only reason we see big performance improvements at all is that games generally have a much higher share of FP32 work than INT32 work (plus, of course, the higher clocks and SM counts).
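To put rough numbers on the worst-case argument, here's a back-of-the-envelope sketch in Python. It's an idealized model I'm assuming for illustration, not how the hardware actually schedules work: per SM per clock, Turing gets 64 dedicated FP32 lanes plus 64 dedicated INT32 lanes, Ampere gets 64 FP32-only lanes plus 64 lanes that can run either FP32 or INT32, every lane stays busy, and warp scheduling, memory stalls and the other execution units are ignored. The function names are made up.

```python
# Idealized per-SM issue model (assumption): every lane is kept busy every clock;
# scheduling granularity, memory stalls and other execution units are ignored.


def turing_ops_per_clock(fp_fraction: float) -> float:
    """Sustained FP32+INT32 ops per clock for 64 dedicated FP32 + 64 dedicated INT32 lanes."""
    # Whichever dedicated pipe saturates first limits total throughput.
    fp_limit = 64 / fp_fraction if fp_fraction > 0 else float("inf")
    int_limit = 64 / (1 - fp_fraction) if fp_fraction < 1 else float("inf")
    return min(fp_limit, int_limit)


def ampere_ops_per_clock(fp_fraction: float) -> float:
    """Same, for 64 FP32-only lanes + 64 lanes that run either FP32 or INT32."""
    # INT32 work can only use the 64 shared lanes; total issue is capped at 128 lanes.
    int_limit = 64 / (1 - fp_fraction) if fp_fraction < 1 else float("inf")
    return min(128.0, int_limit)


if __name__ == "__main__":
    for f in (0.5, 0.75, 1.0):
        t = turing_ops_per_clock(f)
        a = ampere_ops_per_clock(f)
        print(f"FP32 share {f:.2f}: Turing {t:.1f}, Ampere {a:.1f}, ratio {a / t:.2f}x")
```

In this model a 50/50 mix gives both architectures 128 ops per clock per SM, a 75% FP32 mix gives Ampere about 1.5x, and pure FP32 gives 2x. So the advertised doubling only shows up when the workload is FP32-heavy.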
I'm very interested to see how they improve on that with Hopper.