fleischr said:
Are these the same number of flops for both images? As in, say, 1 TF in FP16 vs 1 TF in FP32? The fact that it's listed simply as 'competing multicore GPU' and not anything more specific comes across as dubious.
It's not a matter of flops, it's a matter of precision. Again:
"To get an idea of what a difference in precision 16 bits can make, FP16 can represent 1024 values for each power of 2 between 2^-14 and 2^15 (its exponent range). That's 30,720 values. Contrast this to FP32, which can represent about 8 million values for each power of 2 between 2^-126 and 2^127. That's about 2 billion values—a big difference."
https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/
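The gap the NVIDIA post describes is easy to see for yourself. Here's a small Python/NumPy sketch (my own, not from the article) showing FP16's coarse spacing: with only a 10-bit mantissa, consecutive FP16 values near 2048 are a full 2.0 apart, so a value like 2049 simply can't be stored.

```python
import numpy as np

# FP16 has a 10-bit mantissa: 1024 distinct values per power of 2.
# Near 2048 (i.e. 2^11), the gap between adjacent FP16 values is 2^(11-10) = 2.
print(np.float16(2049.0))                  # 2049 is not representable; it rounds to 2048.0
print(np.spacing(np.float16(2048.0)))      # spacing near 2048 is 2.0

# Machine epsilon shows the same story at a glance:
print(np.finfo(np.float16).eps)            # ~0.000977 (2^-10)
print(np.finfo(np.float32).eps)            # ~1.19e-07 (2^-23)
```

That kind of rounding is exactly what shows up on screen as banding and shimmering artefacts once values get large or accumulate over many shader operations.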
Of course, all those pictures are Imagination's marketing... probably worst-case scenarios... there's a reason why FP16 can be, and is, used on mobiles: small screens. But blow that up on a TV and you have a different story. I can't find them now, but I remember pics of HL2 running in FP16 and FP32... lots of artefacts in FP16.
Again, I'm no expert on the matter, not by a long shot, but my understanding is that FP16 is useful only in some fairly limited cases, and that the performance gains from mixed FP32/FP16 code are quite modest... at least in games.