BlkPaladin said:
Actually, not if you look at the post above yours and my last post: it depends on the chip. You cannot just magically make a chip run faster at half precision. Depending on how it is made, it may double the performance of FP16 instructions, or it may run them at the same speed. I use registers in my answer because that is as deep as my knowledge of these things goes; I'm sure there are other ways to speed up FP16 instructions. But a register, for all intents and purposes of this explanation, can only run one instruction at a time. And how the chip is made to run FP32 instructions can influence whether the chip experiences a speed boost. For example, some 32-bit instructions are run on two 16-bit registers. So if the chip is optimized to do so, when you put a 16-bit instruction into one of those registers you can put another instruction into the other register at the same time, and thus get "twice" the speed in this case. But there are 32-bit registers that will only do one instruction at a time no matter how small the instruction is. So just looking at FLOPS and full precision/half precision doesn't tell the entire story. FLOPS, like hertz before it, is just an advertising go-to word that really has no real-world impact.
Yes, I know it's not a perfect analogy, but the point remains the same, which is about relative performance. I have no doubt that what you said is true, but from what I can gather, using FP16 is very situational, whereas the majority of software is built around FP32. And I'm sure that, at a minimum, Switch, Pro and Scorpio will most likely have it. So I will edit the other two out.
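
For what it's worth, the register point in the quoted post can be put into a rough toy model. This is only a sketch under the quote's own simplification (one instruction per register slot per cycle); `cycles_needed` and its parameters are made-up names for illustration and do not describe any real chip.

```python
import math


def cycles_needed(num_ops: int, op_bits: int, packs_small_ops: bool,
                  register_bits: int = 32) -> int:
    """Toy cycle count for a hypothetical chip (illustrative assumption only).

    If the design can pack smaller operations side by side, a 32-bit
    register handles two 16-bit operations per cycle; otherwise it
    handles one operation per cycle regardless of the operation's width.
    """
    if op_bits > register_bits:
        raise ValueError("operation is wider than the register")
    ops_per_cycle = register_bits // op_bits if packs_small_ops else 1
    return math.ceil(num_ops / ops_per_cycle)


# 1000 operations on two hypothetical designs
fp32_cycles = cycles_needed(1000, 32, packs_small_ops=True)
fp16_packed = cycles_needed(1000, 16, packs_small_ops=True)
fp16_unpacked = cycles_needed(1000, 16, packs_small_ops=False)
print(fp32_cycles, fp16_packed, fp16_unpacked)
```

In this model the packing design finishes the FP16 work in half the cycles of the FP32 work, while the non-packing design takes exactly as long as FP32, which is the "double or same speed" distinction from the quote.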
PC Specs: CPU: 7800X3D || GPU: Strix 4090 || RAM: 32GB DDR5 6000 || Main SSD: WD 2TB SN850