walsufnir said:
It isn't really faster just because you lose precision. It *could* be faster if a) the algorithm or the computation doesn't need to be fp32 in all cases and b) the hardware can store and use two 16-bit floating point numbers in the same time it would store and use one fp32. Everyone is free to use fp16 right now, but there is no real benefit to it.
Shouldn't it be easy to implement though? Developers mark the stuff that is FP16 and the GPU driver does the rest.
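To make the two conditions from the quote concrete, here is a rough CPU-side sketch in Python. It only illustrates the storage and precision side of things: two fp16 values fit in the same 4 bytes as one fp32, and each one carries fewer significant bits. The helper names `to_fp16` and `pack_pair` are made up for this post, and nothing here says anything about how a real GPU or driver schedules the math.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The "e" format code is the 16-bit half-precision float.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def pack_pair(a: float, b: float) -> bytes:
    """Store two fp16 values back to back (hypothetical helper)."""
    return struct.pack("<2e", a, b)

# Condition b) from the quote: two fp16 values take the space of one fp32.
fp32_bytes = struct.calcsize("<f")
pair_bytes = len(pack_pair(1.0, 2.0))
print(fp32_bytes, pair_bytes)

# Condition a) from the quote: the computation has to tolerate the lost precision.
print(to_fp16(0.1))      # close to 0.1, but not equal
print(to_fp16(2049.0))   # fp16 cannot represent every integer above 2048
```

So "marking the stuff that is FP16" is only half the job: the developer still has to know which values survive the rounding, and the hardware has to actually process the packed pair in one go for it to pay off.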
If you demand respect or gratitude for your volunteer work, you're doing volunteering wrong.