sc94597 said: (snip)
Don't get me wrong, people working with FPGAs have known for years that 4 to 6 bits is the ideal precision for LLMs and a lot of transformer models. For other workloads, however, there is a dramatic drop in accuracy for every exponent bit removed below FP8.
Presenting that as equivalent to doubling FP8 performance for everyone is questionable, to say the least. It is just as misleading to let people think they can MFG their way out of intrinsically low-framerate situations... as they could by upgrading hardware in the past... which was the point I was making.
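To put rough numbers on the exponent point, here is a minimal Python sketch of how the representable range shrinks as exponent bits are dropped. It assumes a simplified IEEE-style minifloat with subnormals and no encodings reserved for infinity or NaN; the function name `minifloat_range` and the listed bit splits are my own illustration, not any vendor's definition. Real formats can differ slightly (FP8 E4M3 as commonly specified reserves a NaN encoding and tops out at 448 rather than the 480 computed here). It only shows dynamic range, not model accuracy.

```python
def minifloat_range(exp_bits, man_bits):
    """Return (smallest positive, largest) value of a simplified minifloat.

    Assumption: IEEE-style bias, subnormals supported, and NO encodings
    reserved for inf/NaN. This is an illustrative model, not a spec.
    """
    bias = 2 ** (exp_bits - 1) - 1
    # Largest exponent field value, minus the bias.
    max_exp = (2 ** exp_bits - 1) - bias
    # Largest value: all mantissa bits set at the top exponent.
    largest = 2.0 ** max_exp * (2 - 2.0 ** -man_bits)
    # Smallest positive value: the lowest subnormal step.
    smallest = 2.0 ** (1 - bias) * 2.0 ** -man_bits
    return smallest, largest


# Hypothetical bit splits chosen for illustration (sign bit not counted).
for name, e, m in [("E4M3 (8-bit)", 4, 3), ("E3M2 (6-bit)", 3, 2), ("E2M1 (4-bit)", 2, 1)]:
    lo, hi = minifloat_range(e, m)
    print(f"{name}: smallest={lo}, largest={hi}, ratio={hi / lo:.0f}")
```

Under these assumptions, each exponent bit removed cuts the ratio between the largest and smallest representable value by a large factor, which is why workloads with wide value distributions are hit harder than ones that tolerate aggressive scaling.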
Cyran said: (snip)
Interestingly... all of their performance graphs showed a very similar ~33% improvement over the previous hardware they were compared against, which suggests the slides were curated to some extent to achieve that specific effect.
We'll see when the hardware comes out. On paper, as you said, the only decent jump is the 5090.