haxxiy said:
I think it's likely because the thing is still far from ready. They compared the A100 to the MI210/250s with no problem, albeit only using the instructions that looked favorable to them, like high-precision floats. AMD claims it has '8 times the AI performance of the MI250'. That's of course a vague statement, but it obviously applies to low-precision instructions with sparsity that CDNA2 can't do natively, hence the huge if somewhat misleading gains.

RTX 4090: 512 TCs x 2520 MHz x 1024 OPs = 1321 sparse INT8 TOPS
H100 SXM: 528 TCs x 1830 MHz x 2048 OPs = 1979 sparse INT8 TOPS
MI300: 'MI250X x 8' = either 1532 or 3064 sparse INT8 TOPS, depending on how exactly they are counting.

Mind, real-life models will be using instructions that achieve a quarter or so of these theoretical numbers. Anyway, I think both numbers make sense given the huge transistor count and the uncertain but likely very high TDP.
Well, Nvidia rates their H100 SXM at 3,958 TOPS with Sparse + INT8, which is exactly double your 1,979 figure, so that 2048 OPs/clock number looks like the dense rate.
https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet
But yea, I do agree that it will take some time for CDNA to be ready to compete against Nvidia. At least Epyc is slaughtering Intel in the meantime though.
PC Specs: CPU: 7800X3D || GPU: Strix 4090 || RAM: 32GB DDR5 6000 || Main SSD: WD 2TB SN850