Soundwave said:
I think it's 4 TFLOP at FP32 (8 TF at FP16), I agree 8 TFLOPS would be insane. Still 4 TFLOP performance from 20 watts would be very, very impressive. Nintendo could slash that even further in half and still get 2 TFLOP performance from a 10 watt chip. Current Switch runs at 15 watts docked.
Nah, PX2 is rated at 8 TFLOPS FP32...so for 512 cores to pull that off they would need to run @7800MHz...even if Volta's cores can do 2 fused multiply-adds per cycle instead of 1, that's still 3900MHz...still insane.
But, what's more, let's say Volta's cores are indeed quite different than previous GPUs...currently you need 150+ W to achieve that sort of performance on 16nm - even with the 12nm process TSMC is offering them for Volta, there's no chance you can get 8 TFLOPS out of a 20W SoC, let alone on the 16nm that Xavier is supposed to be built on.
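For anyone who wants to check the clock math, here's a quick sketch. It assumes the usual peak-FLOPS convention (each fused multiply-add counts as 2 floating-point operations per core per cycle); the function name and the `fma_per_cycle` parameter are just made up for illustration, not anything from nVidia.

```python
def required_clock_mhz(target_tflops, cores, fma_per_cycle=1):
    """Clock (in MHz) needed to hit a peak FP32 target.

    Assumption: peak FLOPS = cores * fma_per_cycle * 2 * clock,
    i.e. one FMA is counted as 2 floating-point operations.
    """
    flops_per_core_per_cycle = 2 * fma_per_cycle
    clock_hz = target_tflops * 1e12 / (cores * flops_per_core_per_cycle)
    return clock_hz / 1e6


# 8 TFLOPS from 512 cores, one FMA per core per cycle
one_fma = required_clock_mhz(8, 512)

# Same target if each core could (hypothetically) issue 2 FMAs per cycle
two_fma = required_clock_mhz(8, 512, fma_per_cycle=2)

print(f"1 FMA/cycle: {one_fma:.0f} MHz")
print(f"2 FMA/cycle: {two_fma:.0f} MHz")
```

That's where the roughly 7800MHz and 3900MHz figures come from.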
Pemalite said:
There are 4 chips remember. 2x Tegra SoC's each with 256 Cuda cores each and two pascal powered GPU's in an MXM form factor.
Yeah, that's what I've been trying to say all along. 20W for 8TFLOPS, all from 512 cores on 16nm (even on 12nm)...yeah, sure.
Now, I think the confusion comes from the 20 DLTOPS figure, which is measured for 8-bit integer, because that's what they said Xavier will match compared to PX2. For example, Tesla P4 is rated at 22 DLTOPS, with 2560 cores running @1063MHz boosted, which is quite slow for a GP104 part, and it achieves 5.5 TFLOPS in 50-75W.
But I honestly don't see how even that can be reduced to 20W, even if it's 12nm TSMC (which I really doubt is true 12nm in the first place).
So while nVidia might pull off an SoC that can indeed deliver 20 DLTOPS at 20W, I really doubt its FLOPS rating would be anywhere near PX2.
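The Tesla P4 numbers above can be sanity-checked the same way. This is only a rough sketch: it assumes peak FP32 = cores x 2 x clock, and that the 8-bit integer rate is 4x the FP32 rate (my assumption for how the DLTOPS figure is derived on that part). The 8 TFLOPS at 20W line is the claim being disputed, not a measured number.

```python
def peak_fp32_tflops(cores, clock_mhz):
    """Peak FP32 TFLOPS, counting one FMA per core per cycle as 2 FLOPs."""
    return cores * 2 * clock_mhz * 1e6 / 1e12


# Tesla P4: 2560 cores at 1063 MHz boost (figures from the post above)
p4_fp32 = peak_fp32_tflops(2560, 1063)

# Assumption: INT8 throughput is 4x the FP32 rate
p4_int8_tops = p4_fp32 * 4

# Efficiency comparison in TFLOPS per watt
p4_best_case = p4_fp32 / 50      # P4 at the low end of its 50-75W range
claimed_soc = 8 / 20             # the disputed 8 TFLOPS at 20W figure
ratio = claimed_soc / p4_best_case

print(f"P4 FP32: {p4_fp32:.2f} TFLOPS, INT8: {p4_int8_tops:.1f} TOPS")
print(f"P4 best case: {p4_best_case:.3f} TFLOPS/W")
print(f"Claimed SoC:  {claimed_soc:.3f} TFLOPS/W ({ratio:.1f}x the P4)")
```

So even against the P4 at its lowest power setting, 8 TFLOPS FP32 in 20W would need well over 3x the efficiency, while about 20 DLTOPS only needs roughly 5.5 TFLOPS worth of FP32-equivalent hardware.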