Soundwave said:
HoloDust said:
Well, those Volta cores must be radically different from what GPUs have been packing for a long time - 512 cores with 2 operations per cycle for FP32 would need to run at insane clocks to achieve 8 TFLOPS...as in 7500+ MHz insane.
For reference, made on a similar, if not the same, 16nm process as advertised for Xavier, current nVidia tech needs 1920 cores running at 2000+ MHz for similar performance (overclocked GTX 1070).
|
I think it's 4 TFLOPS at FP32 (8 TFLOPS at FP16). I agree 8 TFLOPS would be insane.
Still, 4 TFLOPS from 20 watts would be very, very impressive. Nintendo could cut that in half again and still get 2 TFLOPS from a 10 watt chip. The current Switch runs at 15 watts docked.
|
Nah, PX2 is rated at 8 TFLOPS FP32...so for 512 cores to pull that off they would need to run at 7800 MHz...even if Volta's cores can do 2 fused multiply-adds per cycle instead of 1, that's still 3900 MHz...still insane.
But what's more, let's say Volta's cores are indeed quite different from previous GPUs...currently you need 150+ W to achieve that sort of performance on 16nm. Even with the 12nm process TSMC is offering them for Volta, there's no chance you can get 8 TFLOPS out of a 20W SoC, let alone on the 16nm that Xavier is supposed to be built on.
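To make the clock arithmetic above easy to check, here is a minimal sketch. It assumes the usual peak-throughput rule of thumb (FLOPS = cores x FLOPs per core per cycle x clock) and counts one fused multiply-add as 2 FLOPs; the function name `required_clock_mhz` is made up for illustration.

```python
def required_clock_mhz(target_tflops, cores, flops_per_core_per_cycle=2):
    """Clock (in MHz) needed to hit a peak FP32 target.

    Assumes peak FLOPS = cores * flops_per_core_per_cycle * clock,
    with one FMA per cycle counted as 2 FLOPs by default.
    """
    flops = target_tflops * 1e12
    clock_hz = flops / (cores * flops_per_core_per_cycle)
    return clock_hz / 1e6


# 512 cores, 1 FMA per cycle (2 FLOPs): the "7800 MHz" case
print(required_clock_mhz(8, 512))      # 7812.5

# 512 cores, 2 FMAs per cycle (4 FLOPs): the "3900 MHz" case
print(required_clock_mhz(8, 512, 4))   # 3906.25
```

Either way the result is well beyond the roughly 2000 MHz that overclocked 16nm parts reach.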
Pemalite said:
HoloDust said:
Well, those Volta cores must be radically different from what GPUs have been packing for a long time - 512 cores with 2 operations per cycle for FP32 would need to run at insane clocks to achieve 8 TFLOPS...as in 7500+ MHz insane.
For reference, made on a similar, if not the same, 16nm process as advertised for Xavier, current nVidia tech needs 1920 cores running at 2000+ MHz for similar performance (overclocked GTX 1070).
|
There are 4 chips, remember: two Tegra SoCs with 256 CUDA cores each, and two Pascal-powered GPUs in an MXM form factor.
http://www.anandtech.com/show/9903/nvidia-announces-drive-px-2-pascal-power-for-selfdriving-cars
The image nVidia used had two Geforce 980 MXM cards.
The Geforce 980M has 1536 CUDA cores. So that would mean 3072 CUDA cores for the discrete GPUs, plus another 512 total from the two Tegra chips, for a total of 3584 CUDA cores.
Now, CUDA cores * instructions per cycle * clock rate = FLOPS. 3584 * 2 * 1125 MHz = 8.064 teraflops.
The overall package has a 250W TDP.
No way is a single Xavier chip matching that.
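The core-count sum and the FLOPS product above can be reproduced with a few lines. This is only a sketch of the arithmetic in this post: the per-chip core counts and the 1125 MHz clock are the figures quoted here, and `peak_tflops` is an illustrative helper name.

```python
def peak_tflops(cores, clock_mhz, flops_per_core_per_cycle=2):
    """Peak throughput in TFLOPS, counting one FMA per cycle as 2 FLOPs."""
    return cores * flops_per_core_per_cycle * clock_mhz * 1e6 / 1e12


discrete_cores = 2 * 1536   # two MXM GPUs, 980M-class core count as quoted
tegra_cores = 2 * 256       # two Tegra SoCs
total_cores = discrete_cores + tegra_cores

print(total_cores)                               # 3584
print(round(peak_tflops(total_cores, 1125), 3))  # 8.064
```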
|
Yeah, that's what I've been trying to say all along. 20W for 8TFLOPS, all from 512 cores on 16nm (even on 12nm)...yeah, sure.
Now, I think the confusion comes from the 20 DLTOPS figure, which is measured for 8-bit integer, because that's what they said Xavier will match compared to PX2. For example, the Tesla P4 is rated at 22 DLTOPS, with 2560 cores running at 1063 MHz boost, which is quite slow for a GP104 part, and it achieves 5.5 TFLOPS in 50-75W.
But I honestly don't see how even that can be reduced to 20W, even if it's 12nm TSMC (which I really doubt is true 12nm in the first place).
So while nVidia might pull off an SoC that can indeed deliver 20 DLTOPS at 20W, I really doubt its FLOPS rating would be anywhere near PX2's.
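For anyone who wants to check the P4 numbers and the efficiency gap, here is a rough sketch. It assumes the same peak formula as before (cores x 2 x clock) and, as a labeled assumption, that INT8 throughput on this Pascal part is 4x the FP32 rate, which is what makes the 22 DLTOPS rating line up with 5.5 TFLOPS. The helper names are made up for illustration.

```python
def peak_tflops(cores, clock_mhz, flops_per_core_per_cycle=2):
    """Peak FP32 throughput in TFLOPS (one FMA per cycle = 2 FLOPs)."""
    return cores * flops_per_core_per_cycle * clock_mhz * 1e6 / 1e12


INT8_PER_FP32 = 4  # assumption: INT8 ops run at 4x the FP32 rate

# Tesla P4 figures as quoted in the post
p4_tflops = peak_tflops(2560, 1063)
p4_int8_tops = INT8_PER_FP32 * p4_tflops

# FP32 efficiency at both ends of the quoted 50-75W range
p4_best_tflops_per_watt = p4_tflops / 50
p4_worst_tflops_per_watt = p4_tflops / 75

# What 8 TFLOPS FP32 in 20W would require
claimed_tflops_per_watt = 8 / 20

print(round(p4_tflops, 2), round(p4_int8_tops, 1))
print(round(p4_worst_tflops_per_watt, 3), round(p4_best_tflops_per_watt, 3))
print(claimed_tflops_per_watt)
print(round(claimed_tflops_per_watt / p4_best_tflops_per_watt, 1))
```

Under those assumptions the P4 works out to about 5.44 TFLOPS and about 21.8 INT8 TOPS, and an 8 TFLOPS, 20W part would need well over three times the P4's best-case FP32 efficiency.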