Ok, so here's a kind of recap about Pascal and the Tesla P100.
This is the Tesla P100:

And this is Pascal's P100 core diagram:

First things first: the chip has eight 512-bit memory controllers, for a total memory bus width of 4096 bits. This is because the P100 uses four HBM2 memory stacks instead of GDDR5.
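To make the bus-width arithmetic concrete, here is a rough sketch in Python. The 1024-bit interface per HBM2 stack is my assumption from the HBM2 spec (it isn't stated above), and the per-pin data rates are the memory clocks from the table further down. The function names are made up for illustration.

```python
# Back-of-the-envelope memory figures, not an official calculation.
# Assumption: each HBM2 stack exposes a 1024-bit interface.
# Per-pin data rates (Gbps) are the memory clocks from the comparison table.

def bus_width_bits(units: int, bits_per_unit: int) -> int:
    """Total bus width from a number of identical interfaces."""
    return units * bits_per_unit

def bandwidth_gb_per_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width * per-pin rate, divided by 8 bits per byte."""
    return bus_bits * gbps_per_pin / 8

from_controllers = bus_width_bits(8, 512)   # eight 512-bit controllers
from_stacks = bus_width_bits(4, 1024)       # four HBM2 stacks (assumed 1024-bit each)

print(from_controllers, from_stacks)               # both ways of counting should agree
print(bandwidth_gb_per_s(from_controllers, 1.4))   # Tesla P100, HBM2
print(bandwidth_gb_per_s(384, 7.0))                # GTX Titan X, GDDR5
```

The P100 result lands a little under the 720 GB/s in the table, which looks like a rounded marketing figure.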
The GPU itself is composed of 6 GPCs (Graphics Processing Clusters), each comprising 10 SMs (Streaming Multiprocessors), for 60 SMs on the full chip. The Tesla P100 ships with 56 of them enabled. This is a departure from Maxwell, which also had 6 GPCs but only 4 SMs per GPC.
Now let's take a closer look at an SM unit:
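A quick sanity check of the unit counts, as a small Python sketch. The 56 enabled SMs and the per-SM core counts come from the table further down; `total_units` is just a helper name I made up.

```python
# Counting SMs and CUDA cores from the GPC layout (illustrative only).

def total_units(sm_count: int, units_per_sm: int) -> int:
    """Total number of a per-SM resource across the whole GPU."""
    return sm_count * units_per_sm

full_gp100_sms = 6 * 10   # 6 GPCs x 10 SMs on the full GP100 die
p100_sms = 56             # SMs enabled on the Tesla P100 (from the table)
gm200_sms = 6 * 4         # 6 GPCs x 4 SMs on Maxwell GM200

print(full_gp100_sms, p100_sms, gm200_sms)
print(total_units(p100_sms, 64))    # Pascal: 64 CUDA cores per SM
print(total_units(gm200_sms, 128))  # Maxwell: 128 CUDA cores per SM
```

So even with half the cores per SM, the P100 ends up with more CUDA cores overall because it has more than twice as many SMs.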

As you can see, each Pascal SM comprises 64 CUDA cores (or shaders) and 4 texture units, whereas a Maxwell SM had twice as many of each: 128 CUDA cores and 8 texture units.
That's because Nvidia has focused heavily on the compute side of things, improving single-precision (FP32) performance and especially double-precision (FP64) performance. FP64 now runs at a whopping 1:2 ratio to FP32, compared to the paltry 1:32 ratio of Maxwell (which was beaten even by Kepler's 1:3).
Finally, the chip features 14 MB of register files in total (summed across all SMs) and 4 MB of L2 cache.
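Here is how those ratios translate into double-precision numbers, as a hedged Python sketch. The FP32 figures and the ratios are the ones from the table further down; `fp64_tflops` is a name I picked for illustration.

```python
from fractions import Fraction

def fp64_tflops(fp32_tflops: float, fp64_ratio: Fraction) -> float:
    """Peak FP64 throughput derived from peak FP32 and the FP64:FP32 ratio."""
    return fp32_tflops * float(fp64_ratio)

# (peak FP32 TFLOPS, FP64:FP32 ratio), both taken from the comparison table
cards = {
    "Tesla P100 (Pascal)": (10.6, Fraction(1, 2)),
    "Tesla M40 (Maxwell)": (6.8, Fraction(1, 32)),
    "Tesla K40 (Kepler)": (4.29, Fraction(1, 3)),
}

for name, (fp32, ratio) in cards.items():
    print(f"{name}: {fp64_tflops(fp32, ratio):.3f} TFLOPS FP64")
```

The Maxwell result comes out in the low hundreds of GFLOPS, which is why the older Kepler card still beats it at double precision.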
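The 14 MB register figure can be reproduced with a small Python sketch. Note the assumption: 65,536 32-bit registers per SM is what I remember from Nvidia's published GP100 specs, and it isn't stated in this post, so treat it as such.

```python
# Rough check of the total register file size (illustrative only).
REGISTERS_PER_SM = 65536   # assumption: 32-bit registers per GP100 SM
BYTES_PER_REGISTER = 4     # 32 bits
ENABLED_SMS = 56           # SMs enabled on the Tesla P100

def register_file_mib(sm_count: int,
                      regs_per_sm: int = REGISTERS_PER_SM,
                      bytes_per_reg: int = BYTES_PER_REGISTER) -> float:
    """Total register file capacity across all SMs, in MiB."""
    return sm_count * regs_per_sm * bytes_per_reg / (1024 * 1024)

print(register_file_mib(ENABLED_SMS))
```

Under that assumption each SM has 256 KB of registers, and the total only adds up to 14 MB when summed over all 56 SMs.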
And here is a table comparing the Tesla P100 with the previous Tesla products and top-end cards:
| | Tesla P100 | Tesla M40 | GTX Titan X | Tesla K40 | GTX Titan Black |
|---|---|---|---|---|---|
| GPU | GP100 | GM200 | GM200 | GK110B | GK110B |
| Architecture | Pascal | Maxwell 2 | Maxwell 2 | Kepler | Kepler |
| GPCs | 6 | 6 | 6 | 5 | 5 |
| SMs | 56 | 24 | 24 | 15 | 15 |
| CUDA Cores/SM | 64 | 128 | 128 | 192 | 192 |
| CUDA Cores | 3584 | 3072 | 3072 | 2880 | 2880 |
| Texture Units/SM | 4 | 8 | 8 | 16 | 16 |
| Texture Units | 224 | 192 | 192 | 240 | 240 |
| ROPs | - | 96 | 96 | 48 | 48 |
| Core Clock | 1328 MHz | 948 MHz | 1000 MHz | 745 MHz | 889 MHz |
| Boost Clock | 1480 MHz | 1114 MHz | 1075 MHz | 810/875 MHz | 980 MHz |
| Memory Type | HBM2 | GDDR5 | GDDR5 | GDDR5 | GDDR5 |
| Memory Clock (effective) | 1.4 GHz | 6 GHz | 7 GHz | 6 GHz | 7 GHz |
| Memory Bus Width | 4096-bit | 384-bit | 384-bit | 384-bit | 384-bit |
| Memory Bandwidth | 720 GB/s | 288 GB/s | 336 GB/s | 288 GB/s | 336 GB/s |
| VRAM | 16 GB | 12 GB | 12 GB | 6 GB | 6 GB |
| TDP | 300 W | 250 W | 250 W | 235 W | 250 W |
| Transistor Count | 15.3 billion | 8 billion | 8 billion | 7.1 billion | 7.1 billion |
| Single Precision (FP32) | 10.6 TFLOPS | 6.8 TFLOPS | 6.14 TFLOPS | 4.29 TFLOPS | 5.1 TFLOPS |
| FP64 Rate | 1/2 FP32 | 1/32 FP32 | 1/32 FP32 | 1/3 FP32 | 1/3 FP32 |
| Double Precision (FP64) | 5.3 TFLOPS | 213 GFLOPS | - | 1.43 TFLOPS | - |
| Manufacturing Process | TSMC 16nm | TSMC 28nm | TSMC 28nm | TSMC 28nm | TSMC 28nm |
| GPU Die Size | 610 mm² | 601 mm² | 601 mm² | 551 mm² | 551 mm² |
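If you want to see where the single-precision numbers in the table come from, here is a rough Python sketch. It assumes the usual rule of thumb of 2 FLOPs per CUDA core per clock (one fused multiply-add). Which clock each figure was quoted at is my own guess from matching the arithmetic: boost for the P100 and M40, base for the other three.

```python
def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Peak FP32 throughput, assuming 2 FLOPs (one FMA) per core per cycle."""
    return cuda_cores * 2 * clock_mhz * 1e6 / 1e12

# (name, CUDA cores, clock in MHz); the clock choice per card is a guess
cards = [
    ("Tesla P100", 3584, 1480),      # boost clock
    ("Tesla M40", 3072, 1114),       # boost clock
    ("GTX Titan X", 3072, 1000),     # base clock
    ("Tesla K40", 2880, 745),        # base clock
    ("GTX Titan Black", 2880, 889),  # base clock
]

for name, cores, mhz in cards:
    print(f"{name}: {peak_fp32_tflops(cores, mhz):.2f} TFLOPS")
```

Keep in mind these are theoretical peaks, so the mix of base and boost clocks makes the cards a bit harder to compare directly.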
Of course, anyone expecting the upcoming GTX 1080/1070 (or whatever they end up being called) to be as big as the P100 will be disappointed. Expect the 1080/1070 to be based on a GP104 chip with the number of Graphics Processing Clusters reduced to 4, and the 1060/1050 to have 2 GPCs.
Please excuse my bad English.