JEMC said:
vivster said:

It's their technology and they can claim as much as they want. It's not that they're wrong, even. A shader isn't a firmly defined entity and the number of shaders does not define performance. How would you feel if the shader count were correct but it turned out the shaders were only half as capable as previous shaders? That wouldn't be false advertising but it would have the same effect.

The current facts are that the numbers do not add up, which means that either the advertised shaders are bad OR not as numerous but better. I opt for the latter.

It can also mean that the shaders aren't fully used. Some years ago, I don't remember if it was with Fury or the Vega cards, AMD had that problem. Those cards had something close to double the shaders of the regular, mainstream cards but didn't offer twice the performance because the chips didn't scale well and not all shaders could be used. Something similar could have happened this time to Nvidia, only to a lesser extent.

Another option would be that drivers still need to mature more and can't make full use of the new hardware.

From the performance figures they've given, Ampere has 98% more flops per watt than Turing but only 21% more performance, on average. That means one needs 1.61 Ampere flops to equal the performance of 1 Turing flop, and 1.5 Ampere flops to equal 1 RDNA 1.0 flop.
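For anyone who wants to check the arithmetic, here is a rough sketch of how that kind of ratio is derived. The function name is made up for illustration, and the inputs are just the rounded percentages quoted above. With those rounded inputs the result lands around 1.64, in the same ballpark as the 1.61 figure, which presumably comes from unrounded numbers.

```python
def flops_per_old_flop(flops_gain: float, perf_gain: float) -> float:
    """Return how many new-architecture flops match one old-architecture flop.

    Both arguments are fractional increases, e.g. 0.98 means +98%.
    Hypothetical helper, not taken from any vendor material.
    """
    return (1.0 + flops_gain) / (1.0 + perf_gain)


# Rounded figures quoted in the post: +98% flops, +21% performance.
ratio = flops_per_old_flop(0.98, 0.21)
print(f"Ampere flops per Turing flop: {ratio:.2f}")
```

The same helper works for any pair of architectures as long as both gains are measured on the same basis (both per watt, or both absolute).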

It seems clear to me each shader was effectively cut in half before some architectural improvements, or perhaps it was the increased number of FP32 engines themselves that increased the performance relative to Turing.

With RDNA 2.0 apparently focusing on IPC, it would seem like Nvidia and AMD have more or less switched places concerning what their GPU design philosophy historically used to be. Ampere is very Terascale-like (lots of shaders, lower clocks and per-shader performance) while RDNA 2.0 is kind of Fermi-like (higher clocks and IPC but fewer shaders).

An Ampere CUDA core also has some similarities with Bulldozer modules, in that a second unit (integer in the case of Bulldozer, floating-point in the case of Ampere) was added to each processing core to increase performance and also to make it onto those PR slides with twice the number of cores.

So, I don't think it's realistic to expect there's more performance left in future drivers (the same way that magical expectation wasn't realistic with Terascale or GCN).