fatslob-:O said:

Goodwill ? I think AMD is playing dumb with its most loyal followers when HARDWARE FEATURES such as Async Compute or Rapid Packed Math (double rate FP16) are by the very definition proprietary!

nVidia delivered Packed Math first with its Tegra; AMD followed second.
The implementation is proprietary, but the concept is not, and the same goes for Asynchronous Compute.
Mobile heavily leverages FP16 because it is so much cheaper in hardware, not just for performance but for power consumption as well, though it does come with a ton of caveats.
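
To put rough numbers on why double rate FP16 matters, here's a minimal back-of-the-envelope sketch in C++. The CU count and clock are illustrative assumptions, not any particular SKU's specs; the point is just that packing two FP16 values per 32-bit lane doubles the theoretical rate.

```cpp
#include <cstdio>

int main() {
    // All figures below are illustrative assumptions, not a specific SKU.
    const int    cus           = 64;   // compute units
    const int    lanes_per_cu  = 64;   // shader ALU lanes per CU
    const double clock_ghz     = 1.5;  // assumed boost clock
    const int    flops_per_fma = 2;    // one FMA counts as 2 FLOPs

    double fp32_tflops = cus * lanes_per_cu * clock_ghz * flops_per_fma / 1000.0;
    // Packed (double rate) FP16: two FP16 values share one 32-bit lane,
    // so each lane retires twice the FP16 operations per clock.
    double fp16_tflops = fp32_tflops * 2.0;

    std::printf("FP32: %.1f TFLOPS, packed FP16: %.1f TFLOPS\n",
                fp32_tflops, fp16_tflops);
}
```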

Asynchronous Compute is part of the Direct X 12 specification, so it's not really "proprietary": it's an open standard that can be used by everyone adhering to that specification. - Other APIs are also exposing the functionality.

One thing to keep in mind is that nVidia and AMD approach Asynchronous Compute differently.
It's all about running Compute and Graphics workloads concurrently, and AMD tends to excel here thanks to its ACE units.

The ACE units can just keep dishing out new work threads with very little latency impact.
nVidia's approach requires the CPU to do a heap of that initial heavy lifting that AMD's ACE units would typically do.

In short, if there are a ton of work threads, AMD's hardware doesn't stall, but nVidia's will, which is why AMD's hardware tends to excel in demanding Asynchronous Compute scenarios.
In fact, during the early Maxwell days, if you pushed Asynchronous Compute too hard on nVidia hardware, the driver would stall and Windows would be forced to kill it.
With lighter Asynchronous Compute loads, nVidia's hardware was actually faster than AMD's.
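
For anyone curious what "exposed by the API" actually looks like, here's a rough, untested C++/Direct X 12 sketch: the API side of Asynchronous Compute is simply that you can create a compute queue alongside the direct (graphics) queue and feed both at once. Error handling is omitted, and whether the work truly overlaps is up to the hardware and driver (the ACEs on GCN).

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Sketch: DX12 lets you create independent queues and submit to them
// concurrently; the API only expresses the intent to overlap work.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;      // graphics + compute + copy
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // compute-only queue
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));
}

int main() {
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_12_0, IID_PPV_ARGS(&device));

    ComPtr<ID3D12CommandQueue> gfx, compute;
    CreateQueues(device.Get(), gfx, compute);
}
```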

fatslob-:O said:

You're overthinking it ... (It's HOW you use the silicon that matters, and from that perspective Vega 10 does not even begin to compare with the GP104 in that department: 314mm² for GP104 vs 484mm² for Vega 10.)

I would like AMD to return to its small-die strategy, which is serving nVidia so well now and is what made AMD competitive with the Radeon HD 4000/5000 series.
They did so well during that era.

fatslob-:O said:

Nobody seems to have an issue with the above features being used, so I don't know why people are so against the idea of AMD bringing their own competitor to Gameworks when they could stand to make their competitor's top performer look slower by as much as 30% on a good day, depending on the gains or performance characteristics of these features ... (Is Quantum Break DX12 not a good example of this, where the competing Maxwell architecture cratered in performance in comparison to AMD's own microarchitecture? If every modern AAA game engine was built and designed like the Northlight Engine, we wouldn't have to bear seeing AMD agonizing so much.)

Well, the difference here is that nVidia is free to take an AMD-style approach to Asynchronous Compute that is compatible with AMD's implementation.

A lot of the "features" in Gameworks, such as PhysX, are walled off from AMD.
AMD pushed for things like TressFX, which is open source and available to everyone. - nVidia, however, built its own proprietary standard and walled it off.

That is ultimately the difference between the two companies' approaches.

It's like the Direct X 10 era, when nVidia refused to adopt Direct X 10.1, which pushed games into not bothering with Direct X 10.1 support even though it would have allowed AMD's hardware to shine even more.
Heck, some games actually released with Direct X 10.1 support and were later patched to remove it.

JEMC said:

And what AMD needed to do was make 480/Polaris 10 a 40 CU part to give it a proper edge over the 1060 and the 470. They focused so much on the "mainstream" market that all their products overlapped with each other, and launching 4 and 8GB versions was an even dumber move.

Polaris uses oddball counts of functional units. It is clearly a design that was compromised in order to reduce costs and price.

AMD probably needed more than 40 CUs though.
It would have brought the hardware from:
* 2304 Shaders - 144 Texture Mapping Units - 32 ROPs.
To
* 2560 Shaders - 160 Texture Mapping Units - 32 ROPs.

Or roughly an 11% increase in compute and an 11% increase in texture fillrate (the ROP count, and thus pixel fillrate, stays the same).

I think a 48 CU design would have been more ideal:
* 3072 Shaders - 192 Texture Mapping Units - 32 ROPs.

That would have meant a good 33% increase in compute and texture fillrate, which would have made it far more attractive against the Geforce 1060.
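
For reference, the per-CU ratios implied by those figures are 64 shaders and 4 TMUs per CU, so the percentages fall straight out of the CU counts. A quick C++ sanity check (the 36/40/48 CU configurations are the ones discussed above):

```cpp
#include <cstdio>

// Per-CU ratios implied by the figures above: 64 shaders and 4 TMUs per CU.
struct Config { int cus; };

int main() {
    const int kShadersPerCu = 64, kTmusPerCu = 4;
    const Config base{36}, bumped{40}, bigger{48};  // Polaris 10 vs the two hypotheticals
    const Config configs[] = {base, bumped, bigger};

    for (Config c : configs) {
        int    shaders = c.cus * kShadersPerCu;
        int    tmus    = c.cus * kTmusPerCu;
        double gain    = 100.0 * c.cus / base.cus - 100.0;  // % over the 36 CU baseline
        std::printf("%d CUs -> %d shaders, %d TMUs (+%.0f%% over 36 CUs)\n",
                    c.cus, shaders, tmus, gain);
    }
}
```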

The caveat to this is... Everyone would have fapped twice as hard over its mining potential.

fatslob-:O said:

A 40 CU part doesn't change the fundamentals (it still has the perf/area issue); it's that AMD needs to follow through with ISV support for current games and games in the near future so that they can capitalize on those proprietary technologies to give AMD the upper hand ... (Doom is an example of this, and I imagine even more so for Wolfenstein 2 with the addition of FP16.)

We need Direct X 12 and Vulkan to be the de facto APIs already. Then AMD's hardware would look a little more favourable overall.

It's hilariously close to the opposite of the problem AMD's older VLIW5 architecture had.

VLIW5 was designed to provide excellent performance in older workloads from the Direct X 9/10 era and wasn't very good at more modern workloads like Direct X 11.

Graphics Core Next is shit at older/current workloads, but excels in newer games that leverage the architecture's strengths.



--::{PC Gaming Master Race}::--