Pemalite said:

nVidia delivered Packed Math first with its Tegra; AMD followed second.
While the implementation is proprietary, the concept is not, and the same goes for Asynchronous Compute.
Mobile heavily leverages FP16 because it is so much cheaper in hardware, not just for performance but for power consumption as well, though it does come with a ton of caveats.

Asynchronous Compute is part of the Direct X 12 specification, so it's not really "proprietary"; it's an open standard usable by everyone adhering to that specification. - Other APIs are also exposing the functionality.

One thing to keep in mind is that nVidia and AMD approach Asynchronous Compute differently.
It's all about running Compute and Graphics workloads concurrently, and AMD tends to excel here thanks to its ACE units.

The ACE units can just keep dishing out new work threads with very little latency impact.
nVidia's approach requires the CPU to do a heap of that initial heavy lifting that AMD's ACE units would typically do.

In short, if there are a ton of work threads, AMD's hardware doesn't stall while nVidia's will, which is why AMD's hardware tends to excel in demanding Asynchronous Compute scenarios.
In fact, during the early Maxwell days, if you pushed Asynchronous Compute too hard on nVidia hardware the driver would stall and Windows would be forced to kill it.
With lighter Asynchronous Compute loads, nVidia's hardware was actually faster than AMD's.

AMD still has the competitive advantage since Packed Math isn't built into any desktop Nvidia GPUs. FP16 and Async Compute implementations being proprietary was my point! Just because your idea is openly available does not make it usable by everyone ... (Conservative rasterization is patented, and just because all 3 DirectX graphics hardware vendors support it doesn't mean it's open, since it's not available to every other graphics hardware vendor such as Qualcomm, ARM, and ImgTec. Actually, scratch that, since Qualcomm has a patent for it.)
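
That's also why FP16 paths end up behind capability checks in engines. A rough sketch (untested, assuming an already-created ID3D12Device) of how you'd gate a packed-math/FP16 shader path on the cap bit D3D12 exposes:

```cpp
#include <d3d12.h>

// Sketch: query whether the GPU exposes native 16-bit shader ops (the
// capability FP16 "packed math" paths build on). Assumes `device` is an
// already-created ID3D12Device; untested.
bool SupportsNativeFp16(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS4 options4 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS4,
                                           &options4, sizeof(options4))))
        return false; // Older runtime: treat as no native FP16.
    return options4.Native16BitShaderOpsSupported == TRUE;
}
```

On hardware that reports false, the engine just keeps using its FP32 shader variants.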

Async Compute is not part of the DX12 spec; it's 'multi-engine' that is in the DX12 spec, and how vendors choose to expose it is up to them, much like anisotropic filtering. Also, DX12 is NOT an open standard; the runtimes, graphics kernel, certification, and the spec are all determined by Microsoft. Much the same goes for Vulkan, since the spec is determined by the Khronos Group's Architecture Review Board; you can only have an open implementation of Vulkan ...
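
To illustrate what 'multi-engine' actually promises at the API level (a minimal sketch, untested, assuming an existing ID3D12Device and the usual Windows/WRL headers): the app can create extra queues, but nothing in the spec says work on them must overlap on the GPU.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Sketch (untested): 'multi-engine' just means the app can create additional
// queues; whether work on them actually overlaps on the GPU is up to the
// vendor's hardware and driver. Assumes `device` already exists.
ComPtr<ID3D12CommandQueue> MakeQueue(ID3D12Device* device,
                                     D3D12_COMMAND_LIST_TYPE type)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = type; // DIRECT = graphics, COMPUTE = the "async compute" queue
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}

// One graphics queue, one compute queue - the spec only promises both exist,
// not that the GPU runs them concurrently:
// auto gfxQueue     = MakeQueue(device, D3D12_COMMAND_LIST_TYPE_DIRECT);
// auto computeQueue = MakeQueue(device, D3D12_COMMAND_LIST_TYPE_COMPUTE);
```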

AMD tends to excel at async because they have a rasterizer bottleneck ... (There are probably very few other reasons for it, since AMD highly recommends running a compute shader during shadow map rendering, which is coincidentally geometry-throughput intensive. Nvidia doesn't need async compute all that much because they have very good triangle throughput.)
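
To make that overlap pattern concrete, here's a hedged sketch (assumed names, pre-recorded command lists, and a fence created elsewhere; untested) of submitting a rasterizer-bound shadow pass alongside an independent compute job:

```cpp
#include <d3d12.h>

// Sketch (assumed names, untested): overlap a rasterizer-bound shadow pass
// with an independent compute job. Command lists are pre-recorded elsewhere
// and `fence` comes from device->CreateFence(...).
void SubmitOverlapped(ID3D12CommandQueue* gfxQueue,
                      ID3D12CommandQueue* computeQueue,
                      ID3D12CommandList* shadowCmdList,
                      ID3D12CommandList* computeCmdList,
                      ID3D12Fence* fence, UINT64& fenceValue)
{
    gfxQueue->ExecuteCommandLists(1, &shadowCmdList);      // saturates the rasterizer
    computeQueue->ExecuteCommandLists(1, &computeCmdList); // fills the idle ALUs

    // Sync only where the graphics queue actually consumes the compute
    // output; fencing any earlier just serializes the two queues.
    computeQueue->Signal(fence, ++fenceValue);
    gfxQueue->Wait(fence, fenceValue);
}
```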

Pascal still has limits in its async compute implementation, but who cares, since the architecture performs well in current games, whereas AMD hardware carries mostly dead silicon because the rest of the AAA industry doesn't bother with either DX12 or Vulkan aside from consoles ...

Pemalite said:

I would like AMD to return to its small-core strategy that is serving nVidia so well now and that made AMD competitive with the Radeon HD 4000/5000 series.

They did so well during that era.

Even if it means having fewer hardware features?

Pemalite said:

Well, the difference here is that nVidia is allowed to take an AMD-style approach to Asynchronous Compute that is compatible with AMD's implementation.

A lot of the "features" in Gameworks, such as PhysX, are walled off from AMD.
AMD pushed for things like TressFX, which is open source and available to everyone. - nVidia, however, built its own proprietary standard and walled it off.

That is ultimately the difference between the two companies' approaches.

It's like during the Direct X 10 era, when nVidia refused to adopt Direct X 10.1; that discouraged games from supporting Direct X 10.1, which would have allowed AMD's hardware to shine even more.
Heck, some games actually released with Direct X 10.1 support and were later patched to remove it.

I don't know about that, might have to check some patents ...

AMD can also have their own walled garden, such as 'shader intrinsics' and 'rapid packed math' or even 'async compute', so that Nvidia doesn't benefit from these optimizations when they are AMD-specific code paths that depend on driver extensions ... (It'd be nice if AMD could get devs to use underestimate conservative rasterization for GPU occlusion culling to gain an optimization advantage, since their competitor doesn't currently offer that hardware feature; see the sketch below.)
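
A hedged sketch of what that would look like on the API side (assuming an existing device and a mostly filled-in PSO desc; not a shipping implementation): underestimated coverage comes from reading SV_InnerCoverage in the pixel shader, which needs conservative rasterization tier 3, so the fast path gets gated on the cap bit.

```cpp
#include <d3d12.h>

// Sketch (untested): gate an underestimate-conservative-rasterization
// occlusion path on tier 3 support, since reading SV_InnerCoverage in the
// pixel shader requires D3D12_CONSERVATIVE_RASTERIZATION_TIER_3.
// Assumes `psoDesc` is an otherwise filled-in pipeline state desc.
bool EnableUnderestimateCR(ID3D12Device* device,
                           D3D12_GRAPHICS_PIPELINE_STATE_DESC& psoDesc)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                           &options, sizeof(options))))
        return false;
    if (options.ConservativeRasterizationTier <
        D3D12_CONSERVATIVE_RASTERIZATION_TIER_3)
        return false; // fall back to a regular occlusion path

    // The PSO flag only turns conservative rasterization on; the
    // "underestimate" part comes from SV_InnerCoverage in the pixel shader.
    psoDesc.RasterizerState.ConservativeRaster =
        D3D12_CONSERVATIVE_RASTERIZATION_MODE_ON;
    return true;
}
```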

Even nicer if AMD could get exclusive graphics features built around these hardware features ...

Pemalite said:

We need Direct X 12 and Vulkan to become the de facto APIs already. Then AMD's hardware would look a little more favourable overall.

It's hilariously close to the opposite of the issue AMD's older VLIW5 architecture had.

VLIW5 was designed to provide excellent performance in older workloads from the Direct X 9/10 era and wasn't very good at more modern workloads like Direct X 11.

Graphics Core Next is shit at older/current workloads, but excels in newer games that leverage its architecture's strengths.

Not only that, but the extensive hardware features should also be used to make AMD look more favourable ...

VLIW5 was much better received than Vega. Vega is like R600: R600 could pass with a few DX11 features, and I imagine Vega could be a prototype for DX13, but both are no good because the hardware features aren't being used ...

I wonder how many would prefer AMD more than they do now if they just made a bare-minimum DX12 video card that had better performance than their competitor's in many current games?