Yeah, NVDLA is for edge-compute inference workloads that don't need much more than low-precision data types and that you rarely re-version. The goal is to cut power consumption as far as possible for well-known, fixed inference tasks.
Hypothetically, you could use it for the low-precision (especially dense) layers of the CNN that powers DLSS, but it would be far more hassle than it's worth.
Also, everything for NVDLA has to be orchestrated through the CPU at compile time, so that's another potential bottleneck you simply don't have when inferencing directly on the tensor cores.
And then there are other issues, like reformatting the motion-vector data so it can be consumed by both the tensor cores and NVDLA.
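Just to make that reformat point concrete, here's a rough NumPy sketch of what shuffling FP16 motion vectors into an INT8 representation could look like. The layout, value range, and function name are all made up for illustration; this isn't an actual DLSS or NVDLA interface:

```python
import numpy as np

def fp16_to_int8_mv(mv_fp16: np.ndarray, scale: float = 127.0):
    """Quantize FP16 motion vectors (assumed normalized to [-1, 1])
    into a symmetric INT8 layout for a fixed-function engine."""
    clipped = np.clip(mv_fp16.astype(np.float32), -1.0, 1.0)
    mv_int8 = np.round(clipped * scale).astype(np.int8)
    return mv_int8, 1.0 / scale  # quantized data + dequant scale for the consumer

# Toy 2x2 motion-vector field with (dx, dy) per pixel.
mv = np.array([[[0.25, -0.5], [0.0, 1.0]],
               [[-0.125, 0.75], [0.5, -1.0]]], dtype=np.float16)
mv_q, dequant = fp16_to_int8_mv(mv)
```

Doing that conversion every frame, for every buffer both units touch, is exactly the kind of overhead that eats into whatever you saved by offloading.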
And you're doing all of this for what? Less than a dozen extra TOPS, in the best-case scenario where latency doesn't eat up the gains?
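Back-of-envelope, assuming the full-size NVDLA config with 2048 INT8 MACs and a roughly 1 GHz clock (the clock is an assumption), the peak is in the low single digits of TOPS:

```python
# Rough peak-throughput estimate for a full-size NVDLA config.
macs = 2048              # INT8 MACs in the largest reference config
clock_hz = 1.0e9         # assumed ~1 GHz clock
ops_per_mac = 2          # one multiply + one accumulate per cycle
peak_tops = macs * ops_per_mac * clock_hz / 1e12
print(f"~{peak_tops:.1f} INT8 TOPS peak")  # ~4.1 TOPS
```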
All of the R&D put into this would probably be better spent on training a decently pruned, distilled, and quantized DLSS model that works better on lower-end hardware, which is why speculation about "lightweight, bespoke Switch 2 DLSS models" keeps making the rounds.
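For anyone who hasn't done this before, here's a minimal PyTorch sketch of what "pruned and quantized" means mechanically. The toy architecture is made up and has nothing to do with the real DLSS network; it just shows the standard magnitude-pruning step, with post-training INT8 quantization as the follow-on:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for an upscaling CNN; layer sizes are invented.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)

# Magnitude pruning: zero out the 50% smallest-magnitude weights per conv.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the mask into the weights

# Post-training FP32 -> INT8 quantization would follow (e.g. PyTorch's
# prepare/calibrate/convert flow) before shipping to low-end hardware.
print(sum((p == 0).sum().item() for p in model.parameters()), "zeroed weights")
```

A model shrunk that way runs fine on the tensor cores you already have, with none of the cross-engine plumbing.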