I think they should still develop and use the lite models, insofar as the feedback on where they're lacking improves the models (everywhere) over time. I would love to see what a low-parameter model can achieve if distilled from better, larger models. This lite model definitely seems like a v1 on the way to a viable v2 or v3.
We've already seen SWO give better results than Hogwarts Legacy with pretty minor changes to the post-processing.
CNN autoregressors scale linearly in time complexity with parameter count, while their performance (measured by loss) improves along an inverse power law. So if you shift that curve down a bit by having the small model implicitly learn heuristics from a better, bigger model (such as a leading-edge transformer), you could probably exceed DLSS 3's quality while using half the parameters (again, CNNs scale linearly), like the current small model does.
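To make the curve-shifting idea concrete, here's a toy Python sketch. The constants (a, alpha, the distillation "boost", and the parameter counts) are made up for illustration, not measured DLSS numbers:

```python
# Toy power-law scaling curve: loss(N) = L_inf + a * N**(-alpha)
# All constants here are invented for illustration only.

def loss(n_params, a=2.0, alpha=0.3, l_inf=0.5):
    """Loss as a function of parameter count under an inverse power law."""
    return l_inf + a * n_params ** -alpha

# Distillation from a stronger teacher acts like improving the constant `a`
# (the student starts from better implicit heuristics), so the same loss is
# reached with fewer parameters -- the curve shifts down/left.
def distilled_loss(n_params, boost=0.7, **kw):
    return loss(n_params, a=2.0 * boost, **kw)

for n in (6e6, 12e6, 24e6):  # hypothetical half-size / base / double-size counts
    print(f"{n/1e6:>4.0f}M params: scratch={loss(n):.4f}  distilled={distilled_loss(n):.4f}")
```

Running it, the distilled 6M-parameter curve lands roughly where the from-scratch 12M one does, which is the whole "half the parameters" argument in miniature.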
I am not familiar with how vision transformer models scale at tens-of-millions parameter counts, but if we extrapolate from language models, it probably isn't too viable. You usually need a few hundred million parameters before a language transformer outperforms "hard-coded" heuristic or frequency-based algorithms, and I'm guessing it's similar for vision. That's probably why DLSS 4 is twice as large as DLSS 3 rather than the same size. So a direct transformer model on SW2 probably isn't likely unless there is some architectural change.
But a CNN-ViT hybrid (especially one trained through distillation, as sketched below) is definitely viable, and I think we'll see improvements to the "lite" model over time, some of which could be easily back-ported to old games through updates that just swap out the model.
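For anyone curious what that distillation setup could look like, here's a minimal PyTorch sketch of one training step where a small CNN student imitates a frozen, larger transformer teacher. The model objects, the MSE losses, and the alpha weighting are my own placeholder assumptions, not anything NVIDIA has published:

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, x, target, optimizer, alpha=0.5):
    """One knowledge-distillation step: blend ground-truth supervision with
    imitation of a bigger frozen teacher. All specifics are illustrative."""
    teacher.eval()
    with torch.no_grad():
        teacher_out = teacher(x)      # soft targets from the big model
    student_out = student(x)
    # (1 - alpha) weights the supervised loss, alpha weights teacher imitation.
    loss = (1 - alpha) * F.mse_loss(student_out, target) \
         + alpha * F.mse_loss(student_out, teacher_out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

I used MSE rather than the usual KL-divergence-on-logits because an upscaler is a regression/generation task, not classification.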
Note: the scaling results above are from classification tasks, but the improvements should generalize to generation as well.