sc94597 said:
There are only two ways they could do this.

1. They pre-trained the model on generalized image-to-image. This is unlikely for a few reasons. Good general image-to-image models are relatively huge; the open-source ones start at around 13 billion parameters. That is not feasible to run in real time even on a single data-center GPU, let alone gaming ones. For context, an RTX 5090 runs these models at about 2 images per second. The datasets used to train them are also huge, and Nvidia doesn't have access to any buffer data for those samples the way it does with its regular DLSS training sets. Now, Nvidia could train an image-to-image model (or, more likely, source an already-trained one) and use it as a teacher for a specialized gaming-specific model, but there are two issues with that. The first is that it would skew the codomain so much that you risk the efficacy of your gaming-specific model. The second is that it's a very inefficient method given how specific the target objective is.

2. They have invested heavily in model-interpretability research and pulled off something like Anthropic's Golden Gate Claude experiment, but for image models rather than LLMs. If that were the case, they'd be able to allow much more control than you are talking about. You really don't need text or image inputs in this case; you can just directly steer the model's internal feature activations. See: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

This model is more likely something like what is described in this paper, https://arxiv.org/pdf/2105.04619, but without using the G-buffer at all (if we take Nvidia's press release at face value that they only use color and velocity buffers), and probably using a vision transformer instead of a CNN.
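The steering trick from the monosemanticity paper reduces to something very simple at inference time: you take a feature direction learned by a sparse autoencoder and add a scaled copy of it onto a layer's activations. A minimal numpy sketch of just that operation, with the hidden size, the feature direction, and the `steer` helper all made up for illustration (a real SAE direction comes from a trained decoder matrix):

```python
import numpy as np

# Toy sketch of activation steering in the style of the
# "Golden Gate Claude" experiment: a sparse-autoencoder feature
# direction is added onto one layer's activations at inference time.
# All values here are random stand-ins; no real model is involved.

rng = np.random.default_rng(0)

d_model = 16                      # hypothetical hidden size
h = rng.standard_normal(d_model)  # activation vector at some layer

# Hypothetical unit-norm decoder direction for one learned feature
# (in a real SAE this would be a column of the decoder matrix).
feature_dir = rng.standard_normal(d_model)
feature_dir /= np.linalg.norm(feature_dir)

def steer(h, direction, strength):
    """Add a scaled feature direction to the activations."""
    return h + strength * direction

h_steered = steer(h, feature_dir, strength=5.0)

# Because feature_dir is unit-norm, the projection of the activations
# onto the feature increases by exactly `strength`.
print(np.dot(h_steered, feature_dir) - np.dot(h, feature_dir))  # ≈ 5.0
```

The point of the quoted post is that if Nvidia had this level of interpretability for an image model, sliders over feature strengths would give far finer control than text or image conditioning.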
I wonder if what they're doing isn't all that dissimilar from these kinds of videos that are all over TikTok/Instagram etc.:
https://www.instagram.com/reel/DVxSh0ODVec/
Seems like Google/YouTube doesn't allow or want too many videos like this, because they're hard to find on YouTube but all over the place on Insta/TikTok.
Just instead of a person, it's a lighter model trained on a bunch of data to enhance things like eyes, lips, wrinkles, brightness, etc. on game characters.
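If it really is a lightweight enhancer along the lines of the paper quoted above, the data flow would just be: concatenate the color buffer with the velocity buffer, run a small network, and add the predicted correction back onto the color. A toy numpy sketch of that flow, with the frame size, the single 1x1-convolution "network", and the residual scale all invented for illustration (a real model would be a CNN or vision transformer):

```python
import numpy as np

# Toy sketch of a per-frame enhancement pass using only the inputs
# Nvidia's press release mentions: a color buffer and a velocity
# (motion-vector) buffer. The 1x1 "network" here is a stand-in for
# a real trained model; weights and shapes are made up.

rng = np.random.default_rng(0)

H, W = 4, 4                                          # tiny frame for illustration
color = rng.random((H, W, 3)).astype(np.float32)     # RGB color buffer
velocity = rng.random((H, W, 2)).astype(np.float32)  # screen-space motion vectors

x = np.concatenate([color, velocity], axis=-1)       # (H, W, 5) model input

# Hypothetical learned weights for one 1x1 convolution: 5 channels in, 3 out.
W1 = (rng.standard_normal((5, 3)) * 0.1).astype(np.float32)
b1 = np.zeros(3, dtype=np.float32)

residual = np.tanh(x @ W1 + b1)                      # bounded per-pixel correction
enhanced = np.clip(color + 0.1 * residual, 0.0, 1.0) # add correction, keep valid range

print(enhanced.shape)  # (4, 4, 3)
```

The appeal of this shape of model is exactly what the quoted post argues: the objective is narrow, the inputs are cheap buffers the engine already produces, and the network can stay small enough to run per frame.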
Last edited by Soundwave - 2 days ago