Pemalite said: So your evidence is a reddit thread...
The evidence is found in the reddit thread; it's not "the reddit thread" itself. The user conducted an experiment and I shared their results with you, but other users have also validated that tensor core utilization varies over time when running DLSS workloads. It is also something those of us who build and run CNN (and ViT) models on a day-to-day basis see, and it makes sense from a theory perspective given the architecture of a CNN (or ViT). You're not going to be multiplying the same-ranked matrices all the time*, nor will your workload always be core-bottlenecked; often the bottleneck is memory bandwidth. The evidence I shared is the fact that we see a literal order of magnitude difference between average usage and peak usage. Any CNN (or ViT) will have this same usage pattern, because they are all built from the same underlying operations. Maybe for Switch 2, using a hypothetical bespoke model, it is 3% average vs. 30% peak utilization (instead of the 0.3% vs. 4% of an RTX 4090), but either way average usage << peak usage.
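If you want to see that gap yourself, here's the kind of quick-and-dirty polling I mean (Python + pynvml). Note that NVML only exposes overall SM utilization, not a per-tensor-core counter (that needs something like Nsight or DCGM), so treat this as a sketch that illustrates the average-vs-peak pattern, not a tensor-core measurement:

```python
# Sample GPU utilization while a DLSS/CNN workload runs and compare
# the average against the peak. NVML reports overall SM busy percent,
# which is only a rough proxy for tensor-core activity.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

samples = []
for _ in range(600):                  # ~60 seconds at 100 ms intervals
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    samples.append(util.gpu)          # percent busy over the last interval
    time.sleep(0.1)

pynvml.nvmlShutdown()

avg = sum(samples) / len(samples)
peak = max(samples)
print(f"average: {avg:.1f}%  peak: {peak}%")  # for bursty work, expect average << peak
```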
THAT was the point I was making, and the one that matters when weighing the power consumption of the tensor cores against the rasterized workload they are reducing. A workload that spikes to 100% only one-tenth of the time isn't going to consume as much power as one that is pegged at 100% all of the time.
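Back-of-the-envelope version of that, with made-up wattage figures purely to show the arithmetic:

```python
# Hypothetical numbers to illustrate the duty-cycle point --
# not measured figures for any real GPU or for the Switch 2.
PEAK_W = 10.0    # power draw while the tensor cores are actually busy
IDLE_W = 1.0     # power draw while they sit idle

def avg_power(duty_cycle: float) -> float:
    """Time-averaged power for a workload busy `duty_cycle` of the time."""
    return duty_cycle * PEAK_W + (1.0 - duty_cycle) * IDLE_W

print(avg_power(0.1))   # spikes 10% of the time -> 1.9 W average
print(avg_power(1.0))   # pegged the whole time  -> 10.0 W average
```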
Developers are indeed free to use 100% of the system's resources; they are also free to limit power consumption in handheld mode, and they did so with the original Switch. That's why battery life varied by title: there were different handheld clock modes that developers used for different titles based on how demanding the title was on the system's resources. What DLSS gives them is the option to reduce clocks more often (if their goal is longer battery life) by reducing the rasterized workload without a power-equivalent increase in the tensor-core workload (even if the tensor-core utilization eats into the savings). In other words, they can achieve a similar output more efficiently.
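To make the "similar output, less work" trade concrete, here's a toy frame-budget comparison. Every number in it is hypothetical, since nobody outside NVIDIA/Nintendo has real per-pass figures for this chip:

```python
# Hypothetical frame-time budget: render fewer pixels, pay a roughly fixed
# tensor-core upscale cost, and still come out well ahead of native.
NATIVE = 1920 * 1080        # target resolution
INTERNAL = 1280 * 720       # DLSS internal render resolution

raster_ms_native = 12.0                                      # made-up native raster cost
raster_ms_internal = raster_ms_native * (INTERNAL / NATIVE)  # scales ~with pixel count
upscale_ms = 2.0                                             # made-up upscale pass cost

print(raster_ms_internal + upscale_ms, "ms vs", raster_ms_native, "ms native")
# ~7.3 ms vs 12.0 ms -- the saved GPU time can be banked as lower clocks / longer battery
```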
I don't even know why you're arguing with this. People do this all the time on gaming handhelds like the Steam Deck for many games: they'll cap the power limit to 7W and use FSR to make up the difference, maximizing battery life without a much worse qualitative experience. When they're on a charger or docked, they change their settings to rasterize at a higher internal resolution, since battery life is no longer a consideration.
*Matrix multiplication algorithms scale either cubically with rank for high-ranked matrices, or super-quadratically but sub-cubically with rank for low-ranked matrices. Then there are factorization layers that can reduce rank based on matrix sparsity. Different layers in the network are going to have different ranks and sparsities and therefore consume different amounts of resources.
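To put rough numbers on that last point, here's a toy FLOP count using the naive dense matmul cost (about 2·m·k·n operations for an (m×k)·(k×n) product); the layer shapes are invented for illustration, not taken from any real DLSS or CNN model:

```python
# Naive dense matmul cost: roughly 2 * m * k * n floating-point operations
# for an (m x k) @ (k x n) product. Layer shapes below are invented examples.
def matmul_flops(m: int, k: int, n: int) -> int:
    return 2 * m * k * n

layers = {
    "early conv (as GEMM)": (256 * 256, 27, 32),    # large spatial extent, thin matrices
    "mid conv (as GEMM)":   (64 * 64, 576, 128),    # smaller spatial extent, fatter matrices
    "1x1 projection":       (64 * 64, 128, 64),
}

for name, (m, k, n) in layers.items():
    print(f"{name:22s} {matmul_flops(m, k, n) / 1e6:8.1f} MFLOPs")
# Per-layer cost varies widely, and sparsity or low-rank factorization can cut
# it further -- so tensor-core utilization over a frame is anything but flat.
```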