
Forums - Nintendo - How Will be Switch 2 Performance Wise?

 

Switch 2 is out! How do you classify it?

Terribly outdated! - 3 votes (5.26%)
Outdated - 1 vote (1.75%)
Slightly outdated - 14 votes (24.56%)
On point - 31 votes (54.39%)
High tech! - 7 votes (12.28%)
A mixed bag - 1 vote (1.75%)
Total: 57 votes
sc94597 said:
Pemalite said:

So your evidence is a reddit thread...

But even skimming the reddit thread, we can already glean some glaring issues: you failed to grasp some intrinsic technical aspects of DLSS and lacked appropriate context in your reply.
DLSS is an algorithm with a fixed amount of resources required to run.

1) Like you alluded to... The user is showcasing a GeForce RTX 4090 with 330 Tensor TOPS via 512 Tensor cores - this is Turing with 2x the Tensor throughput of Ampere.
2) Switch 2 uses 48 Tensor cores with likely a max throughput of 6 TOPS.

If we assume that 1% of the 4090's tensor throughput is your regular loading on the tensor cores... then that means DLSS would require about 3.3 TOPS.

So Switch 2's Tensor cores would be at over 50% utilization on average, and the peak would exceed the Switch 2's Tensor throughput entirely.
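Spelling that arithmetic out as a quick sketch (the 330 TOPS and 6 TOPS figures are estimates from this thread, not confirmed specs):

```python
# Back-of-the-envelope only: both throughput figures are assumptions, not official specs.
RTX_4090_TENSOR_TOPS = 330
SWITCH2_TENSOR_TOPS = 6

dlss_load_tops = 0.01 * RTX_4090_TENSOR_TOPS             # ~3.3 TOPS if DLSS averages ~1% of a 4090
switch2_avg_util = dlss_load_tops / SWITCH2_TENSOR_TOPS  # ~0.55, i.e. over half of Switch 2's budget

print(f"{dlss_load_tops:.1f} TOPS -> {switch2_avg_util:.0%} of Switch 2's tensor throughput")
```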
But this is a like-for-like algorithm, which will not happen with the Switch 2 as it likely uses an algorithm specifically optimized for the hardware limitations and characteristics. (I.E. Being Ampere and not Turing.)

That also ignores that the Switch 2 is in a walled garden and developers are free to use 100% of the system's resources; regular rasterization and ray tracing will be using 100% of those resources, with Tensor operations as an added extra on top.

And the battery life reflects this, as battery life on Switch 2 is extremely poor, worse than the original Switch 1 launch model.

The evidence is found in the reddit thread; it's not just "a reddit thread." The user conducted an experiment and I shared their results with you, but other users have also validated that tensor core utilization varies over time when running DLSS workloads. It is also something those of us who build and run CNN (and ViT) models on a day-to-day basis see, and it makes sense from a theory perspective given the architecture of a CNN (or ViT). You're not going to be multiplying the same-rank matrices all the time*, nor will your workload always be core-bottlenecked; often the bottleneck is the memory bandwidth. The evidence I shared is the fact that we see a literal order of magnitude difference between average usage and peak usage. Any CNN (or ViT) will have this same usage pattern, because they all use the same tools. Maybe for Switch 2, using a hypothetical bespoke model, it is 3% average vs. 30% peak utilization (instead of the 0.3% vs. 4% of an RTX 4090), but either way average usage << peak usage.

THAT was the point I was making, and the one relevant to the topic of the relative power consumption of the tensor cores versus the rasterized workloads they are reducing. A workload that spikes up to 100% only one-tenth of the time isn't going to consume as much power as one that is pegged at 100% all of the time.
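A minimal sketch of that duty-cycle point, using made-up utilization and power figures purely for illustration:

```python
# Hypothetical numbers for illustration only; not measured Switch 2 (or RTX 4090) figures.
TENSOR_PEAK_POWER_W = 3.0  # assume the tensor cores draw ~3 W when fully pegged

def avg_tensor_power(duty_cycle: float, peak_power_w: float = TENSOR_PEAK_POWER_W) -> float:
    """Average power of a bursty workload that hits peak only `duty_cycle` of the time
    and is assumed to be roughly idle otherwise."""
    return duty_cycle * peak_power_w

bursty = avg_tensor_power(duty_cycle=0.10)  # spikes to 100% one-tenth of the time -> ~0.3 W
pegged = avg_tensor_power(duty_cycle=1.00)  # pegged at 100% all of the time       -> ~3.0 W

print(f"bursty: {bursty:.2f} W average, pegged: {pegged:.2f} W average")
```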

Developers are indeed free to use 100% of the system resources; they are also free to limit power consumption in handheld mode, and have done so with the original Switch. That's why battery life varied by title. There were different handheld clock modes that developers used for different titles based on how demanding the title was on the system's resources. What DLSS provides them is the option to reduce clocks more often (if their goal is longer battery life) by reducing the rasterized workload without a power-equivalent increase to the tensor-core workload (even if the tensor utilization eats into it). In other words, they are more efficiently achieving a similar output.

I don't even know why you're arguing with this. People do this all the time on gaming handhelds like the Steam Deck for many games. They'll cap their power limit to 7W and use FSR to make up the difference, maximizing battery life without that much worse of a qualitative experience. When they are on a charger or dock, they change their settings to rasterize at a higher internal resolution, as battery life is no longer a consideration.

*Matrix multiplication algorithms scale either cubically with rank for high-ranked matrices, or super-quadratically but sub-cubically with rank for low-ranked matrices. Then there are factorization layers that can reduce rank based on the matrix sparsity. Different layers in the network are going to have different ranks and sparsities and therefore take up different resources.
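A rough sketch of that footnote's point (the layer shapes below are invented, and sparsity and memory-bandwidth effects are ignored): each layer's matrix multiply costs roughly 2*m*k*n operations, so differently shaped layers load the cores very differently.

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """Approximate FLOPs for an (m x k) by (k x n) matrix multiply:
    m*n dot products of length k, each costing ~2k FLOPs (multiply + add)."""
    return 2 * m * k * n

# Invented layer shapes, just to show how unevenly work is spread across a network.
layers = {
    "early conv (as matmul)": (512 * 512, 27, 64),
    "mid conv": (128 * 128, 576, 128),
    "late conv": (32 * 32, 1152, 256),
    "classifier / MLP head": (1, 1024, 1000),
}

for name, (m, k, n) in layers.items():
    print(f"{name:>24}: {matmul_flops(m, k, n) / 1e9:6.3f} GFLOPs")
```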

I think you are missing the point.
The entire graphics engine on the Switch 2 will be under 100% utilization... It's a walled garden with fixed hardware... That's texture mapping units, pixel shaders, vertex shaders, PolyMorph engines and more. That will be pegged fully.

Tensor operations done on the tensor cores are an added "extra" load on top that adds to power consumption rather than reducing it.

FSR is obviously different again on the Steam Deck, as it's using the pixel shaders, not separate tensor cores.

And again... The fact that Switch 2 has lower battery life than the Switch 1 literally proves there is no power reductions anyway.




www.youtube.com/@Pemalite

Pemalite said:

I think you are missing the point.
The entire graphics engine on the Switch 2 will be under 100% utilization... It's a walled garden with fixed hardware... That's texture mapping units, pixel shaders, vertex shaders, PolyMorph engines and more. That will be pegged fully.

Tensor operations done on the tensor cores are an added "extra" load on top that adds to power consumption rather than reducing it.

FSR is obviously different again on the Steam Deck, as it's using the pixel shaders, not separate tensor cores.

And again... The fact that Switch 2 has lower battery life than the Switch 1 literally proves there is no power reductions anyway.

No, you're missing the point of my original post here, which is evident from the fact you're bringing up Switch 1, which wasn't something we were comparing to.

Let's break down what the original point was.

You have a goal and two ways to achieve it. Let's say the goal is to be able to render a game at 1080p:

  1. You render the game at 1080p natively and fully utilize the CUDA cores to do so. 
  2. You render the game at 540p and upscale it to 1080p using DLSS; the game actually has better image quality than the native 1080p version.

Now you've saved a significant amount of the CUDA utilization (a rough pixel-count sketch follows the list below). You can ostensibly then do two things, or a combination of both of them:

  1. Reduce the max clock rates of the CUDA cores to save on power consumption. 
  2. Re-allocate the now freed up resources to other workloads in the pipeline. 
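A quick sanity check on how much shading work the 540p-internal path might save, under the crude assumption that per-frame shading cost scales with the number of pixels rendered (fixed per-frame costs like geometry are ignored):

```python
def pixels(width: int, height: int) -> int:
    return width * height

native_1080p = pixels(1920, 1080)  # 2,073,600 pixels shaded per frame
internal_540p = pixels(960, 540)   #   518,400 pixels shaded per frame

# Under the crude "cost ~ pixels shaded" assumption, the 540p + DLSS path shades
# only a quarter of the pixels, before paying the tensor-core upscaling cost.
print(internal_540p / native_1080p)  # 0.25
```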

Are you seriously arguing that developers will always and only choose #2 and never choose #1, or a combination of both #1 and #2? And if they do choose #1, do you think the extra load on the tensor cores from the very spiky DLSS workload is going to fully eat into the power-consumption savings obtained by reducing the clock rate of the CUDA cores?

And for the argument that was originally made, it doesn't matter that the Steam Deck/FSR uses pixel shaders. Using FSR, the user is able to cap the clock rates and get a similar enough experience, qualitatively, to the higher-clocked, more power-consumptive workload they would've chosen if battery life weren't a constraint they cared about. Sure, there is a slight difference in that pixel shaders are much more general-purpose than matrix cores/tensor cores, and there is a more direct trade-off, but for the discussion of being able to reduce the max clock rate and save power, upscaling does give developers (and users) more options than they would've otherwise had.

As for your comparison to Switch 1, what makes you think the Switch 2 wouldn't come out even worse without DLSS being an option? Maybe more games are at the minimum of the battery-life expectancy range than the maximum in that alternative reality? 

Last edited by sc94597 - on 22 April 2025

curl-6 said:

If 540p can be scaled to 1080p that well, that's a very promising sign for DLSS giving Switch 2 a real leg up in terms of performance.

If you only need to render at say 540p to get decent image quality then that really opens up the range of games the console can handle without resorting to the kind of blurriness we saw on many "impossible ports" to Switch 1.

That's DLSS Performance mode. It is...well...decent at 1080p output, especially if you're not too close to the TV. It works much better for higher output resolutions, especially for 4K.

DLSS is a powerful tool, though it should not be mistaken for some miracle cure - native will always look cleaner, and way better if it's using proper AA. Of course there's always "native", as in the game rendering natively and then smudging everything with cheap TAA, like a lot of modern games do - then DLSS quality can indeed often look better than "native" in some aspects.

Last edited by HoloDust - on 23 April 2025

HoloDust said:
curl-6 said:

If 540p can be scaled to 1080p that well, that's a very promising sign for DLSS giving Switch 2 a real leg up in terms of performance.

If you only need to render at say 540p to get decent image quality then that really opens up the range of games the console can handle without resorting to the kind of blurriness we saw on many "impossible ports" to Switch 1.

That's DLSS Performance mode. It is...well...decent at 1080p output, especially if you're not too close to the TV. It works much better for higher output resolutions, especially for 4K.

DLSS is a powerful tool, though it should not be mistaken for some miracle cure - native will always look cleaner, and way better if it's using proper AA. Of course there's always "native", as in the game rendering natively and then smudging everything with cheap TAA, like a lot of modern games do - then DLSS quality can indeed often look better than "native" in some aspects.

Oh I know it's not magic, but compared to Switch 1 where we often had to deal with sub-HD resolutions scaled up with TAAU/FSR alone, it should be a massive step up.

Should also help with games that are too demanding for decent native resolutions, like say something that is already pushing more powerful console hardware and so would otherwise be impossible to pull off on Switch 2 at an acceptable level of quality.



Pemalite said:
sc94597 said:

The GPU might be fully utilized in this case, as resources freed from running at the higher resolution are re-allocated to other parts of the graphics pipeline, or the developer could just under-clock the GPU for better battery life, as they do in lighter-weight games or as we see on PC handhelds.


That also ignores that the Switch 2 is in a walled garden and developers are free to use 100% of the system's resources; regular rasterization and ray tracing will be using 100% of those resources, with Tensor operations as an added extra on top.

And the battery life reflects this, as battery life on Switch 2 is extremely poor, worse than the original Switch 1 launch model.

Pemalite said:

I think you are missing the point.
The entire graphics engine on the Switch 2 will be under 100% utilization... It's a walled garden with fixed hardware... That's texture mapping units, pixel shaders, vertex shaders, PolyMorph engines and more. That will be pegged fully.

Tensor operations done on the tensor cores are an added "extra" load on top that adds to power consumption rather than reducing it.

Why will regular rasterization and ray tracing be using 100% of the system resources?

Why will the entire graphics engine on the Switch 2 be under 100% utilization?

Every developer can choose how much of the system resources they want to use and how they distribute those resources (resolution, DLSS, effects). Not every Switch 2 game will use 90-100% of the system resources; some will use a lot less.

Why else would Nintendo give that wide a battery-life range of more than 3x (2 hours minimum, 6.5 hours maximum)?

The needed resources even change within a game all the time.



curl-6 said:
HoloDust said:

That's DLSS Performance mode. It is...well...decent at 1080p output, especially if you're not too close to the TV. It works much better for higher output resolutions, especially for 4K.

DLSS is a powerful tool, though it should not be mistaken for some miracle cure - native will always look cleaner, and way better if it's using proper AA. Of course there's always "native", as in the game rendering natively and then smudging everything with cheap TAA, like a lot of modern games do - then DLSS quality can indeed often look better than "native" in some aspects.

Oh I know it's not magic, but compared to Switch 1 where we often had to deal with sub-HD resolutions scaled up with TAAU/FSR alone, it should be a massive step up.

Should also help with games that are too demanding for decent native resolutions, like say something that is already pushing more powerful console hardware and so would otherwise be impossible to pull off on Switch 2 at an acceptable level of quality.

Oh, certainly - I don't have exact numbers, so it's just rough napkin math based on 3060 Ti and 3090 official numbers for DLSS, but I'm guesstimating they need to render at 69-70fps @540p for DLSS Performance to achieve 60fps @1080p, which is some 14-15% hit compared to native 540p...still way, way better than native 1080p, which would be a ~50% hit, or worse.
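For what it's worth, here is that guesstimate spelled out as frame-time arithmetic (the ~70fps internal figure is the rough number from the post above, not measured Switch 2 data):

```python
TARGET_FPS = 60
NATIVE_540P_FPS = 70  # guesstimated internal render rate needed at 540p

frame_budget_ms = 1000 / TARGET_FPS      # 16.67 ms per frame at 60 fps
render_time_ms = 1000 / NATIVE_540P_FPS  # 14.29 ms to render the 540p frame

dlss_budget_ms = frame_budget_ms - render_time_ms  # ~2.4 ms left for the DLSS pass
perf_hit = 1 - TARGET_FPS / NATIVE_540P_FPS        # ~14.3% slower than native 540p

print(f"DLSS budget: {dlss_budget_ms:.2f} ms, hit vs native 540p: {perf_hit:.1%}")
```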



HoloDust said:
curl-6 said:

Oh I know it's not magic, but compared to Switch 1 where we often had to deal with sub-HD resolutions scaled up with TAAU/FSR alone, it should be a massive step up.

Should also help with games that are too demanding for decent native resolutions, like say something that is already pushing more powerful console hardware and so would otherwise be impossible to pull off on Switch 2 at an acceptable level of quality.

Oh, certainly - I don't have exact numbers, so it's just rough napkin math based on 3060 Ti and 3090 official numbers for DLSS, but I'm guesstimating they need to render at 69-70fps @540p for DLSS Performance to achieve 60fps @1080p, which is some 14-15% hit compared to native 540p...still way, way better than native 1080p, which would be a ~50% hit, or worse.

The T239 in the Switch 2 supposedly has 64 Ampere Tensor Cores, the same as the RTX 2050 Laptop, so it's basically going to be slow with its DLSS processing compared to the majority of DLSS-capable GPUs. Unfortunately the T239 does not have the Deep Learning Accelerator the T234 had, which could have greatly reduced the overhead of DLSS on Switch 2.

Scaling from 720p to 1080p with DLSS is a 12% performance hit on the RTX 2050 Mobile, I assume 540p to 1080p takes more out of the tensor cores so yeah a 14-15% hit in performance going from 540p to 1080p may be about right.

But really, with the low number of tensor cores and the lack of the Deep Learning Accelerator, it seems DLSS is for the most part going to be used to scale to 1080p, occasionally 1440p, and almost never 4K. The performance hit using DLSS to upscale to 4K on the RTX 2050 Mobile is huge.
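A crude way to see why the 4K case is so much heavier, assuming (as a first-order approximation only) that the DLSS pass costs roughly in proportion to the number of output pixels it has to reconstruct:

```python
def output_pixels(width: int, height: int) -> int:
    return width * height

out_1080p = output_pixels(1920, 1080)  # ~2.07 MP
out_1440p = output_pixels(2560, 1440)  # ~3.69 MP
out_4k = output_pixels(3840, 2160)     # ~8.29 MP

# Relative to a 1080p output, a 4K output gives the tensor cores ~4x as many
# pixels to reconstruct each frame, so a fixed per-pixel cost grows accordingly.
print(round(out_1440p / out_1080p, 2), round(out_4k / out_1080p, 2))  # 1.78 4.0
```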

Nintendo/Nvidia didn't go all in on DLSS for the Switch 2.

"T239 may share the same GPU architecture as T234 - the Ampere architecture used by Nvidia for its RTX 30-series graphics cards - but everything else is all-new. The deep learning accelerator and ARM Cortex A78AE are gone, while the 2048 CUDA core GPU is slimmed down to 1536 cores."

Still for 1080p gaming it's going to be great, and handheld will especially benefit.



Zippy6 said:
HoloDust said:

Oh, certainly - I don't have exact numbers, so it's just rough napkin math based on 3060 Ti and 3090 official numbers for DLSS, but I'm guesstimating they need to render at 69-70fps @540p for DLSS Performance to achieve 60fps @1080p, which is some 14-15% hit compared to native 540p...still way, way better than native 1080p, which would be a ~50% hit, or worse.

The T239 in the Switch 2 supposedly has 64 Ampere Tensor Cores, the same as the RTX 2050 Laptop, so it's basically going to be slow with its DLSS processing compared to the majority of DLSS-capable GPUs. Unfortunately the T239 does not have the Deep Learning Accelerator the T234 had, which could have greatly reduced the overhead of DLSS on Switch 2.

Scaling from 720p to 1080p with DLSS is a 12% performance hit on the RTX 2050 Mobile, I assume 540p to 1080p takes more out of the tensor cores so yeah a 14-15% hit in performance going from 540p to 1080p may be about right.

But really, with the low number of tensor cores and the lack of the Deep Learning Accelerator, it seems DLSS is for the most part going to be used to scale to 1080p, occasionally 1440p, and almost never 4K. The performance hit using DLSS to upscale to 4K on the RTX 2050 Mobile is huge.

Nintendo/Nvidia didn't go all in on DLSS for the Switch 2.

"T239 may share the same GPU architecture as T234 - the Ampere architecture used by Nvidia for its RTX 30-series graphics cards - but everything else is all-new. The deep learning accelerator and ARM Cortex A78AE are gone, while the 2048 CUDA core GPU is slimmed down to 1536 cores."

Still for 1080p gaming it's going to be great, and handheld will especially benefit.

I'm not sure how important DLAs are for upscaling (I don't think they are); tensor cores are what's important for DLSS - the 2 NVDLA units were in T234 because they had a purpose there.

SW2 has 48 tensor cores - the 2050 Mobile indeed has 64, it's just Richard (DF) running everything at 750MHz to try to emulate the SW2 GPU (it should've been 755MHz, but it's close enough).

Anyway, looking at that 720p Native to 1440p DLSS Performance (which is 720 native), and then doing some math comparing how 1080p vs 1440p DLSS Performance scales from official numbers on Ampere...yeah, again comes out as 14-15% for 1080 DLSS Performance.



HoloDust said:

I'm not sure how important DLAs are for upscaling (I don't think they are); tensor cores are what's important for DLSS - the 2 NVDLA units were in T234 because they had a purpose there.

SW2 has 48 tensor cores - the 2050 Mobile indeed has 64, it's just Richard (DF) running everything at 750MHz to try to emulate the SW2 GPU (it should've been 755MHz, but it's close enough).

Anyway, looking at that 720p Native to 1440p DLSS Performance (which is 720 native), and then doing some math comparing how 1080p vs 1440p DLSS Performance scales from official numbers on Ampere...yeah, again comes out as 14-15% for 1080 DLSS Performance.

Well DF seem to think the DLA would have helped. They said this in 2023 before they confirmed recently that the T239 doesn't have it.

"DLSS isn't a 'free lunch' and the Tensor cores in the GPU alone can only do so much. However, if T239 includes T234's Deep Learning Accelerator, that could drastically reduce DLSS's overhead." https://www.eurogamer.net/digitalfoundry-2023-inside-nvidias-latest-hardware-for-nintendo-what-is-the-t239-processor

I didn't realise the T239 had cut back on tensor cores as well; I thought it had the same 64 as the T234, but it looks like it only has 48. So yeah, DLSS on this is not going to be fast for higher resolutions at all. This might explain why Nintendo is opting for 1440p native on some of their titles and not using any DLSS.



Zippy6 said:
HoloDust said:

I'm not sure how important DLAs are for upscaling (I don't think they are); tensor cores are what's important for DLSS - the 2 NVDLA units were in T234 because they had a purpose there.

SW2 has 48 tensor cores - the 2050 Mobile indeed has 64, it's just Richard (DF) running everything at 750MHz to try to emulate the SW2 GPU (it should've been 755MHz, but it's close enough).

Anyway, looking at that 720p Native to 1440p DLSS Performance (which is 720 native), and then doing some math comparing how 1080p vs 1440p DLSS Performance scales from official numbers on Ampere...yeah, again comes out as 14-15% for 1080 DLSS Performance.

Well DF seem to think the DLA would have helped. They said this in 2023 before they confirmed recently that the T239 doesn't have it.

"DLSS isn't a 'free lunch' and the Tensor cores in the GPU alone can only do so much. However, if T239 includes T234's Deep Learning Accelerator, that could drastically reduce DLSS's overhead." https://www.eurogamer.net/digitalfoundry-2023-inside-nvidias-latest-hardware-for-nintendo-what-is-the-t239-processor

I didn't realise the T239 had cut back on tensor cores as well; I thought it had the same 64 as the T234, but it looks like it only has 48. So yeah, DLSS on this is not going to be fast for higher resolutions at all. This might explain why Nintendo is opting for 1440p native on some of their titles and not using any DLSS.

Honestly, I highly doubt it - NVDLA is for completely different purposes and, if we're completely honest, Richard is not that tech savvy (as I said somewhere in the thread, they should really consider hiring someone who knows the stuff, preferably with coding skills, to take their game to the next level).

Anyway, the 2050/3050 don't have NVDLA, nor do any of the RTX Ampere (or otherwise) consumer cards - that is reserved for their professional line.