
Forums - Nintendo - How Will be Switch 2 Performance Wise?

 

Switch 2 is out! How do you classify it?

Terribly outdated! - 3 votes (5.26%)
Outdated - 1 vote (1.75%)
Slightly outdated - 14 votes (24.56%)
On point - 31 votes (54.39%)
High tech! - 7 votes (12.28%)
A mixed bag - 1 vote (1.75%)
Total: 57 votes

bonzobanana said:

Real-world data is all that matters. You compare the CPU requirements for upscaling from 720p to, let's say, 1080p or 1440p with rendering at the lower resolution and not upscaling, to see the CPU difference and how it affects frame rates.

Where is this "real-world data"? Where are the benchmarks showing higher CPU utilization for, say, DLSS 720p -> 1080p than for bilinear 720p -> 1080p? So far you haven't provided any. DLSS upscaling does have a performance penalty (usually very minor on high-end GPUs) compared to a brute-force upscale method, but that isn't because the CPU is burdened. It's because the tensor cores are being utilized.

And no, we don't need to just trust Nvidia on this. How Convolutional Neural Network (and Vision Transformer, ViT) inference works isn't some hidden knowledge. It's well known that after the pre-processing steps (e.g., loading the model into VRAM, which happens once and not per frame) the GPU takes over the inference.

In the case of DLSS -- what that looks like is the CUDA cores calculate the motion vectors, color data, and other standard rendering passes (performance is saved because the internal pixel count is lower than would otherwise be necessary) -> the tensor cores, using a pre-trained CNN (or ViT) model, take the rendered frame data and perform the various convolution, pooling, and ReLU passes (mostly matrix multiplications plus some element-wise operations) -> CUDA cores probably do a bit more post-inference work -> image is sent to the display.

Where in this pipeline is the CPU doing anything? 



bonzobanana said:

This seems to back up the single Tensor cores and also the use of inferior ARM A78AE CPU cores redesigned to work on Samsung's 10/8nm fabrication process. This ARM A78C claim seems like manipulative marketing. Yes, you can make the case that they are similar, but ARM A78Cs are designed for a more modern fabrication process. Would Nintendo really have paid for a complete redesign of the ARM A78C to work on such a dated fabrication process, cutting back many features to work on 10/8nm, when there are already ARM A78AE cores? What we do know is that Geekerwan showed the CPU of the Switch 2 had 4MB of cache, the same as the ARM A78AE, not the 8MB of the ARM A78C. So just by looking at the PCB we know it's likely to be the original ARM A78AE cores. The ARM A78AE cores produce about 7-7.5 DMIPS per MHz. So halve the PassMark results given below for single and multi, due to the 1GHz speed as previously stated. The ARM A78AE seems to be a much more power-efficient design compared to the A78C, but lower performance too. My point is that surely on such a system you have to factor in the greater demands of DLSS when upscaling and the very low CPU resources of the console. It isn't that much more powerful than the Jaguar cores of the PS4; in fact it's about the same as the faster Jaguar cores of the PS4 Pro overall.

https://www.cpubenchmark.net/cpu.php?cpu=ARM+Cortex-A78AE+8+Core+1984+MHz&id=6298

My point is that DLSS upscaling requires more CPU resources compared to native rendering at the lower resolution, especially if using the highest quality settings and frame generation. The CPU is much, much weaker than many people are stating. The operating system is devoting lots of resources to GameChat and other features.

It's a fantastic portable console, just not that powerful overall, with hugely exaggerated claims about performance.

Replying to this in a separate post because it is a different topic. 

Even if we take your premise as true (that the Switch 2 is using the same AE cores that exist in Orin), we are still seeing about 2.8 times the performance of the PS4's CPU in multi-core and 2.5 times in single-core, using Geekerwan's simulated test where he clocked the 8 AE cores (also on an 8nm node, btw) in an Orin NX at 1.1GHz and 1GHz and ran Geekbench 6 (which, no, is not measuring the GPU at all; Geekbench's GPU test is a separate benchmark from its CPU test).

But in reality the Switch 2 does have an A78C with 8 cores on a single cluster (confirmed by the leaked SDK). This gives it an advantage in multi-core over the 2-cluster x 4-core A78AE layout in the Orin NX. So the benchmark above is probably a slight (5-10%) underestimate for multi-core performance.

You can also see that the A78AE and A78C are different cores, even if the exact same size. So it's not just a binned T234. 

As is typical with consoles, cache was probably "reduced" to make space on the die (remember, 8MB L3 is the maximum ("up to") for the A78C, not the only size it can come with; the optional range is 512KB - 8MB).

By the way, ARM designs their cores to work on multiple different fabs. You can see this in this image from the Geekerwan video. 

The Dimensity 8200 has 4 A78 (no-suffix) cores and is on TSMC 4nm while the Snapdragon 888 has 3 A78 cores and is on Samsung 5nm. 

In the image above we see SF 3nm-5nm, TSMC 4nm-6nm all using the same A78 (no-suffix) cores. 

A78AE itself comes in a Samsung 5nm variant (Auto v920) in certain Audi cars and a Samsung 8nm in the Jetson Orin devices. 

Video from Kurnal where you can find a lot of this information.

13:07 is where the core-cluster ratio is discussed.

Last edited by sc94597 - on 04 June 2025


Of course ARM cores are used on different fabrication processes. My point was that the performance figures are given for the best fabrication process, and to use the same A78C cores you have to have a different design, often downgraded to work on older fabrication processes. The roots of the Switch 2's T239, with its tensor core design and reduced cache, are of the same age and generation, which goes back to final silicon in 2020/21. You can't just inject later tech into the design, and it's unfair to compare the same or similar CPUs on different fabrication processes. That shows you how these CPUs have to be heavily redesigned to work on different fabrication processes and are often feature-limited on older fabrication processes.

We are seeing the Switch 2 natively render at very low resolutions and then upscale with DLSS to make its performance competitive. If it really had the spec you claimed, why does it need to render at 360p at times, often below the original Switch's lowest rendering resolution? The Switch 2 only has a 20Wh battery; it cannot afford the performance you believe, and there are no indications it can do that anyway. It is DLSS that is saving the day by rendering at such low resolutions.

You have to get back to reality: a 10/8nm fabrication process, a very low-capacity battery, and a peak SoC wattage of only 4-6W to stay within the 10W maximum, allowing for the screen, which is going to be around 4-6W on its own. The Switch 2 is based around a very low-performance, power-efficient Nvidia chipset from 2020/2021 on a fabrication process of that time. You are throwing loads of data into your replies, most of which isn't relevant to the Switch 2, its fabrication process, and the age of its design.
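
As a rough check on the power-budget figures in the post above: watts are already energy per unit time, so battery runtime is simply capacity in watt-hours divided by average draw in watts. The sketch below uses the 20Wh capacity and the 10W total quoted in the post; the function name and the 7W comparison point are illustrative assumptions, not measured values.

```python
def runtime_hours(capacity_wh: float, draw_w: float) -> float:
    """Hours of runtime for a battery of capacity_wh at a constant draw of draw_w."""
    if draw_w <= 0:
        raise ValueError("draw_w must be positive")
    return capacity_wh / draw_w


BATTERY_WH = 20.0  # capacity quoted in the post

# 10 W is the post's stated maximum; 7 W is a hypothetical lighter load.
for draw in (7.0, 10.0):
    print(f"{draw:.0f} W -> {runtime_hours(BATTERY_WH, draw):.2f} h")
```

At the full 10W this works out to two hours, which is the constraint the power-budget argument rests on.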

The Switch 2 has about 3-4x the CPU performance of the original Switch, a huge upgrade. It has a GPU capable of 5-6x the docked performance in graphics teraflops and 3x the memory. It's a huge generational leap, but it's still having to render more ambitious games at resolutions as low as 360p. You need to get real rather than assuming the very best possible performance for every bit of Switch 2 spec. It's a low-cost, low-performance design from 2021 given a boost with more system memory and storage plus, of course, DLSS upscaling technology.

I'm sure as time goes on we will get a much better analysis of retail Switch 2 units and what they are capable of. Today is only launch day; in the coming weeks and months we will get a lot more information on why the Switch 2 is limited to rendering at such low resolutions and relying on upscaling so much.



bonzobanana said:

Of course ARM cores are used on different fabrication processes. My point was that the performance figures are given for the best fabrication process, and to use the same A78C cores you have to have a different design, often downgraded to work on older fabrication processes. The roots of the Switch 2's T239, with its tensor core design and reduced cache, are of the same age and generation, which goes back to final silicon in 2020/21. You can't just inject later tech into the design, and it's unfair to compare the same or similar CPUs on different fabrication processes. That shows you how these CPUs have to be heavily redesigned to work on different fabrication processes and are often feature-limited on older fabrication processes.

Are you just pulling things out of the aether now? You don't need a "different design" that is "often downgraded to work on older fabrication processes." What do you even mean by "downgraded" here? What is downgraded? What performance figures are you even talking about? All the process node tells us is how much power a given design will use at a set of frequencies, and the dimensions of the physical transistors in the chip. The actual design of an A78 core is the same whether it is on 3nm Samsung or 5nm Samsung. Likewise with an A78AE or an A78C, which are special-case iterations of the A78. And yes, while a core's architecture isn't infinitely compatible with all process nodes, that doesn't mean ARM is targeting a specific fab process when they design a core. Usually they are targeting a range of fabs with which their core's design is compatible.

All of this is moot, because we are talking about actual real-world performance on an 8nm chip. The Geekerwan benchmark isn't some hypothetical figure given to us by ARM, Samsung, Nvidia, or Nintendo. It is an actual benchmark on physical hardware of A78AE cores (also on an 8nm node) clocked at 1.1GHz and 1GHz for handheld and docked mode respectively.

We are seeing the Switch 2 natively render at very low resolutions and then upscale with DLSS to make its performance competitive. If it really had the spec you claimed, why does it need to render at 360p at times, often below the original Switch's lowest rendering resolution? The Switch 2 only has a 20Wh battery; it cannot afford the performance you believe, and there are no indications it can do that anyway. It is DLSS that is saving the day by rendering at such low resolutions.

Which is typically something you can only do when the workload is GPU-bound. Dropping the resolution wouldn't do anything if the CPU were the issue, that is, if we were seeing a CPU bottleneck or an over-utilized CPU. You keep bringing up the fact that the Switch 2 only has a 20Wh battery. Okay, most platforms that have A78C cores have even smaller power budgets than the Switch 2. Most of them are either smartphones, which can't run at 5-7W because they'll get too hot without active cooling, or ThinkPad laptops, which need to last 10-15 hours when the efficiency cores are being utilized and also only have a 5-7W power budget when running on battery, even if their batteries are larger.

You have to get back to reality: a 10/8nm fabrication process, a very low-capacity battery, and a peak SoC wattage of only 4-6W to stay within the 10W maximum, allowing for the screen, which is going to be around 4-6W on its own. The Switch 2 is based around a very low-performance, power-efficient Nvidia chipset from 2020/2021 on a fabrication process of that time. You are throwing loads of data into your replies, most of which isn't relevant to the Switch 2, its fabrication process, and the age of its design.

I am in reality. We literally see a benchmark where an A78AE (on the exact same node as the T239) clocked at 1GHz is performing at a level you are saying is not possible. The Geekbench 6 numbers aren't a lie. You have still not addressed them. You tried to pivot before when they were brought up, suggesting that they included a GPU benchmark.

The Switch 2 has about 3-4x the CPU performance of the original Switch, a huge upgrade. It has a GPU capable of 5-6x the docked performance in graphics teraflops and 3x the memory. It's a huge generational leap, but it's still having to render more ambitious games at resolutions as low as 360p. You need to get real rather than assuming the very best possible performance for every bit of Switch 2 spec. It's a low-cost, low-performance design from 2021 given a boost with more system memory and storage plus, of course, DLSS upscaling technology.

Again, this already has been mentioned to you. Running a game at 360p (ultra-performance DLSS) and upscaling it to 720p is not the same burden as running a game at 360p and doing nothing. The tensor cores and CUDA cores (not the CPU) have to do work to upscale that image. The graphics settings can also be scaled higher if the image is clean enough from 360p -> 720p. It is also important to identify that 360p is a minimum internal resolution when combined with dynamic resolution scaling and isn't the average internal resolution of the game, which is more like 500p-600p depending on the specific game we're talking about. This is also not atypical for modern handheld platforms. To get games in playable states on the Steam Deck you often have to have the internal resolution in the 300-500p range and upscale with FSR. Cyberpunk, for example, only runs at 40fps at low graphics settings if you use FSR 2.1 Balanced at 720p, which is about 500,000 pixels or a sub-540p resolution. 
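
To put the resolutions mentioned above on a common scale, here is a small sketch that converts a vertical resolution into a pixel count, assuming a 16:9 frame. The helper name and the choice of 16:9 are assumptions for illustration; actual internal resolutions vary per game and per frame under dynamic resolution scaling.

```python
def pixel_count(height: int) -> int:
    """Pixels in a 16:9 frame of the given height (e.g. 720 -> 1280 x 720)."""
    width = height * 16 // 9
    return width * height


for h in (360, 540, 720, 1080):
    share = pixel_count(h) / pixel_count(720)
    print(f"{h}p: {pixel_count(h):>9,} pixels ({share:.2f}x of 720p)")
```

On these assumptions 360p is a quarter of the pixels of 720p, and 540p is roughly 518,000 pixels, which is why a figure of about 500,000 pixels corresponds to a sub-540p resolution.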

I'm sure as time goes on we will get a much better analysis of retail Switch 2 units and what they are capable of. Today is only launch day; in the coming weeks and months we will get a lot more information on why the Switch 2 is limited to rendering at such low resolutions and relying on upscaling so much.

It's "relying on upscaling so much" because it is a modern feature set that it would be dumb not to use. You can push your GPU harder (by dialing settings up or keeping them high) without losing much in graphics or image quality, while maintaining your resolution target, when using DLSS. Why wouldn't every game use it and use it heavily?

Last edited by sc94597 - on 05 June 2025

Content removed.




So it's 3 TFLOPS in docked mode and 1.72 TFLOPS in handheld mode.
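
For reference, figures like these usually come from the standard FP32 peak-throughput formula: shader cores x 2 operations per clock (one fused multiply-add) x clock speed. The sketch below assumes 1536 CUDA cores and GPU clocks of 1007 MHz docked and 561 MHz handheld. Those are widely reported numbers, not something stated in this thread, so treat them as assumptions.

```python
def peak_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak in TFLOPS, counting one FMA as 2 operations per core per clock."""
    return cuda_cores * 2 * clock_mhz / 1e6


CUDA_CORES = 1536  # assumed core count, not confirmed in this thread

# Clock speeds are assumed, widely reported values.
for mode, mhz in (("docked", 1007), ("handheld", 561)):
    print(f"{mode}: {peak_tflops(CUDA_CORES, mhz):.2f} TFLOPS")
```

This is a theoretical ceiling only; it says nothing about how much of that throughput a real game sustains.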



Switch 2 Handheld Quality Mode

Switch 2 Docked Quality Mode

ROG Ally Z1E, 15W, XeSS Performance, 1080p, Steam Deck settings

Last edited by sc94597 - on 10 June 2025

XeSS is surprisingly quite a bit better than FSR (pre FSR 4), folks with older AMD GPUs should really think about using it (though it doesn't quite compare to DLSS at same quality settings).




Phenomenal BS. The Switch 2 is much more powerful than the Steam Deck in portable mode.



Zippy6 said:
Oneeee-Chan!!!2.0 said:

I was just thinking, after seeing the Cyberpunk 2077 performance on the Switch 2, that it's odd that the $449 Switch 2 is better than the $500 Steam Deck 256GB, yet is criticized as being too expensive.

The Steam Deck 256GB LCD is $399, not $500. "Better" is very subjective. In game performance, very likely more often than not, but the Steam Deck has access to the largest gaming library in existence and has a ton of features and uses that the Switch 2 does not.

I don't think $449 is too much for the Switch 2, but I also don't think it's really better value than a Steam Deck. They are very different products. One is a full PC in your hands.

What do you do on the Deck except gaming? I don't understand. It's a gaming platform, like consoles. A typical desktop PC is not a gaming platform at all.