
Forums - Nintendo - How Will the Switch 2 Be, Performance-Wise?

 

Switch 2 is out! How do you classify it?

Terribly outdated! - 3 votes (5.26%)
Outdated - 1 vote (1.75%)
Slightly outdated - 14 votes (24.56%)
On point - 31 votes (54.39%)
High tech! - 7 votes (12.28%)
A mixed bag - 1 vote (1.75%)

Total: 57
HoloDust said:
bonzobanana said:

I personally don't think the Switch 2 has the CPU resources that many are claiming. An 8-core A78C CPU at 1 GHz scores a PassMark of about 1900, although from what has been claimed the Switch 2 doesn't have the 8 MB of cache a full implementation would have, only 4 MB, so that would reduce performance a bit. The PS4 scores a PassMark of around 1300 for its eight Jaguar cores. However, the Switch 2 spends more of its CPU performance on features like GameChat; in fact two of its cores are reserved. I'm assuming the figures being quoted are the short-lived bursts to 1.7 GHz that the Switch 2 is capable of, but that isn't sustained performance, whereas the Jaguar cores run at 1.6 GHz each as their base clock speed. The PS4 uses one of its cores for the operating system and so on, which admittedly came later with a revision to the operating system; originally it was two, like the Switch 2. The Switch 2, being a portable system, will always be trying to reduce power and lower clock speeds, whereas when docked this isn't such an issue and it's more about thermal management.

I see claims that the Switch 2 is super powerful in CPU terms, but I really can't see it myself. The Arm Cortex-A78C is an old CPU core of the same era as the graphics architecture, and the 'C' mainly denotes enhanced security features, which obviously Nintendo would want. How those enhanced security features affect performance I don't know.

There is a CPU element to upscaling as well as a GPU one, and it's likely some games will reduce the upscaling quality to reduce CPU load, which may be what Hogwarts is doing.

Incredibly, development kits for the Switch 2 have been with developers since 2019/2020. When Nintendo delayed launching their updated Switch because the Switch 1 was still selling so well, I guess some of that development was shelved, but developers have been refining their work on the T239 chipset for a long time. So I don't think the Switch 2 is necessarily going to achieve much greater optimisation over the years: to developers, who have had it a very long time, it is already old technology.

I guess Nintendo were morally obliged, to a degree, to stick with the T239 based on all the development work that had already been done on it.

Just for comparison, on an overclocked, modded Mariko Switch 1 with the four cores running at 2 GHz you can get a PassMark score of about 1200, whereas a stock Switch 1 is around 600.

So they are not that far apart from each other really, except at the original Switch's standard CPU frequencies.

I used a much later ARM A78-based chip for comparison, as I couldn't find PassMark information for the older A78C, but they should be comparable, being on the same architecture; I can't imagine the later chip being any more than 5% faster, if that. Obviously you need to adjust the performance down to the much lower clock of the Switch 2, which is about 1 GHz.

https://www.cpubenchmark.net/cpu.php?cpu=ARM+Cortex-A78AE+8+Core+1984+MHz&id=6298
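As a rough sketch of that clock-for-clock adjustment (the reference multi-core score below is a placeholder chosen to be consistent with the ~1900-at-1 GHz figure above, and linear scaling with clock speed is an assumption that ignores cache, memory bandwidth and node differences):

```python
# Minimal sketch of the clock-for-clock adjustment described above.
# ASSUMPTIONS: the reference score is a placeholder (the real value is on
# the linked cpubenchmark.net page), and PassMark is assumed to scale
# roughly linearly with sustained clock speed.

REF_CLOCK_MHZ = 1984        # clock of the Cortex-A78AE entry linked above
REF_SCORE = 3800            # placeholder multi-core score at that clock
TARGET_CLOCK_MHZ = 1000     # approximate sustained Switch 2 CPU clock

def scale_score(ref_score: float, ref_clock: float, target_clock: float) -> float:
    """Linearly rescale a benchmark score to a different sustained clock."""
    return ref_score * (target_clock / ref_clock)

estimated = scale_score(REF_SCORE, REF_CLOCK_MHZ, TARGET_CLOCK_MHZ)
print(f"Estimated 8-core score at {TARGET_CLOCK_MHZ} MHz: ~{estimated:.0f}")
print("PS4 Jaguar 8-core (for comparison): ~1300 PassMark")
```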

The AMD Jaguar 1.6GHz 8-core CPU used in the PlayStation 4 and Xbox One has a PassMark rating around 1200-1400. It's considered a low-power, efficient processor, but not as powerful as modern CPUs like those in the Ryzen series or the i7-4790K. 

A78AEx8 at Switch 2 CPU clocks gives 493/2735 single/multi-core in Geekbench 6. PS4 does 197/990.
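For reference, the multipliers implied by those figures (simple arithmetic on the numbers above):

```python
# Ratio of the Geekbench 6 figures quoted above (A78AE x8 at Switch 2
# clocks vs PS4). Pure arithmetic on the numbers in this post.
switch2_single, switch2_multi = 493, 2735
ps4_single, ps4_multi = 197, 990

print(f"Single-core: {switch2_single / ps4_single:.2f}x the PS4")   # ~2.50x
print(f"Multi-core:  {switch2_multi / ps4_multi:.2f}x the PS4")     # ~2.76x
```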

Switch 2 CPU is not that great, compared to what PC/Android handhelds pull off, let alone PS5, but it's quite solid and much, much better than PS4.

Not that it matters much: DLSS is GPU based, not CPU based. The CPU certainly influences overall performance, just not the DLSS part of it.

I didn't want to use Geekbench 6, as that also factors in some GPU functionality; I just wanted to compare CPU performance in isolation from the GPU to get a fair picture of the actual CPUs. As for DLSS, while the work is primarily done by the GPU, a lot of work is still done by the CPU. So CPU performance, and especially the low CPU performance of the Switch 2, will surely have an impact. It also seems the older RTX 2050 needs more CPU resources in this regard than the RTX 4050, so this is better optimised in later cards, and the Switch 2 is of the same generation as the RTX 2050, just a much lower-power version. You can also see in PC benchmarks that CPU power has an effect on DLSS frame rates: DLSS performance scales with CPU performance. It doesn't take the burden of upscaling off the CPU; if it did, you could upscale 360p to 1080p with a much inferior CPU and still get the same frame rates. What it really does is take the burden off the GPU, because the GPU can now upscale to 4K with decent frame rates and image quality it could never produce natively. It lets that GPU punch well above its normal frame rates for that output resolution, but the burden on the CPU is greater. Of course, many PCs have more than enough CPU power, so it isn't an issue there, but the Switch 2 doesn't, so a lot of the time it is surely going to be a matter of optimising CPU usage.

There was a comment about Hogwarts having poor DLSS upscaling compared to some other titles, and this could be related to more limited CPU resources for that game forcing a reduction in DLSS quality. We will get a much more accurate picture in a few days, when people start analysing games on retail hardware. The fact that Nintendo have released information about optimising the operating system and trying to move background tasks onto one CPU core rather than two makes me think people are likely to be disappointed in the performance initially, and that Nintendo are trying to limit the damage by saying that, like the PS4, it will eventually be better optimised, freeing up more performance for games. We shall see. If I remember rightly, FSR 3 and earlier on AMD chipsets take far less GPU and CPU resources to upscale, but with much inferior results, while XeSS takes a lot of CPU resources and gives much better results. Surely the fact that XeSS runs on both AMD and Nvidia chipsets also shows it is more CPU bound, and that graphics architecture matters less for that upscaling technology.

Yes, while DLSS (Deep Learning Super Sampling) primarily relies on the GPU's Tensor Cores for its AI-powered upscaling, there's a CPU element involved as well, especially with newer features like DLSS Frame Generation.

Tensor Cores (GPU): DLSS's core functionality, the AI upscaling and frame generation, is handled by the GPU's Tensor Cores, which are found on the RTX 20, RTX 30, RTX 40, RTX 50 and Quadro RTX series.

CPU's role (rendering, pre-processing): The CPU is responsible for rendering the game world and preparing the data for DLSS. For DLSS Frame Generation, the CPU needs to efficiently manage the rendering of the initial frames, as the generated frames rely on that base. A powerful CPU can help reduce the workload on the GPU, potentially leading to better performance, especially when DLSS is enabled.

DLSS Frame Generation: Frame Generation, available on RTX 40 and RTX 50 series GPUs, boosts frame rates by using AI to generate new frames. This requires the CPU to efficiently handle the initial frame rendering and provide the necessary input for the AI. A CPU bottleneck can limit the benefits of Frame Generation, as it relies on those base frames.

Impact of the CPU on DLSS: In some cases a weak CPU can bottleneck DLSS performance, especially in CPU-intensive games or at higher DLSS settings. While the GPU handles the AI-powered upscaling and frame generation, the CPU's performance can impact the overall experience.
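As a rough illustration of the "CPU prepares the data" point above, here is a minimal sketch of the kind of per-frame inputs an engine assembles for a DLSS-style temporal upscaler; the structure and field names are illustrative, not Nvidia's actual API:

```python
# Illustrative sketch only: the rough shape of per-frame inputs an engine
# (CPU side) assembles before the GPU runs a DLSS-style temporal upscale.
# These field names are NOT Nvidia's real API; they just mirror the kinds
# of data the text above refers to (rendering output plus pre-processing).
from dataclasses import dataclass

@dataclass
class UpscaleFrameInputs:
    color_buffer: object     # low-resolution rendered frame (GPU resource handle)
    depth_buffer: object     # per-pixel depth from the same frame
    motion_vectors: object   # per-pixel motion, written during rendering
    jitter_offset: tuple     # sub-pixel camera jitter applied this frame (CPU-chosen)
    exposure: float          # scene exposure used to normalise colours
    render_size: tuple       # e.g. (640, 360) internal resolution
    output_size: tuple       # e.g. (1920, 1080) target resolution

def prepare_frame(frame_index: int) -> UpscaleFrameInputs:
    """CPU-side bookkeeping: pick the jitter sample and bundle resource handles.
    The heavy work (rendering and the upscale itself) still runs on the GPU."""
    jitter_x = [0.5, 0.25, 0.75, 0.125][frame_index % 4]   # toy jitter sequence
    jitter_y = [0.333, 0.666, 0.111, 0.444][frame_index % 4]
    return UpscaleFrameInputs(None, None, None, (jitter_x, jitter_y),
                              exposure=1.0,
                              render_size=(640, 360), output_size=(1920, 1080))

print(prepare_frame(0))
```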



bonzobanana said:
HoloDust said:

A78AEx8 at Switch 2 CPU clocks gives 493/2735 single/multi-core in Geekbench 6. PS4 does 197/990.

Switch 2 CPU is not that great, compared to what PC/Android handhelds pull off, let alone PS5, but it's quite solid and much, much better than PS4.

Not that it matters much, DLSS is GPU based, not CPU - CPU certainly influences overall performance, just not DLSS part of it.

I didn't want to use Geekbench 6 as that also factors in some GPU functionality I just wanted to compare CPU performance isolated from the GPU to get a fair perspective of the actual CPUs. As for DLSS while the work is primarily done by the GPU a lot of work is done by the CPU. So the CPU performance especially the low CPU performance of the Switch 2 will surely have an impact especially it seems as the older RTX 2050 seems to need more CPU resources in this regard compared to the RTX 4050 so this is better optimised in later cards and the Switch 2 is of the same generation as the RTX 2050 just a much lower power version. Also you can see on PC benchmarks that CPU power has an effect on DLSS frame rates. DLSS performance scales with CPU performance. It doesn't take the burden of upscaling off the CPU as if it did you could have 360p upscaling to 1080p with a much inferior CPU instead but same frame rates. However the reality is it really takes the burden of the GPU more because now it can upscale to 4K with decent frame rates and image quality that it could never produce natively. It enables that GPU to punch well above its normal frame rates for that output resolution. However the burden to the CPU is greater. Of course many PCs have more than enough CPU power so its not an issue but the Switch 2 doesn't its going to be a matter of optimising CPU performance surely a lot of the time.

There was a comment about Hogwarts having poor DLSS upscaling compared to some other titles and this could be related to more limited CPU resources for that game and reducing the quality of DLSS. We will get a much more accurate picture in a few days when people start analysing games on retail hardware. The fact Nintendo have released information about optimising the operating system and trying to get background tasks onto one CPU core rather than two makes me think people are likely to be disappointed in the performance initially and they are trying to limit damage of that by stating like the PS4 it will eventually be better optimised releasing more performance for games but we shall see. I believe if I have remembered rightly FSR 3 etc on AMD chipsets takes far less GPU and CPU resources to upscale but then its much inferior results. XeSS takes a lot of CPU resources but gives much better results. Surely the fact XeSS operates on both AMD and Nvidia chipsets too shows its more CPU bound as well as graphic architecture is less of an issue for that upscaling technology.


Those Geekbench results are CPU only.

The CPU can affect overall GPU performance and induce bottlenecks; as I said, it has almost nothing to do with DLSS, which is GPU dependent.

The Switch 2, it seems, will have a custom DLSS solution which, from what's been seen so far, has lower precision and a shorter accumulation window for temporal data. Maybe not always, and not in all titles, but from Hogwarts it is obvious that its DLSS implementation, at least in some cases, is not as good as standard DLSS (it doesn't look too good in CP2077 either, just not as pronounced everywhere).
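To give a sense of what a "shorter accumulation window" means, here is a toy sketch of exponential history blending, the kind of temporal accumulation these upscalers rely on (the blend weights are made-up values, not anything measured on Switch 2):

```python
# Toy model of temporal accumulation: each frame the upscaler blends the
# new sample with accumulated history. A larger weight on the new sample
# means a shorter effective accumulation window and less history reuse,
# which recovers faster after disocclusion but resolves less fine detail.
# The weights below are made-up illustrations, not measured values.

def effective_history_frames(new_sample_weight: float) -> float:
    """Approximate number of past frames contributing to the accumulated result."""
    return 1.0 / new_sample_weight

for label, alpha in [("standard DLSS-like blend", 0.10),
                     ("shorter-window blend", 0.25)]:
    print(f"{label}: new-sample weight {alpha:.2f} "
          f"-> roughly {effective_history_frames(alpha):.0f} frames of history")
```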



The Switch 2 doesn't support DLSS FG (unless Nintendo and Nvidia come up with a custom solution for it).

The point about having more source frames (which is a CPU-affected task) is moot if the Switch 2 isn't going to be doing frame generation anyway.

And yes, having more frames as inputs can improve temporal stability with upscaling too, but that is a pretty minor aspect of modern DLSS compared to earlier versions.

The most likely reasons why Hogwarts looks like that are that it is older footage and they haven't yet achieved the 1440p upscale they are targeting, that the SEGA rep was wrong and provided misinformation, or, as HoloDust noted, that the bespoke DLSS method the Switch 2 might be using has its own unique artifacts. Another possibility is that we are seeing DLSS + DRS and the internal resolution is very low at certain moments, which is distinct from the fixed modes we see in PC presets. That would explain why in some games/scenes we see very nice upscales while in others we do not.
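For illustration, a minimal sketch of the DLSS + DRS idea, where dynamic resolution scaling picks the internal resolution from recent GPU frame times (the budget, step size and resolutions below are arbitrary assumptions):

```python
# Toy dynamic-resolution controller: drop or raise the internal render
# height depending on whether the GPU is meeting the frame-time budget,
# then let the upscaler bring it back to the fixed output resolution.
# Budget, step size and clamps are arbitrary assumptions for illustration.

TARGET_FRAME_MS = 33.3          # 30 fps budget
MIN_HEIGHT, MAX_HEIGHT = 360, 1080
OUTPUT_HEIGHT = 1440            # upscaler output target

def next_internal_height(current: int, gpu_frame_ms: float) -> int:
    if gpu_frame_ms > TARGET_FRAME_MS * 1.05:      # over budget: render lower
        current = max(MIN_HEIGHT, current - 72)
    elif gpu_frame_ms < TARGET_FRAME_MS * 0.85:    # comfortably under: render higher
        current = min(MAX_HEIGHT, current + 72)
    return current

height = 1080
for gpu_ms in [30, 36, 38, 35, 31, 27]:            # pretend per-frame GPU times
    height = next_internal_height(height, gpu_ms)
    print(f"GPU {gpu_ms:.0f} ms -> render at {height}p, upscale to {OUTPUT_HEIGHT}p")
```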

Last edited by sc94597 - on 01 June 2025

HoloDust said:
bonzobanana said:

I didn't want to use Geekbench 6 as that also factors in some GPU functionality I just wanted to compare CPU performance isolated from the GPU to get a fair perspective of the actual CPUs. As for DLSS while the work is primarily done by the GPU a lot of work is done by the CPU. So the CPU performance especially the low CPU performance of the Switch 2 will surely have an impact especially it seems as the older RTX 2050 seems to need more CPU resources in this regard compared to the RTX 4050 so this is better optimised in later cards and the Switch 2 is of the same generation as the RTX 2050 just a much lower power version. Also you can see on PC benchmarks that CPU power has an effect on DLSS frame rates. DLSS performance scales with CPU performance. It doesn't take the burden of upscaling off the CPU as if it did you could have 360p upscaling to 1080p with a much inferior CPU instead but same frame rates. However the reality is it really takes the burden of the GPU more because now it can upscale to 4K with decent frame rates and image quality that it could never produce natively. It enables that GPU to punch well above its normal frame rates for that output resolution. However the burden to the CPU is greater. Of course many PCs have more than enough CPU power so its not an issue but the Switch 2 doesn't its going to be a matter of optimising CPU performance surely a lot of the time.

There was a comment about Hogwarts having poor DLSS upscaling compared to some other titles and this could be related to more limited CPU resources for that game and reducing the quality of DLSS. We will get a much more accurate picture in a few days when people start analysing games on retail hardware. The fact Nintendo have released information about optimising the operating system and trying to get background tasks onto one CPU core rather than two makes me think people are likely to be disappointed in the performance initially and they are trying to limit damage of that by stating like the PS4 it will eventually be better optimised releasing more performance for games but we shall see. I believe if I have remembered rightly FSR 3 etc on AMD chipsets takes far less GPU and CPU resources to upscale but then its much inferior results. XeSS takes a lot of CPU resources but gives much better results. Surely the fact XeSS operates on both AMD and Nvidia chipsets too shows its more CPU bound as well as graphic architecture is less of an issue for that upscaling technology.


Those Geekbench results are CPU only.

CPU can affect overall GPU performance and induce bottlenecks - as I said, it has almost nothing to do with DLSS which is GPU dependent.

Switch 2, it seems, will have custom DLSS solution which, from what's been seen so far, has lower precision and shorter accumulation window for temporal data. Maybe not always, and not in all titles, but from Hogwarts it is obvious that its DLSS implementation, at least in some cases, is not as good as standard DLSS (though it doesn't look too good in CP2077 as well, just not as pronounced everywhere).

I disagree with your viewpoint that DLSS is fully GPU dependent, as I've seen lots of evidence that contradicts that.

In fairness, I don't know much about Geekbench 6; I thought it was more browser/office focused and included graphics hardware tests related to video editing and so on.

There is also the issue that the Cortex-A78C cores here would be on a 10/8 nm fabrication process, while the figures being quoted for the design are for the best-case 5 nm process.

Fabricating a new ARM CPU design on an older fabrication process can limit the chip's performance and efficiency. While the design itself may be advanced, the older process may struggle to deliver the dense transistor layouts and tight feature sizes needed to fully realize the design's potential. This can result in a chip that is larger, consumes more power, and may not reach the performance levels of a similar design built on a more modern process. Here's a more detailed breakdown:

Performance limits: Older fabrication processes have larger transistor dimensions and wider spacing between transistors. The overall chip will be larger and the transistors will take up more space, potentially limiting the transistor count and thus the overall performance of the chip.

Power consumption: A larger chip on an older process leads to higher power consumption, because transistors in older processes are less efficient and leak more power, and the increased size means more power is needed to drive the chip.

Cost: While older processes may be cheaper to use, the limitations on performance and efficiency might not be cost-effective in the long run; a more powerful, efficient chip, even if slightly more expensive to manufacture, might offer better overall value.

Design trade-offs: Designers may have to make compromises in the architecture to work around the limitations of the older process, such as using less complex designs, reducing the number of cores, or making other sacrifices that affect performance and efficiency.

Potential for optimization: Despite these limitations, designs can still be optimized for older processes, for example through power-management techniques, careful chip layout, and more efficient memory architectures.
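As a back-of-the-envelope illustration of why the node matters for a handheld's power budget, here is a sketch using the standard dynamic-power relation P ≈ C·V²·f (the capacitance and voltage scaling factors are illustrative assumptions, not foundry data):

```python
# Rough dynamic-power comparison: P is proportional to C * V^2 * f.
# The relative capacitance and voltage numbers below are illustrative
# assumptions for an older vs newer node, not real foundry data.

def relative_power(cap: float, volts: float, freq_ghz: float) -> float:
    return cap * volts ** 2 * freq_ghz

older_node = relative_power(cap=1.00, volts=0.85, freq_ghz=1.0)   # e.g. 8 nm-class
newer_node = relative_power(cap=0.70, volts=0.75, freq_ghz=1.0)   # e.g. 5 nm-class

print(f"Newer node draws ~{newer_node / older_node:.0%} of the older node's power "
      f"at the same 1.0 GHz clock (under these assumptions)")
```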

Ultimately I feel the Nintendo Switch 2 is a very budget design: a very dated chipset and a very low-cost implementation of that chipset in terms of fabrication. It is literally a design and a fabrication process from 2020, and really it's only the excellent DLSS upscaling that saves it and makes it more competitive with other platforms. Whether we will ever know the true spec and performance is another matter, as Nintendo are aggressively protecting the console from hacking/modding, with the system quick to brick if anything is out of place, such as voltages, and I'm guessing it is impossible to replace the battery yourself without it self-bricking, so all battery replacements will likely have to go through Nintendo. What will reveal the true spec, however, is the final performance of retail models; yes, it will be more guesswork than anything, but we will get a reasonable overall picture. Any teardown videos are probably going to be videos of bricked Switch 2s as well. I just hope there is a YouTuber happy to brick their Switch 2 to give us full details of the final retail hardware.



bonzobanana said:
HoloDust said:

Those Geekbench results are CPU only.

CPU can affect overall GPU performance and induce bottlenecks - as I said, it has almost nothing to do with DLSS which is GPU dependent.

Switch 2, it seems, will have custom DLSS solution which, from what's been seen so far, has lower precision and shorter accumulation window for temporal data. Maybe not always, and not in all titles, but from Hogwarts it is obvious that its DLSS implementation, at least in some cases, is not as good as standard DLSS (though it doesn't look too good in CP2077 as well, just not as pronounced everywhere).

I disagree with your viewpoint about DLSS being fully GPU dependent as seen lots of evidence that contradicts that.

Then you haven't dug deep enough. Start with the DLSS_Programming_Guide from nVidia and look up the section called DLSS Execution Times and GPU RAM usage.
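The section being referenced makes the point that the DLSS pass itself has a roughly fixed GPU-side execution cost per frame. A minimal sketch of how that cost enters a frame budget (the millisecond values are placeholders, not numbers from the guide):

```python
# Toy frame-budget model: the DLSS pass adds a fixed GPU-side cost per
# frame on top of the (now cheaper) low-resolution render. The millisecond
# values are placeholders, not figures from Nvidia's programming guide.

def fps(render_ms: float, upscale_ms: float = 0.0) -> float:
    return 1000.0 / (render_ms + upscale_ms)

native_4k_render_ms = 25.0     # placeholder: native 4K GPU render time
internal_1440p_ms = 12.0       # placeholder: 1440p GPU render time
dlss_pass_ms = 1.5             # placeholder: fixed GPU cost of the upscale pass

print(f"Native 4K:            {fps(native_4k_render_ms):.0f} fps")
print(f"1440p + DLSS to 4K:   {fps(internal_1440p_ms, dlss_pass_ms):.0f} fps")
```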



HoloDust said:
bonzobanana said:

I disagree with your viewpoint about DLSS being fully GPU dependent as seen lots of evidence that contradicts that.

Then you haven't dug deep enough - start with DLSS_Programming_Guide from nVidia and look up section called DLSS Execution Times and GPU RAM usage.

I've already pasted relevant info and seen benchmarks and information from impartial sources based on real-world configurations; Nvidia would probably not be a good source for such info. There is, I believe, a 10% increase in CPU speed for portable mode on the Switch 2 over docked mode, and portable mode seems to make heavier use of DLSS, upscaling from a much lower resolution; portable mode also has much lower memory bandwidth. It paints a picture of the GPU doing less and the CPU doing more, which is exactly what you would expect for DLSS in that situation. This idea that AI upscaling is not linked to CPU performance is very strange, to say the least; CPUs will always be optimal for some code. AMD's FSR 3 and earlier upscale with much lower CPU overheads and are basically a bit pants, while XeSS is much heavier on the CPU and works far better. Logic would dictate that more CPU performance equals higher-quality upscaling?

It seems so obvious, with every source of info backing me up as far as I can tell, that we will just have to agree to disagree. I'm basically saying that there is an extra burden on the CPU to upscale from, let's say, 360p to 1080p, even though it takes a huge amount of work off the GPU compared to natively rendering at that higher resolution. I'm also saying there is extra CPU burden beyond that for additional frame generation, which again makes it easier for the GPU to reach a higher frame count. Obviously on a PC this CPU burden is far less significant than it is on the Switch 2, which has far lower CPU resources.

I'm saying the Switch 2 will likely have to choose DLSS modes that are less demanding on CPU resources on occasions when it needs those CPU resources elsewhere, so we may see Switch 2 games where the DLSS is more rough around the edges and has issues more like FSR, as it is pared back to become more manageable for the system.

I don't think I'm saying anything controversial in any way. On the positive side, the Switch 2 is a fixed platform, so textures and other assets can be optimised to upscale well, and graphical problems with upscaling can be spotted and removed in a way that is much more difficult on PC with all its variables. So I'm expecting the Switch 2 to punch above its weight in upscaling overall because it is a fixed platform, but obviously it will be limited by being such a low-performance platform overall.



bonzobanana said:
HoloDust said:

Then you haven't dug deep enough - start with DLSS_Programming_Guide from nVidia and look up section called DLSS Execution Times and GPU RAM usage.

I've already pasted relevant info and seen benchmarks and information from impartial sources based on real world configurations. Nvidia would probably not be a good source of such info. There is a 10% increase in CPU speed I believe for portable mode on Switch 2 over docked mode and portable mode seems to make heavier use of DLSS to upscale from a much lower resolution. Portable mode also has much lower memory bandwidth. It paints a picture of the GPU doing less and the CPU of doing more which is exactly what you would expect for DLSS in that situation. This idea that AI upscaling is not linked to CPU performance is very strange to say the least. CPU's will always be optimal for some code. AMD's FSR 3 and earlier is upscaling with much lower CPU overheads and its basically a bit pants. XeSS is much heavier on the CPU and works far better.  Logic would dictate more CPU performance equals higher quality upscaling?

It just seems so obvious with every source of info backing me up as far as I can tell that we will just have to agree to disagree. I'm basically saying that there is an extra burden on the CPU to upscale from 360p to 1080p lets say even though it takes a huge amount of work off the GPU compared to natively rendering at that higher resolution. I'm also saying there is extra CPU burden beyond that for additional frame generation which again makes it easier for the GPU to have a higher frame count. Obviously on a PC this CPU burden is far less significant than it is on the Switch 2 which has far lower CPU resources.

I'm stating the Switch 2 will likely have to choose DLSS modes on occasion that are less demanding on CPU resources when it needs those CPU resources elsewhere so we may see Switch 2 games where the DLSS is more rough around the edges and has issues more like FSR on occasion as it is paired back to become more manageable for the system.

I don't think I'm saying anything controversial in anyway. However on the positive side the Switch 2 is a fixed platform so textures and other assets can be optimised to upscale well and graphic problems with upscaling can be seen and removed on a fixed platform that is much more difficult on PC with all its variables so I'm expecting overall the Switch 2 to punch above its weight in upscaling overall because it is a fixed platform but obviously it will be limited by being such a low performance platform overall.

Yes, nVidia would be the only relevant source for such info, since it's their official document for the actual implementation.

You don't seem to understand how DLSS works: it's GPU dependent (though execution time can vary from engine to engine).

What produces higher CPU usage is not DLSS, but the game rendering at a lower resolution when DLSS is turned on, which pushes more frames for the GPU to render. But that is only if you're not frame capped.

Let's say you're running at native 4K, capped at 60 frames: your GPU usage is at 99% and CPU usage is at some amount XX. Now you turn on DLSS Quality, which will render at 1440p and upscale to 4K. Your GPU usage will drop significantly, while your CPU usage will stay the same. Why? Because it still renders 60 frames. Now turn that frame cap off and you will see your GPU at 99% again, resulting in higher than 60 fps, with CPU usage increased since it needs to feed more frames to the GPU. Try this at home, if you have an nVidia GPU.
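That capped/uncapped behaviour can be sketched with a simple frame-time model (all millisecond values here are made up for illustration; note the CPU cost per frame is the same whether DLSS is on or off):

```python
# Toy model of the capped vs uncapped experiment described above.
# Per-frame costs are invented for illustration; the CPU cost per frame
# does not change when DLSS is enabled, only the GPU cost does.

CPU_MS = 8.0            # CPU work per frame (game logic, draw submission)
GPU_NATIVE_4K_MS = 16.0 # GPU time rendering native 4K
GPU_DLSS_Q_MS = 9.0     # GPU time rendering 1440p + upscaling to 4K

def run(gpu_ms, cap_fps):
    """Return (fps, gpu_utilisation) for a pipelined CPU/GPU frame loop."""
    frame_ms = max(CPU_MS, gpu_ms)                 # slower side sets the pace
    if cap_fps is not None:
        frame_ms = max(frame_ms, 1000.0 / cap_fps) # frame cap adds idle time
    return 1000.0 / frame_ms, gpu_ms / frame_ms

for label, gpu_ms, cap in [("Native 4K, 60 fps cap", GPU_NATIVE_4K_MS, 60),
                           ("DLSS Quality, 60 fps cap", GPU_DLSS_Q_MS, 60),
                           ("DLSS Quality, uncapped", GPU_DLSS_Q_MS, None)]:
    frames, util = run(gpu_ms, cap)
    print(f"{label:28s} {frames:5.1f} fps, GPU ~{util:.0%} busy")
```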

DLSS is GPU dependent (as I said several times over, and as nVidia's official documents state); the whole rendering pipeline is CPU and GPU dependent. Hope this makes it clearer for you.



The only "evidence" shared that DLSS burdens the CPU is what looks to be a quote that describes how the CPU can bottleneck DLSS (especially frame generation) workloads. 

A CPU working on a separate part of the pipeline acting as a bottleneck on another part of the pipeline is not the same thing as that part of the pipeline using the CPU for compute. 

DLSS upscalers are relatively small CNN or transformer models. The only part the CPU takes on is loading the model from the driver, located in system storage, into the GPU's VRAM.

That likely happens in milliseconds after you change your game settings, and it only happens as often as you change the settings. All actual inference steps are done on the GPU.
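A back-of-the-envelope version of that "happens in milliseconds" point (the model size and copy bandwidth are assumed placeholder values, not published DLSS figures):

```python
# Rough estimate of a one-off model upload: copying a small network's
# weights into VRAM. Model size and bus bandwidth are assumed placeholder
# values, not published DLSS figures.

model_size_mb = 40          # assumed size of a small CNN/transformer upscaler
bandwidth_gb_s = 12         # assumed effective copy bandwidth to GPU memory

transfer_ms = (model_size_mb / 1024) / bandwidth_gb_s * 1000
print(f"One-off upload of a {model_size_mb} MB model: ~{transfer_ms:.1f} ms")
print("This happens when settings change, not per frame; inference runs on the GPU.")
```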

What the quote shared is saying is that, like with any other setting that increases framerate (say, if you decrease the resolution to 720p without even using upscaling), there is a higher burden on the CPU from rendering more frames, and the CPU can be the bottleneck, limiting GPU utilization and affecting the overall ability to attain higher framerates.

Additionally, there is the issue that DLSS FG works best when input framerates are higher, so there is an incentive to have your input framerate be relatively high, which also burdens the CPU more than otherwise (like if you locked to 30fps and didn't use FG.) 

Now it is possible to have a CNN/transformer model utilize both the GPU and CPU, but the only time you would want to do that is when memory is split between the CPU and GPU, the model is too big to load on the GPU and you don't care about the order of magnitude increase in latency. None of this applies to DLSS or the Switch 2.

DLSS models (especially the CNN ones) are relatively small, and the Switch 2 has unified memory.



HoloDust said:
bonzobanana said:

I've already pasted relevant info and seen benchmarks and information from impartial sources based on real world configurations. Nvidia would probably not be a good source of such info. There is a 10% increase in CPU speed I believe for portable mode on Switch 2 over docked mode and portable mode seems to make heavier use of DLSS to upscale from a much lower resolution. Portable mode also has much lower memory bandwidth. It paints a picture of the GPU doing less and the CPU of doing more which is exactly what you would expect for DLSS in that situation. This idea that AI upscaling is not linked to CPU performance is very strange to say the least. CPU's will always be optimal for some code. AMD's FSR 3 and earlier is upscaling with much lower CPU overheads and its basically a bit pants. XeSS is much heavier on the CPU and works far better.  Logic would dictate more CPU performance equals higher quality upscaling?

It just seems so obvious with every source of info backing me up as far as I can tell that we will just have to agree to disagree. I'm basically saying that there is an extra burden on the CPU to upscale from 360p to 1080p lets say even though it takes a huge amount of work off the GPU compared to natively rendering at that higher resolution. I'm also saying there is extra CPU burden beyond that for additional frame generation which again makes it easier for the GPU to have a higher frame count. Obviously on a PC this CPU burden is far less significant than it is on the Switch 2 which has far lower CPU resources.

I'm stating the Switch 2 will likely have to choose DLSS modes on occasion that are less demanding on CPU resources when it needs those CPU resources elsewhere so we may see Switch 2 games where the DLSS is more rough around the edges and has issues more like FSR on occasion as it is paired back to become more manageable for the system.

I don't think I'm saying anything controversial in anyway. However on the positive side the Switch 2 is a fixed platform so textures and other assets can be optimised to upscale well and graphic problems with upscaling can be seen and removed on a fixed platform that is much more difficult on PC with all its variables so I'm expecting overall the Switch 2 to punch above its weight in upscaling overall because it is a fixed platform but obviously it will be limited by being such a low performance platform overall.

Yes, nVidia would be only relevant source for such info, since it's their official document for actual implementation.

You don't seem to understand how DLSS works - it's GPU dependent (though execution time can vary from engine to engine)

What produces higher CPU usage is not DLSS, but game rendering at lower resolution when DLSS is turned on, pushing more frames for GPU to render. But that is only if you're not frame capped.

Let's say you're running @ native 4K, capped at 60 frames, your GPU usage is at 99%, CPU usage is at some XX amount. Now you turn on DLSS Quality, which will render at 1440p and upscale to 4K. Your GPU usage will drop significantly, your CPU usage will stay the same. Why? Because it still renders 60 frames. Now turn that frame cap off and you will see your GPU again at 99%, resulting in higher than 60fps, with CPU usage increased since it needs to send more frames to GPU. Try this at home, if you have nVidia GPU.

DLSS is GPU dependent (as I said several times over, and as nVidia official documents state), whole rendering pipeline is CPU and GPU dependent. Hope this make it more clear for you.

Nvidia are hardly the correct source for such info; we have just had their CEO claiming the Switch 2 is the most powerful portable gaming device as far as graphics go, which is a ridiculous marketing statement. Real-world data is all that matters. You compare the CPU requirements for upscaling from 720p to, let's say, 1080p or 1440p with rendering at the lower resolution and not upscaling, to see the CPU difference and how it affects frame rates. Obviously, if you compare it with rendering natively at 4K, for example, which has many additional CPU overheads too, then you can make the case that it has lower CPU requirements. DLSS is not a free lunch: it has much higher CPU requirements than rendering at the original resolution and not upscaling, and it's a sliding scale too; the more AI processing you do, the higher the CPU requirements. This is common knowledge, nothing controversial or debatable, surely?

The point is that the Switch 2, on such a dated 10/8 nm fabrication process, is not going to meet the Cortex-A78C performance figures, which are based on the optimal 5 nm process. Each time ARM updates their chips, they give performance data for the optimal fabrication process, so when a new ARM chip posts a higher performance level, a lot of that is the improved fabrication process, i.e. going from 7 nm to 5 nm. It seems many people are quoting performance figures for the optimal 5 nm process, which the Switch 2 has no chance of meeting at all, as it will be a cut-down version on mainly a 10 nm-class process. So we are looking at a more CPU-restricted console with greater CPU overheads for DLSS.

In the coming days I'm sure all will be revealed and we will get a lot more evidence of the real performance of the Switch 2. I've just been watching a video showing how much the Switch 2 version of Cyberpunk has been cut down to keep the frame rate up, with a reduction in on-screen background NPCs and so on.

I was reading yesterday that the T239 doesn't double its teraflop performance in FP16 mode, so that approximately 2.3 teraflops in FP32 is still 2.3 teraflops in FP16 (1:1), and many game engines are optimised for FP16 nowadays to improve performance. The PS4 Pro, for example (from memory), is about 4.2 teraflops in FP32 but 8.4 teraflops in FP16. I have a Radeon RX 6500 XT card, which is about 9 teraflops FP32 but 18 teraflops FP16, and that card cost me about 65 pounds. I'm not 100% sure about this, because the RTX 2050 does double its teraflop performance with FP16, even the mobile version, and is based on similar architecture, so I'm not sure if this is correct. However, it seems to be based on the T239 having single Tensor cores for mixed precision rather than double Tensor cores; you need double Tensor cores for doubling teraflops at FP16, it seems.
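The ratios being compared, as simple arithmetic on the figures quoted in this post (which are given from memory):

```python
# FP16:FP32 ratios for the examples quoted above, figures as stated in the post.
examples = {
    "PS4 Pro":               (4.2, 8.4),    # FP32 TFLOPS, FP16 TFLOPS (2:1)
    "Radeon RX 6500 XT":     (9.0, 18.0),   # 2:1
    "Switch 2 T239 (claim)": (2.3, 2.3),    # 1:1 if FP16 is not double-rate
}
for name, (fp32, fp16) in examples.items():
    print(f"{name:24s} FP16:FP32 = {fp16 / fp32:.0f}:1")
```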

Going back to the ARM cores, this was from a news release about the T239:

For the Nintendo Switch 2, NVIDIA is said to utilize a customized variant of NVIDIA Jetson Orin SoC for automotive applications. The reference Orin SoC carries a codename T234, while this alleged adaptation has a T239 codename; the version is most likely optimized for power efficiency. The reference Orin design is a considerable uplift compared to the Tegra X1, as it boasts 12 Cortex-A78AE cores and LPDDR5 memory, along with Ampere GPU microarchitecture. Built on Samsung's 8 nm node, the efficiency would likely yield better battery life and position the second-generation Switch well among the now extended handheld gaming console market. However, including Ampere architecture would also bring technologies like DLSS, which would benefit the low-power SoC.

This seems to back up the single Tensor cores and also the use of the lower-tier ARM Cortex-A78AE CPU cores, designed to work on Samsung's 10/8 nm fabrication process. Calling it an A78C seems like manipulative marketing. Yes, you can make the case they are similar, but the A78C is designed for a more modern fabrication process, and would Nintendo really have paid for a complete re-design of the A78C, cutting back many features to work on such a dated 10/8 nm process, when A78AE cores already exist? What we do know is that Geekwan showed the Switch 2's CPU has 4 MB of cache, the same as the A78AE, not the 8 MB of the A78C. So just by looking at the PCB we know it is likely to be the original A78AE cores. The A78AE cores produce about 7-7.5 DMIPS per MHz, so roughly halve the PassMark linked below for single- and multi-core results given the 1 GHz clock, as previously stated. The A78AE seems to be a much more power-efficient design than the A78C, but lower performance too. My point is that on such a system you surely have to factor in the greater demands of DLSS when upscaling and the very low CPU resources of the console. It isn't that much more powerful than the Jaguar cores of the PS4; in fact it's about the same as the faster Jaguar cores of the PS4 Pro overall.

https://www.cpubenchmark.net/cpu.php?cpu=ARM+Cortex-A78AE+8+Core+1984+MHz&id=6298
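Taking that DMIPS-per-MHz figure at face value (and noting DMIPS is a synthetic metric that doesn't translate directly into PassMark or Geekbench scores), the arithmetic behind the claim looks like this:

```python
# Rough DMIPS estimate from the per-MHz figure quoted above. DMIPS is a
# synthetic integer metric and does not map directly onto PassMark or
# Geekbench; this is just the arithmetic behind the claim.

dmips_per_mhz = 7.25        # midpoint of the 7-7.5 range quoted above
clock_mhz = 1000            # approximate sustained Switch 2 CPU clock
cores = 8

per_core = dmips_per_mhz * clock_mhz
total = per_core * cores
print(f"Per core:  ~{per_core:,.0f} DMIPS at {clock_mhz} MHz")
print(f"All cores: ~{total:,.0f} DMIPS (ignoring OS-reserved cores and scaling losses)")
```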

My point is that DLSS upscaling requires more CPU resources than native rendering at the lower resolution, especially when using the highest quality settings and frame generation. The CPU is much, much weaker than many people are stating, and the operating system is devoting a lot of resources to GameChat and other features.

It's a fantastic portable console, just not that powerful overall, and there are hugely exaggerated claims about its performance.

 



bonzobanana said:
HoloDust said:

Yes, nVidia would be only relevant source for such info, since it's their official document for actual implementation.

You don't seem to understand how DLSS works - it's GPU dependent (though execution time can vary from engine to engine)

What produces higher CPU usage is not DLSS, but game rendering at lower resolution when DLSS is turned on, pushing more frames for GPU to render. But that is only if you're not frame capped.

Let's say you're running @ native 4K, capped at 60 frames, your GPU usage is at 99%, CPU usage is at some XX amount. Now you turn on DLSS Quality, which will render at 1440p and upscale to 4K. Your GPU usage will drop significantly, your CPU usage will stay the same. Why? Because it still renders 60 frames. Now turn that frame cap off and you will see your GPU again at 99%, resulting in higher than 60fps, with CPU usage increased since it needs to send more frames to GPU. Try this at home, if you have nVidia GPU.

DLSS is GPU dependent (as I said several times over, and as nVidia official documents state), whole rendering pipeline is CPU and GPU dependent. Hope this make it more clear for you.

Nvidia are hardly the correct source for such info. We have just had the CEO claiming the Switch 2 is the most powerful portable gaming device with regards graphics which is a ridiculous marketing statement. Real world data is all that matters. You compare the CPU requirements for upscaling from 720p to lets say 1080p or 1440p with rending at the lower resolution and not upscaling to see the CPU difference and how it effects frame rates. Obviously if you compare with rendering at 4K natively for example which has many additional CPU overheads too then you can make the case it has less CPU requirements. DLSS is not a free lunch it has much higher CPU requirements over rendering at the original resolution and not upscaling. It's a sliding scale too on the more AI processing you do the more CPU requirements. This is common knowledge nothing controversial or debatable surely? The point is with the Switch 2 based on such a dated 10/8Nm fabrication process it is not going to meet the Cortex A78C performance figures which are based on the optimal 5Nm fabrication process. Each time ARM updates their chips they give performance data for the optimal fabrication process. So when one new ARM chip has a higher performance level a lot of that is the improved fabrication process i.e. going from 7Nm to 5Nm. It seems many people are quoting performance figures for the optimal 5Nm process which the Switch 2 has no chance of meeting at all as it will be a cut down version on 10Nm mainly. So we are looking at a more CPU restricted console with greater CPU overheads for DLSS.

In the coming days I'm sure all will be revealed and we will get a lot more evidence on the real performance of the Switch 2. I've just been watching a video where they have been stating how much the Switch 2 version of Cyberpunk has been cutdown to keep the frame rate up with the reduction in onscreen background NPCs etc. 

I was reading yesterday that the T239 doesn't double Teraflop performance when in FP16 mode so that 2.3 Teraflops approx is still 2.3 Teraflops in FP16 (1:1) and many game engines are optimised for FP16 nowadays to improve performance. The PS4 Pro for example (from memory) is about 4.2 Teraflops in FP32 but 8.4 Teraflops in FP16. I have a Radeon RX 6500 XT card which is about 9 Teraflops FP32 but 18 Teraflops FP16 and that card cost me about 65 pounds. I'm not 100% sure about this because the RTX 2050 does double Teraflops performance with FP16 even the mobile version and is based on similar architecture. So I'm not sure if this is correct. However it seems to be based on the T239 having single Tensor cores for mixed precision not double Tensor cores. You need double Tensor cores for doubling Teraflops at FP16 it seems.

Going back to the ARM cores. This was a news release about the T239;

For the Nintendo Switch 2, NVIDIA is said to utilize a customized variant of NVIDIA Jetson Orin SoC for automotive applications. The reference Orin SoC carries a codename T234, while this alleged adaptation has a T239 codename; the version is most likely optimized for power efficiency. The reference Orin design is a considerable uplift compared to the Tegra X1, as it boasts 12 Cortex-A78AE cores and LPDDR5 memory, along with Ampere GPU microarchitecture. Built on Samsung's 8 nm node, the efficiency would likely yield better battery life and position the second-generation Switch well among the now extended handheld gaming console market. However, including Ampere architecture would also bring technologies like DLSS, which would benefit the low-power SoC.

This seems to back up the single Tensor cores and also use of an inferior ARM A78AE CPU cores re-designed to work on Samsung's 10/8Nm fabrication process. This ARM A78C seems like manipulative marketing. Yes you can make the case they are similar but ARM A78C's are designed for a more modern fabrication process and would Nintendo really have paid up for a complete re-design of the ARM A78C to work on such a dated fabrication process cutting back many features to work on 10/8Nm when there are already ARM A78AE cores? What we do know is Geekwan showed the CPU of Switch 2 had 4MB of cache same as ARM A78AE not the 8MB of the ARM A78C? So just by looking at the PCB we know its likely to be the original ARM A78AE cores. The ARM A78AE cores produce about 7-7.5 DMips of performance per Mhz. So half the passmark given below for single and multi results due to the 1Ghz speed as previously stated. The ARM A78AE seems to be a much more power efficient design compared to A78C but lower performance too. My point is surely on such a system you have to factor in the greater demands of DLSS when upscaling and the very low CPU resources of the console. It isn't that much more powerful than the Jaguar cores of the PS4 in fact its about the same as the faster Jaguar cores of the PS4 Pro overall. 

https://www.cpubenchmark.net/cpu.php?cpu=ARM+Cortex-A78AE+8+Core+1984+MHz&id=6298

My point is DLSS upscaling requires more CPU resources compared to native rendering at the lower resolution especially if using the highest quality settings and frame generation. The CPU is much, much weaker than many people are stating. The operating system is devoting lots of resources to gamechat and other features. 

It's a fantastic portable console just not that powerful overall with huge exaggerated claims about performance.

 

Sorry mate, you're free to think whatever you like, but you obviously don't understand the difference between what DLSS upscaling does and what running a game at a higher frame rate does, DLSS or not. So I'm not going to waste any more time on this; have a nice day.