HoloDust said:
bonzobanana said:
I've already pasted the relevant info and seen benchmarks and information from impartial sources based on real-world configurations. Nvidia would probably not be a good source of such info. There is, I believe, a 10% increase in CPU speed for portable mode on Switch 2 over docked mode, and portable mode seems to make heavier use of DLSS to upscale from a much lower resolution. Portable mode also has much lower memory bandwidth. It paints a picture of the GPU doing less and the CPU doing more, which is exactly what you would expect for DLSS in that situation.

This idea that AI upscaling is not linked to CPU performance is very strange to say the least. CPUs will always be optimal for some code. AMD's FSR 3 and earlier upscales with much lower CPU overheads and is basically a bit pants; XeSS is much heavier on the CPU and works far better. Logic would dictate more CPU performance equals higher-quality upscaling? It just seems so obvious, with every source of info backing me up as far as I can tell, that we will just have to agree to disagree.

I'm basically saying that there is an extra burden on the CPU to upscale from, let's say, 360p to 1080p, even though it takes a huge amount of work off the GPU compared to natively rendering at that higher resolution. I'm also saying there is extra CPU burden beyond that for additional frame generation, which again makes it easier for the GPU to reach a higher frame count. Obviously on a PC this CPU burden is far less significant than it is on the Switch 2, which has far lower CPU resources. I'm stating the Switch 2 will likely have to choose DLSS modes on occasion that are less demanding on CPU resources when it needs those CPU resources elsewhere, so we may see Switch 2 games where the DLSS is rougher around the edges and has issues more like FSR on occasion, as it is pared back to become more manageable for the system. I don't think I'm saying anything controversial in any way.

However, on the positive side, the Switch 2 is a fixed platform, so textures and other assets can be optimised to upscale well, and graphical problems with upscaling can be spotted and removed in a way that is much more difficult on PC with all its variables. So I'm expecting the Switch 2 to punch above its weight in upscaling because it is a fixed platform, but obviously it will be limited by being such a low-performance platform overall.
Yes, nVidia would be the only relevant source for such info, since it's their official document for the actual implementation. You don't seem to understand how DLSS works - it's GPU dependent (though execution time can vary from engine to engine). What produces higher CPU usage is not DLSS, but the game rendering at a lower resolution when DLSS is turned on, pushing more frames for the GPU to render. But that is only if you're not frame capped.

Let's say you're running @ native 4K, capped at 60 frames; your GPU usage is at 99% and CPU usage is at some XX amount. Now you turn on DLSS Quality, which will render at 1440p and upscale to 4K. Your GPU usage will drop significantly, your CPU usage will stay the same. Why? Because it still renders 60 frames. Now turn that frame cap off and you will see your GPU again at 99%, resulting in higher than 60fps, with CPU usage increased since it needs to send more frames to the GPU. Try this at home, if you have an nVidia GPU.

DLSS is GPU dependent (as I said several times over, and as nVidia's official documents state); the whole rendering pipeline is CPU and GPU dependent. Hope this makes it clearer for you.
Nvidia are hardly the correct source for such info. We have just had the CEO claiming the Switch 2 is the most powerful portable gaming device with regard to graphics, which is a ridiculous marketing statement. Real-world data is all that matters. You compare the CPU requirements for upscaling from 720p to, let's say, 1080p or 1440p against rendering at the lower resolution without upscaling, to see the CPU difference and how it affects frame rates. Obviously if you compare against rendering natively at 4K, for example, which has many additional CPU overheads of its own, then you can make the case that DLSS has lower CPU requirements. DLSS is not a free lunch: it has much higher CPU requirements than rendering at the original resolution and not upscaling. It's a sliding scale too - the more AI processing you do, the more CPU it requires. This is common knowledge, nothing controversial or debatable surely?

The point is that with the Switch 2 being based on such a dated 10/8 nm fabrication process, it is not going to meet the Cortex-A78C performance figures, which are based on the optimal 5 nm fabrication process. Each time ARM updates their chips they give performance data for the optimal fabrication process, so when a new ARM chip has a higher performance level, a lot of that comes from the improved fabrication process, i.e. going from 7 nm to 5 nm. It seems many people are quoting performance figures for the optimal 5 nm process, which the Switch 2 has no chance of meeting at all, as it will be a cut-down version on 10 nm mainly. So we are looking at a more CPU-restricted console with greater CPU overheads for DLSS.
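For anyone who wants the resolution arithmetic spelled out, here is a minimal sketch (my own illustration, not from any of the sources above) assuming the commonly quoted DLSS per-axis scale factors - Quality ~0.667, Balanced ~0.58, Performance ~0.5, Ultra Performance ~0.33. It only shows how much shading work the GPU saves at each preset; the CPU side is the part being argued about.

```python
# Rough sketch of DLSS render resolutions vs output resolution.
# Scale factors are the commonly quoted per-axis ratios for each preset.

DLSS_SCALE = {
    "Quality": 2 / 3,          # ~0.667
    "Balanced": 0.58,
    "Performance": 0.5,
    "Ultra Performance": 1 / 3,
}

def render_resolution(out_w, out_h, mode):
    """Internal render resolution for a given output resolution and DLSS preset."""
    s = DLSS_SCALE[mode]
    return round(out_w * s), round(out_h * s)

out_w, out_h = 1920, 1080      # 1080p output as an example
for mode in DLSS_SCALE:
    w, h = render_resolution(out_w, out_h, mode)
    saved = 1 - (w * h) / (out_w * out_h)
    print(f"{mode:17} renders {w}x{h}  (~{saved:.0%} fewer pixels shaded than native)")
```

At 1080p output that works out to roughly 1280x720 for Quality and 640x360 for Ultra Performance, which is where the 360p-to-1080p example earlier in the thread comes from.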
In the coming days I'm sure all will be revealed and we will get a lot more evidence on the real performance of the Switch 2. I've just been watching a video stating how much the Switch 2 version of Cyberpunk has been cut down to keep the frame rate up, with a reduction in on-screen background NPCs etc.
I was reading yesterday that the T239 doesn't double its Teraflop figure in FP16 mode, so that approx. 2.3 Teraflops is still 2.3 Teraflops in FP16 (1:1), and many game engines are optimised for FP16 nowadays to improve performance. The PS4 Pro, for example (from memory), is about 4.2 Teraflops in FP32 but 8.4 Teraflops in FP16. I have a Radeon RX 6500 XT card which is about 9 Teraflops FP32 but 18 Teraflops FP16, and that card cost me about 65 pounds. I'm not 100% sure about this, because the RTX 2050 does double its Teraflops with FP16, even the mobile version, and it is based on similar architecture. So I'm not sure if this is correct. However, it seems to be based on the T239 having single Tensor cores for mixed precision rather than double Tensor cores; you need double Tensor cores for doubling Teraflops at FP16, it seems.
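For what it's worth, those Teraflops figures come from the standard peak-throughput formula: 2 operations per fused multiply-add × shader cores × clock, scaled by whatever the FP16 rate is. A rough sketch using the PS4 Pro numbers above (2304 shaders at 911 MHz with double-rate FP16 are the commonly quoted specs, not something confirmed in this thread):

```python
# Sketch of peak-throughput arithmetic, not a benchmark.
# peak TFLOPS = 2 (ops per fused multiply-add) * shader cores * clock (GHz) / 1000,
# multiplied by the FP16 rate: 1.0 for a 1:1 design, 2.0 for double-rate FP16.

def peak_tflops(shader_cores, clock_ghz, fp16_ratio=1.0):
    """Theoretical peak throughput in TFLOPS."""
    return 2 * shader_cores * clock_ghz * fp16_ratio / 1000.0

# PS4 Pro, commonly quoted specs: 2304 shaders at 0.911 GHz, double-rate FP16.
print(f"PS4 Pro FP32: {peak_tflops(2304, 0.911):.1f} TFLOPS")       # ~4.2
print(f"PS4 Pro FP16: {peak_tflops(2304, 0.911, 2.0):.1f} TFLOPS")  # ~8.4

# A part with a 1:1 FP16 rate keeps the same figure either way, which is
# the point being made about the T239 above.
```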
Going back to the ARM cores, this was a news release about the T239:
For the Nintendo Switch 2, NVIDIA is said to utilize a customized variant of NVIDIA Jetson Orin SoC for automotive applications. The reference Orin SoC carries a codename T234, while this alleged adaptation has a T239 codename; the version is most likely optimized for power efficiency. The reference Orin design is a considerable uplift compared to the Tegra X1, as it boasts 12 Cortex-A78AE cores and LPDDR5 memory, along with Ampere GPU microarchitecture. Built on Samsung's 8 nm node, the efficiency would likely yield better battery life and position the second-generation Switch well among the now extended handheld gaming console market. However, including Ampere architecture would also bring technologies like DLSS, which would benefit the low-power SoC.
This seems to back up the single Tensor cores and also the use of the inferior ARM Cortex-A78AE CPU cores, re-designed to work on Samsung's 10/8 nm fabrication process. The ARM A78C claim seems like manipulative marketing. Yes, you can make the case that they are similar, but A78C cores are designed for a more modern fabrication process, and would Nintendo really have paid up for a complete re-design of the A78C to work on such a dated fabrication process, cutting back many features to fit 10/8 nm, when A78AE cores already exist?

What we do know is that Geekerwan showed the Switch 2's CPU has 4MB of cache, the same as the A78AE, not the 8MB of the A78C. So just from looking at the chip itself we know it's likely to be the original A78AE cores. The A78AE cores produce about 7-7.5 DMIPS of performance per MHz, so take roughly half of the PassMark results given below, for both single and multi-core, because of the roughly 1GHz clock speed as previously stated. The A78AE seems to be a much more power-efficient design than the A78C, but lower performance too.

My point is that on such a system you surely have to factor in the greater demands of DLSS when upscaling against the very low CPU resources of the console. It isn't that much more powerful than the Jaguar cores of the PS4; in fact it's about the same as the faster Jaguar cores of the PS4 Pro overall.
https://www.cpubenchmark.net/cpu.php?cpu=ARM+Cortex-A78AE+8+Core+1984+MHz&id=6298
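To show the scaling I'm assuming there: the linked PassMark entry is for an 8-core A78AE at 1984 MHz, so at roughly 1 GHz you would expect around half that score, plus a per-core DMIPS estimate from the 7-7.5 DMIPS/MHz figure. A back-of-envelope sketch (linear scaling with clock is a simplification, and the actual score should be taken from the linked page):

```python
# Back-of-envelope scaling of the linked benchmark result down to ~1 GHz,
# assuming (simplistically) that the score scales linearly with clock speed.

REF_CLOCK_MHZ = 1984       # clock of the Cortex-A78AE entry linked above
TARGET_CLOCK_MHZ = 1000    # the roughly 1 GHz clock discussed in this thread

def scaled_score(reference_score):
    """Scale a PassMark-style score by clock ratio - a rough approximation only."""
    return reference_score * TARGET_CLOCK_MHZ / REF_CLOCK_MHZ

# DMIPS per core from the 7-7.5 DMIPS/MHz figure quoted above:
dmips_low, dmips_high = 7.0 * TARGET_CLOCK_MHZ, 7.5 * TARGET_CLOCK_MHZ
print(f"Roughly {dmips_low:.0f}-{dmips_high:.0f} DMIPS per core at ~1 GHz")
print(f"Scale factor vs the linked 1984 MHz result: {TARGET_CLOCK_MHZ / REF_CLOCK_MHZ:.2f}x")
```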
My point is that DLSS upscaling requires more CPU resources than native rendering at the lower resolution, especially when using the highest quality settings and frame generation. The CPU is much, much weaker than many people are stating, and the operating system is devoting resources to GameChat and other features.
It's a fantastic portable console, just not that powerful overall, and the claims about its performance are hugely exaggerated.