Pemalite said:
It's not as simple as that.
When you halve the precision, you tend to double the number of computations you can perform (depending on hardware support, e.g. Rapid Packed Math)... And that means the reverse is true. It's rare on GPUs that you halve the precision and only keep the same throughput. So in general... FP16 will run at half the speed of FP8.
So it is actually harder to emulate.
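To put rough numbers on that, here is a minimal sketch of the packing arithmetic. It assumes a 32-bit register lane and perfectly linear scaling with the number of packed values, which real hardware only approximates; `packed_lanes` and `relative_cost` are made-up helper names for illustration, not any vendor's API.

```python
REGISTER_BITS = 32  # assumption: one packed lane is 32 bits wide


def packed_lanes(format_bits: int) -> int:
    """Number of values of the given width that fit in one 32-bit lane."""
    return REGISTER_BITS // format_bits


def relative_cost(native_bits: int, fallback_bits: int) -> float:
    """Cost multiplier when work written for `native_bits` has to run
    in a wider `fallback_bits` format, assuming throughput scales
    linearly with the number of packed values per lane."""
    return packed_lanes(native_bits) / packed_lanes(fallback_bits)


if __name__ == "__main__":
    for bits in (32, 16, 8):
        print(f"FP{bits}: {packed_lanes(bits)} value(s) per lane")
    # FP8 work run on hardware that only packs down to FP16
    print("FP8 on FP16 cost multiplier:", relative_cost(8, 16))
```

Under those assumptions, FP8 work pushed through FP16 costs about twice as much, and about four times as much through FP32.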
For the most part AMD and nVidia have supported low-precision INT/FP for 5+ years now and more recently started to adopt bfloat.
As for DLSS itself, it does consume rendering budget, it's not a free lunch.
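The budget point is easy to see with frame-time arithmetic. This is only a sketch: the upscaling cost passed in is a placeholder you would have to measure yourself, not a real DLSS figure, and both function names are hypothetical.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Total time available per frame at the target frame rate."""
    return 1000.0 / target_fps


def remaining_render_ms(target_fps: float, upscale_cost_ms: float) -> float:
    """Time left for everything else once the upscaling pass is paid for.
    `upscale_cost_ms` is a hypothetical, measured-by-you value."""
    return frame_budget_ms(target_fps) - upscale_cost_ms


if __name__ == "__main__":
    cost = 2.0  # placeholder cost in ms, purely illustrative
    for fps in (30, 60):
        left = remaining_render_ms(fps, cost)
        print(f"{fps} fps: {frame_budget_ms(fps):.2f} ms budget, {left:.2f} ms left")
```

Whatever the real cost turns out to be, it comes out of the same per-frame budget as the rest of the rendering work.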
Functionally that is an acceptable level of compromise given the enormous gap between the Switch 2 and modern PC hardware - a good GPU's shaders already idle 80-90% of the time with ShadPS4, for instance. The main issue will likely be how to handle its interactions/requests to other system modules.
I'd expect the docked CPU to be the slowest component and to need a lot of HLE hacks to work at first... though I'm not sure how heavy ARM emulation is in C++, to be honest, especially with SSE/AVX support. Apparently ARMv7, at least, runs well.