haxxiy said:
That's true for GPU cores, but upscaling and frame generation don't need to read anything from the GPU except for the frame buffer. In theory, you can fit a ray-tracing solution then, although you'd be very limited in terms of shading (like other post-procs done in the ROP), but maybe something clever can be thought of, such as path tracing frames on and off and matching in the back buffer. I keep hoping we'll see smart stuff like that from AMD or Intel, but nope. They just want to fund stuff that can allow them to compete in HPC. As for 3D SiCs, that feels like an even greater challenge than chiplets ('2.5D') and most of the industry seems to agree.
Up-scaling and frame generation do need a lot of fast caches to keep data buffered to work from, otherwise it does get expensive.
nVidia could in theory make all the ray tracing and Tensor/A.I. routines run on the shader engines/CUDA cores, as they are capable of INT4/INT8/INT16 and floating-point math to various degrees, but dedicating hardware to those jobs and slimming down the shader units is more space-efficient than making each shader pipeline more flexible... Although nVidia did make its CUDA cores capable of executing integer and floating-point operations concurrently with Turing, if I remember correctly.
Many simpler upscaling/anti-aliasing methods these days are actually performed on the shaders/CUDA cores rather than the ROPs because it's cheap and fast... ROPs are often a bottleneck in a GPU design, hence why the GeForce GTX 760 had such a massive improvement over the GTX 660 Ti... It had fewer CUDA cores and teraflops, but much higher ROP throughput. (2.25 teraflops on the 760 vs 2.45 teraflops on the 660 Ti.)
In some scenarios where AA was done on the CUDA cores, the 660 Ti would be faster than the 760, but once you start leveraging MSAA on the ROPs, the 760 would dwarf the 660 Ti. (Which re-affirms that teraflops are bullshit for determining gaming performance.)
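To put rough numbers on the teraflops-vs-ROPs point, here is a quick back-of-the-envelope sketch. The core counts, ROP counts and base clocks below are not from the post above; they are the published reference specs as I remember them, and I'm assuming the card being compared against is the GTX 660 Ti, since that's the one whose figure lines up with the 2.45 quoted. The `GpuSpec` class and its method names are made up for illustration. Treat the results as theoretical peaks, not measured performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Hypothetical helper holding a few headline specs (assumed values)."""
    name: str
    cuda_cores: int
    rops: int
    base_clock_mhz: float

    def fp32_tflops(self) -> float:
        # Peak FP32: each CUDA core retires one fused multiply-add
        # (counted as 2 FLOPs) per clock.
        return self.cuda_cores * 2 * self.base_clock_mhz * 1e6 / 1e12

    def pixel_fill_gpix(self) -> float:
        # Peak pixel fill rate: one pixel per ROP per clock.
        return self.rops * self.base_clock_mhz * 1e6 / 1e9


# Assumed reference specs (base clocks, no boost).
gtx_760 = GpuSpec("GTX 760", cuda_cores=1152, rops=32, base_clock_mhz=980.0)
gtx_660_ti = GpuSpec("GTX 660 Ti", cuda_cores=1344, rops=24, base_clock_mhz=915.0)

for gpu in (gtx_760, gtx_660_ti):
    print(f"{gpu.name}: {gpu.fp32_tflops():.2f} TFLOPs, "
          f"{gpu.pixel_fill_gpix():.1f} GPixel/s")
```

Under those assumptions the 760 comes out behind on peak teraflops but well ahead on peak ROP fill rate, which is the same pattern described above: shader-side AA favours the card with more cores, ROP-side MSAA favours the card with more ROPs.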
So you could in theory run ray tracing operations on the Tensor cores rather than the RT cores if you wanted... But we can't couple those operations to the ROPs, as they just don't have the INT/FP throughput.
3D chip stacking started out with stacked DRAM, then NAND. Now we are doing it with stacked cache on top of CPU cores.
The next jump will be stacked chiplets. Yeah, it will be a challenge, but so were chiplets once upon a time.
Stacked chips, though, can actually get around the latency and bandwidth limits of fabric interconnects, so there are inherent benefits... Imagine current Ryzen chips, on the same fabrication process, but consuming 50% LESS power than they do currently by stacking the chiplets. (That's how much energy is wasted on the fabric.)