haxxiy said:
Pemalite said:

GPU chiplets are super super hard.

You can't turn the GPU cores into chiplets because you need low latency and high bandwidth... Both things that chiplets heavily impact.

CPU cores can get away with it... Because CPU cores aren't transferring upwards of several terabytes of data per second.

AMD's approach was to actually break up their memory interface and have lots of memory interfaces instead.

So instead of having a single 384-bit interface talking directly with memory, AMD has made 12x 32-bit interfaces.
Each chiplet thus houses a 2x32-bit interface, and the six chiplets interface directly with DRAM to make a cumulative 384-bit interface.

It also means instead of 1x fabric going from one large memory controller to the CCD like on a Ryzen CPU, there are 6x fabrics which can do 900GB/s of bi-directional traffic... Each.
Suddenly they have more than enough bandwidth to interface the GPU cores to the memory interface... But still not enough for multiple GPU core chiplets.
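
To put rough numbers on that layout, here's a quick back-of-the-envelope sketch; the 20 Gbps GDDR6 pin speed is an assumption based on shipping Navi 31 cards, not something stated above:

```python
# Cumulative bus width and bandwidth for the chiplet layout described above.
mcds = 6                      # memory controller chiplets (12x 32-bit / 2 per die)
channels_per_mcd = 2          # 2x 32-bit channels each
channel_width_bits = 32

bus_width = mcds * channels_per_mcd * channel_width_bits
print(bus_width, "bit")       # 384-bit cumulative interface

pin_speed_gbps = 20           # GDDR6 data rate per pin (assumed)
dram_bw = bus_width * pin_speed_gbps / 8
print(dram_bw, "GB/s")        # 960 GB/s of raw DRAM bandwidth

fabric_bw_per_link = 900      # ~900 GB/s bi-directional per fabric link
print(mcds * fabric_bw_per_link, "GB/s")  # ~5400 GB/s aggregate: ample for
                                          # one GPU die, thin for several
```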

It also does mean that there is very little room to interface additional chips to AMD GPUs, as the memory controller chiplet approach takes up most of the area surrounding the cores themselves.

But I could see them integrating those into the memory controller chiplets at a later date... Sadly, I don't think we will ever see the holy grail of multi-GPU chiplets due to the lack of bandwidth in the Infinity Fanout links.

But I do see a future where we have stacked GPU chiplet dies, just like how we stack cache on CPUs now.

That's true for GPU cores, but upscaling and frame generation don't need to read anything from the GPU except for the frame buffer.

In theory, you could fit a ray-tracing solution there as well, although you'd be very limited in terms of shading (like other post-processing done in the ROPs), but maybe something clever could be devised, such as path tracing frames on and off and merging the results in the back buffer.
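
A minimal sketch of what that on/off path-tracing idea could look like; the names, blend weight, and structure here are hypothetical, and a real implementation would also reproject the history with motion vectors before blending:

```python
import numpy as np

H, W = 2160, 3840
history = np.zeros((H, W, 3), dtype=np.float32)   # accumulated lighting

def present(frame_index: int, raster: np.ndarray, path_traced: np.ndarray) -> np.ndarray:
    """Path trace on even frames only; merge results in the back buffer."""
    global history
    if frame_index % 2 == 0:
        alpha = 0.2                               # assumed accumulation weight
        history = (1 - alpha) * history + alpha * path_traced
    return raster + history                       # composite every frame
```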

I keep hoping we'll see smart stuff like that from AMD or Intel, but nope. They just want to fund stuff that lets them compete in HPC.

As for 3D-stacked ICs, that feels like an even greater challenge than chiplets ('2.5D'), and most of the industry seems to agree.

Upscaling and frame generation do need a lot of fast cache to keep data buffered to work from; otherwise it gets expensive.
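
Rough working-set arithmetic for why that buffering wants fast cache; the buffer layout and byte sizes are illustrative assumptions, not any vendor's actual implementation:

```python
# Per-frame working set of a hypothetical temporal upscaler at 4K.
w, h = 3840, 2160
bytes_per_pixel = (
    8     # FP16 RGBA color
    + 4   # RG16 motion vectors
    + 4   # depth
    + 8   # FP16 RGBA history buffer
)
working_set = w * h * bytes_per_pixel
print(round(working_set / 2**20), "MiB")  # ~190 MiB touched every frame;
                                          # streaming that from DRAM at 60+ fps
                                          # is exactly what caches are for
```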

nVidia could in theory make all the ray tracing and Tensor/A.I. routines run on the shader engines/CUDA cores, as they are capable of INT4/INT8/INT16 and float formats to various degrees, but dedicating hardware and slimming down those units is more space-efficient than making each shader pipeline more flexible... Although nVidia did make its CUDA cores capable of executing integer and floating-point operations concurrently with Turing, if I remember correctly.
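
Illustrative scaling if that work ran on packed-math shader ALUs; the 10 TFLOPS FP32 peak is a placeholder, and the 2x/4x/8x multipliers assume the usual packed-math ratios rather than any specific GPU:

```python
fp32_tflops = 10.0            # assumed FP32 peak for some GPU
packed_rates = {"FP32": 1, "FP16 / INT16": 2, "INT8": 4, "INT4": 8}
for fmt, mult in packed_rates.items():
    # Same 32-bit ALU lanes, proportionally more low-precision ops per clock.
    print(f"{fmt}: {fp32_tflops * mult} T(FL)OPS")
```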

Many simpler upscaling/anti-aliasing methods these days are actually performed on the shaders/CUDA cores, because it's cheap and fast, rather than on the ROPs... Often the ROPs will be a bottleneck in a GPU design, hence why the Geforce 760 had such a massive improvement over the 660 Ti... It had fewer CUDA cores and teraflops, but much higher ROP throughput. (2.25 Teraflops on the 760 vs 2.45 Teraflops on the 660 Ti.)
In some scenarios where AA was done on the CUDA cores, the 660 Ti would be faster than the 760, but once you start leveraging MSAA on the ROPs, the 760 would dwarf the 660 Ti. (Reaffirms the idea that teraflops is also bullshit for determining gaming performance.)
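
The arithmetic behind that comparison, using the cards' base clocks; FP32 TFLOPS is CUDA cores x 2 ops (FMA) x clock, while pixel fill rate scales with ROPs x clock:

```python
def tflops(cores, clock_ghz):
    return cores * 2 * clock_ghz / 1000   # 2 ops/clock per core (FMA)

def gpix_per_s(rops, clock_ghz):
    return rops * clock_ghz               # pixels written per second

cards = {
    "GTX 760":    {"cores": 1152, "rops": 32, "clock": 0.980},
    "GTX 660 Ti": {"cores": 1344, "rops": 24, "clock": 0.915},
}
for name, c in cards.items():
    print(name, round(tflops(c["cores"], c["clock"]), 2), "TFLOPS,",
          round(gpix_per_s(c["rops"], c["clock"]), 1), "GPix/s")
# GTX 760:    2.26 TFLOPS, 31.4 GPix/s  <- fewer flops, far more ROP throughput
# GTX 660 Ti: 2.46 TFLOPS, 22.0 GPix/s
```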

So you could in theory run ray tracing operations on the Tensor cores rather than the RT cores if you wanted... But we can't couple those operations to the ROPs, as they just don't have the INT/FP throughput.

3D chip stacking started out with stacked DRAM, then NAND. Now we are doing it with stacked cache on top of CPU cores.
The next jump will be stacked chiplets. - Yeah, it will be a challenge, but so were chiplets once upon a time.

Stacked chips, though, can actually get around the latency and bandwidth limitations of fabric interconnects, so there are inherent benefits... Imagine current Ryzen chips, using the same fabrication, but consuming 50% LESS power than they do currently by stacking the chiplets. (That's how much energy is wasted on the fabric.)
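
A back-of-the-envelope check on that power claim; both energy-per-bit figures are assumptions drawn from typical published ranges, not confirmed numbers for any AMD product:

```python
traffic_gb_s = 100                        # assumed die-to-die traffic
bits_per_s = traffic_gb_s * 8e9

serdes_pj = 2.0                           # ~pJ/bit over package-trace SerDes (assumed)
hybrid_bond_pj = 0.05                     # ~pJ/bit through a 3D hybrid bond (assumed)

print("fabric: ", bits_per_s * serdes_pj * 1e-12, "W")       # 1.6 W
print("stacked:", bits_per_s * hybrid_bond_pj * 1e-12, "W")  # 0.04 W
# A ~40x drop in the energy spent just moving bits between dies.
```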



--::{PC Gaming Master Race}::--