Straffaren666 said:
Bofferbrauer2 said:

@bolded: I said that before already. However, the way they are organised only allows for 64 CUs. And those are compute engines, not shader engines, which is something different entirely (basically a rebrand of the GCA, the Graphics and Core Array, introduced with GCN2). GCN5 only has 4 of them because they can only feed 4 of them reliably with instructions. Technically they could go past 64 with more compute engines, but it wouldn't actually increase the performance, as the CUs would be idling half the time because they don't get any instructions. Hence why it's agreed that 64 CUs is the limit.

That is not how GCN works. Each CU has its own instruction scheduler. The command processor can process commands at a far higher rate than the front-end can process them and the same applies for the front-end to the back-end (SEs/CUs). It's very trivial to saturate the CUs with wavefronts and that happens all the time in the current design. That said, the processing power of some parts of the front-end probably should be increased if the CUs are increased to keep the architecture balanced.

I'm not aware of any technical limitations to go above 64 CUs but they probably exist. I suspect the biggest hurdle is to increase the bandwidth of the L2 cache, which should scale with the number of CUs to keep the system balanced.

Yes and No.

Each CU has its own scheduler (called the CU scheduler), that's true. But GCN also has another scheduler for the draw and command queues, added with GCN3 because the CU scheduler was so inefficient. You may know this as Async compute, as that's how AMD branded that second scheduler. However, it can't handle too many tasks at once, so it gets very inefficient with higher CU counts.

Add to this the fact that the GPU driver must handle its own scheduler (Number 3), which runs on the CPU, and you can probably see the problems AMD has with their GCN schedulers.

Long story short: the CU scheduler dispatches the commands (wavefronts) between the shaders, the Async compute scheduler is supposed to fix any holes in those schedule pipelines, and the driver scheduler sets the order of the commands in each shader. Confusingly complicated, isn't it?
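
To make the three layers a bit more concrete, here is a rough toy model in Python. It is only a sketch of the split described above, not how the hardware or AMD's driver actually works: the function names, the (priority, name) command format, the round-robin policy and the slot counts are all made up for illustration.

```python
from collections import deque


def driver_order(commands):
    """Host-side step (hypothetical): order (priority, name) commands,
    lowest priority value first, and return just the names."""
    return [name for _, name in sorted(commands, key=lambda c: c[0])]


def dispatch(graphics, compute, num_cus, slots_per_cu):
    """Spread graphics wavefronts over the CUs round-robin, then plug
    whatever slots are still empty with queued compute work."""
    cus = [[] for _ in range(num_cus)]

    # Layer 2 (toy): hand out graphics wavefronts one slot at a time.
    graphics_queue = deque(graphics)
    for _ in range(slots_per_cu):
        for cu in cus:
            if graphics_queue:
                cu.append(graphics_queue.popleft())

    # Layer 3 (toy): async compute fills the holes that are left.
    compute_queue = deque(compute)
    for cu in cus:
        while len(cu) < slots_per_cu and compute_queue:
            cu.append(compute_queue.popleft())
    return cus


def idle_slots(cus, slots_per_cu):
    """Count slots that ended up with no work at all."""
    return sum(slots_per_cu - len(cu) for cu in cus)


if __name__ == "__main__":
    wavefronts = driver_order([(i, f"g{i}") for i in range(5)])
    filled = dispatch(wavefronts, ["c0", "c1", "c2"], num_cus=4, slots_per_cu=2)
    print(filled, idle_slots(filled, 2))
```

With 5 graphics wavefronts on 4 CUs of 2 slots each, three slots stay empty unless there is compute work queued up to fill them, which is the "fixing holes" part in a nutshell.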

Last edited by Bofferbrauer2 - on 14 March 2019