Bofferbrauer2 said: @bolded: I said that before already. However, the way they are organised only allows for 64 CUs. And those are compute engines, not shader engines, which is something different entirely (basically a rebrand of the GCA, the Graphics and Core Array, introduced with GCN2). GCN5 only has 4 of them because they can only feed 4 of them reliably with instructions. Technically they could go past 64 with more compute engines, but it wouldn't actually increase performance, as the CUs would be idling half the time because they don't get any instructions. Hence why it's agreed that 64 CUs is the limit.
That is not how GCN works. Each CU has its own instruction scheduler. The command processor can issue commands at a far higher rate than the front-end can consume them, and the same applies to the front-end relative to the back-end (SEs/CUs). It's trivial to saturate the CUs with wavefronts, and that happens all the time in the current design. That said, the throughput of some parts of the front-end should probably be increased along with the CU count to keep the architecture balanced.
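To make the "keep it balanced" point concrete, here is a toy bottleneck model, not a description of real GCN hardware: CUs stay fully busy as long as the front-end can launch new wavefronts at least as fast as all CUs together consume them. The function name and every number below are hypothetical, chosen only to illustrate the shape of the argument.

```python
def cu_utilization(dispatch_rate: float, num_cus: int, demand_per_cu: float) -> float:
    """Fraction of CU capacity kept busy in a toy bottleneck model.

    dispatch_rate: wavefronts per clock the front-end can launch (hypothetical).
    demand_per_cu: wavefronts per clock one CU needs to stay busy (hypothetical).
    """
    total_demand = num_cus * demand_per_cu
    # The slower side of the pipeline sets the achieved rate.
    return min(1.0, dispatch_rate / total_demand)


# Hypothetical numbers: a front-end sized exactly for 64 CUs.
print(cu_utilization(4.0, 64, 0.0625))   # balanced: 1.0
print(cu_utilization(4.0, 128, 0.0625))  # double the CUs, same front-end: 0.5
print(cu_utilization(8.0, 128, 0.0625))  # scale the front-end too: 1.0
```

In this model, adding CUs only leaves them idle if front-end throughput is held fixed; scaling both together restores full utilization, which is the balance argument rather than a hard 64-CU ceiling.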
I'm not aware of any technical limitation to going above 64 CUs, but some probably exist. I suspect the biggest hurdle is increasing the bandwidth of the L2 cache, which should scale with the number of CUs to keep the system balanced.
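The L2 scaling concern is simple arithmetic: if the L2 bandwidth stays fixed while the CU count grows, each CU's average share shrinks. The sketch below assumes the bandwidth is shared evenly and uses a hypothetical 1024 bytes/clock figure, not a published spec for any particular GPU.

```python
def l2_share_per_cu(l2_bytes_per_clk: float, num_cus: int) -> float:
    """Average L2 bandwidth per CU, assuming it is shared evenly."""
    return l2_bytes_per_clk / num_cus


def l2_needed_for_same_share(base_bw: float, base_cus: int, new_cus: int) -> float:
    """L2 bandwidth needed so each CU keeps the share it had at base_cus."""
    return base_bw * new_cus / base_cus


# Hypothetical baseline: 1024 bytes/clock of L2 bandwidth for 64 CUs.
print(l2_share_per_cu(1024.0, 64))               # 16.0 bytes/clock per CU
print(l2_share_per_cu(1024.0, 96))               # ~10.67 if the L2 is not widened
print(l2_needed_for_same_share(1024.0, 64, 96))  # 1536.0 to keep 16 per CU
```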