Pemalite said:
AMD has spent a ton of engineering resources on Vega. It implemented all of Polaris's improvements, like instruction prefetching and a larger instruction buffer, which increased the IPC of each pipe as there are fewer wave stalls.
But one of Graphics Core Next's largest bottlenecks is... Geometry. Which is ironic considering AMD was pushing Tessellation even back in 2001 when the Playstation 2 was flaunting its stuff. To that end... AMD introduced the Primitive Discard Accelerator, which culls triangles that are too small and pointless to render... We also saw the introduction of an index cache, which keeps instanced geometry close to the caches.
Graphics Core Next also tends to be ROP limited, which is why AMD reworked them on Polaris with improved Delta Colour Compression, larger L2 caches and so on.
And then with Vega AMD kicked it up again by introducing the Draw Stream Binning Rasterizer... which is where Vega gains the ability to bin polygons on a tiled basis... That, in conjunction with the Primitive Discard Accelerator, means a significant reduction in the amount of geometry work that needs to be done, boosting geometry throughput substantially.
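For what it's worth, here's a minimal sketch of the kind of zero-coverage test a small-primitive cull stage can apply, assuming sample points at pixel centres and no MSAA; it's an illustration of the idea, not AMD's actual hardware logic:

```python
import math

def interval_hits_sample(lo, hi):
    # True if [lo, hi] contains a sample coordinate of the form k + 0.5.
    return math.ceil(lo - 0.5) <= math.floor(hi - 0.5)

def can_discard(v0, v1, v2):
    # v0..v2 are (x, y) positions in pixel coordinates.
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    # If the triangle's bounding box misses every pixel centre on either axis,
    # it can never produce a fragment and can be rejected before rasterization.
    return not (interval_hits_sample(min(xs), max(xs))
                and interval_hits_sample(min(ys), max(ys)))

# A sub-pixel sliver that falls between pixel centres gets thrown away:
print(can_discard((10.6, 20.6), (10.9, 20.7), (10.7, 20.9)))  # True
```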
On the ROP side of the equation... AMD made the ROPs a client of the L2 cache rather than the memory controller, which, as L2 caches grow, means the ROPs can better leverage them to bolster overall performance... It also helps render-to-texture, as opposed to rendering straight to a frame buffer, which is a boon for deferred engines.
And then we have the primitive shader too.
In short... just over the Polaris/Vega generations a ton of engineering has gone into the geometry side of the equation; it's always been a sore point with AMD's hardware, even going back to TeraScale.
|
Specifying some of AMD's improvements is irrelevant as long as you don't also specify what nVidia has achieved. A lot of the engineering work goes into improving performance and power efficiency by switching from third-party cell libraries to custom IC designs for a particular process node. That is something nVidia has obviously spent a lot more resources on than AMD, and it doesn't show up as a new feature in marketing material.
I spend much of my working time analyzing GPU frame traces, identifying bottlenecks and how to work around them. Every GPU architecture has bottlenecks, that's nothing new; it's just a matter of what kind of workload you throw at them. I have full access to all the performance counters of the GCN architecture, both in numerical and visual form. For instance, I can see the number of wavefronts executing on each individual SIMD of each CU at any given time during the trace, the issue rate of VALU, SALU, VMEM, EXP and branch instructions, wait cycles due to accessing the K$ cache, exporting pixels or fetching instructions, stalls due to texture rate or texture memory accesses, the number of read/write accesses to the color or depth caches, the number of drawn quads, the number of context rolls, the number of processed primitives and the percentage of culled primitives, stalls in the rasterizer due to the SPI (Shader Processor Input) or the PA (Primitive Assembly), the number of indices processed and reused by the VGT (Vertex Geometry Tessellator), the number of commands parsed/processed by the CPG/CPC, stalls in the CPG/CPC, the number of L2 reads/writes, and the L2 hit/miss rate. Those are just a few of the available performance counters I have access to. In addition to that I have full documentation of the GCN architecture and I've developed several released games targeting it. Based on that I have a pretty good picture of the strengths/weaknesses of the architecture, and I'm interested in hearing if you perhaps have some insight that I lack.
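To make that concrete, here's a hypothetical illustration of how raw counter values like those get turned into usable metrics; the counter names and figures are made up for the example, not taken from any real capture or tool:

```python
# Assumed counter values for one trace sample (purely illustrative).
trace_sample = {
    "l2_read_hits": 9_200_000, "l2_read_misses": 800_000,
    "primitives_in": 1_500_000, "primitives_culled": 600_000,
    "valu_busy_cycles": 1_400_000, "gpu_busy_cycles": 2_000_000,
}

l2_hit_rate = trace_sample["l2_read_hits"] / (
    trace_sample["l2_read_hits"] + trace_sample["l2_read_misses"])
cull_rate = trace_sample["primitives_culled"] / trace_sample["primitives_in"]
valu_busy = trace_sample["valu_busy_cycles"] / trace_sample["gpu_busy_cycles"]

print(f"L2 hit rate {l2_hit_rate:.1%}, culled prims {cull_rate:.1%}, "
      f"VALU busy {valu_busy:.1%}")
```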
The geometry rate isn't really a bottleneck for GCN. Even if it were, geometry processing parallelizes quite well and the issue could be solved by increasing the number of VGTs. It won't be a problem in the future either, for two reasons. 1) The pixel rate will always be the limiting factor. 2) Primitive/mesh shaders give the graphics programmer the option to use the CUs' compute power to process geometry.
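To illustrate point 1 with back-of-the-envelope numbers (the rates below are assumptions for the example, not measurements of any specific part): a front end that sets up a handful of triangles per clock only becomes the limit when triangles are tiny.

```python
# Assumed figures for a GCN-style part, purely for illustration.
prims_per_clock = 4      # assumed triangle setup rate
pixels_per_clock = 64    # assumed pixel fill rate

# Break-even triangle size: below this many covered pixels per triangle the
# front end is the limit; above it, the pixel/fill rate is.
break_even_px = pixels_per_clock / prims_per_clock   # 16 pixels per triangle

avg_triangle_px = 40     # assumed average triangle size for a typical scene
print(break_even_px, avg_triangle_px < break_even_px)  # 16.0 False -> fill-rate bound
```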
I asked you to specify the inherent flaws and bottlenecks in the GCN architecture that you claim prevent the PS5 from using more than 64 CUs, not AMD's marketing material about their GPUs. So again, can you please specify the "multitude of bottlenecks"?
Pemalite said:
Yes it is. The entire reason TeraScale 3 ever existed was that load balancing for VLIW5 was starting to get meddlesome, as parts of the array were often underutilized... The solution? Reduce it down to VLIW4.
It is also why AMD hasn't pushed out past 64 CUs. They potentially can... But that would require a significant overhaul of various parts of Graphics Core Next in order to balance the load and get more efficient utilization.
It's not always about going big and going home... Graphics Core Next tends to already be substantially larger, slower and hotter than the nVidia equivalent anyway.
|
That's irrelevant to your claim about running out of parallelizable work due to screen-space issues when scaling past 64 CUs.
The PS5 has probably been in development for over 5 years already. It's Sony's single most important upcoming product by far. They have spent a vast amount of money and manpower on it, and AMD has dedicated a large share of RTG's engineers to working on it. Is it reasonable to believe the PS5 will essentially be a PS4 Pro with 64 CUs and 64 ROPs shrunk down to 7nm? If so, it'll be the most expensive and inefficient die shrink EVER.

The Pro is designed to run 4K in checkerboard. Obviously a true 4K console needs a rasterizer with at least twice the rate of the Pro's 128 pixels/cycle, so it goes without saying that AMD needs to scale up parts other than the number of CUs anyway, and I don't believe they will make a bare-minimum upscale on those parts since the lifecycle of a console is about 5-6 years. IMO, they will most likely scale the number of ROPs above 64 as well, but that's less certain.

That said, I think there is merit to your claim that there won't be more than 64 CUs in the PS5. I might even agree it's the most plausible configuration. However, I don't agree with your claims about inherent flaws in the GCN architecture preventing the PS5 from having more than 64 CUs. IMO, it's more a question of what price point the PS5 will have and how big of an initial financial hit Sony is prepared to take than technical hurdles.
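A quick sanity check on the "at least twice the rate" remark, using nothing but the obvious resolution figures:

```python
# 4K checkerboard shades roughly half the samples of native 4K per frame, so a
# native-4K target needs about 2x the pixel throughput at the same frame rate.
native_4k = 3840 * 2160            # 8,294,400 pixels per frame
checkerboard_4k = native_4k // 2   # ~4,147,200 shaded samples per frame
print(native_4k / checkerboard_4k) # 2.0
```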
I'm not sure what you mean. It clearly says the area reduction is 70% and there's a 60% reduction in power consumption. That's pretty much in line with what I wrote: an area reduction of 70% would yield a density increase of 3.3x. Probably just a rounding issue.
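The arithmetic behind that:

```python
# A 70% area reduction leaves 30% of the original area per transistor,
# which works out to roughly a 3.3x density increase.
area_scale = 1.0 - 0.70
density_gain = 1.0 / area_scale
print(round(density_gain, 2))  # 3.33
```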
Here are the links to TSMC's own numbers.
https://www.tsmc.com/english/dedicatedFoundry/technology/10nm.htm
https://www.tsmc.com/english/dedicatedFoundry/technology/7nm.htm