AMD getting $100 for each chip they give to Sony isn't them selling them at bargain prices at all. That's them selling them at "bulk/OEM" pricing, which is totally normal when any company puts in orders in the region of millions.
Take the 3600G for instance. Say AMD sells that at retail for $220; that pans out like this... The actual cost of making each of those chips (what AMD pays to the foundry) is like $30-$40. Then AMD will add their markup to account for things like yields, profits, packaging, shipping, etc. At this point the chip comes to around $170. Then they put on their MSRP sticker price of $220 so the retailers can make their own cut too.
Pretty sure AMD's profit margins are on average 61% for PC chips.
So a $220 CPU is likely costing AMD $85.8 in manufacturing and other logistics.
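For anyone who wants to check that figure, here is a minimal sketch of the arithmetic. It assumes the 61% margin applies directly to the $220 sticker price, which is the simplification used above; the function name is just made up for illustration.

```python
def cost_from_margin(price, margin):
    """Implied cost for a selling price and a profit margin given as a fraction.

    Assumption: margin = (price - cost) / price, applied to the retail price.
    """
    return price * (1.0 - margin)


# Figures taken from the post: $220 retail price, 61% average margin.
cost = cost_from_margin(220.0, 0.61)
print(f"Implied cost per chip: ${cost:.2f}")
```

Swap in the ~$170 pre-retail price instead of $220 if you think the margin should be measured before the retailer's cut.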
Consoles are actually a significant revenue driver for AMD though, which is good... Not nearly as lucrative as PC chip sales, but it helped keep AMD afloat when it needed it most.
@bolded: We don't even know if that's a true chip (and at 20CU, I really doubt it, especially considering it will be totally bandwidth starved even with DDR4 4000). But I digress.
DDR4 4000 can offer more bandwidth than HBM2. It depends entirely on how wide you wish to take the bus... But before then you reach a point where it's more economical to choose another technology anyway.
However... Considering that current Ryzen APUs with ~38GB/s of bandwidth are certainly bandwidth starved with 11 CUs... I doubt that is going to change with 20CU APUs that have ~68GB/s of bandwidth.
But if you were to run that DDR4 4000 DRAM on a 512-bit bus, suddenly we are talking 256GB/s of bandwidth, which is more than sufficient for even a 40 CU count.
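A rough sketch of where the 256GB/s comes from: theoretical peak bandwidth is just the transfer rate multiplied by the bus width in bytes. The function name is hypothetical, and real-world throughput will land below these peak numbers.

```python
def peak_bandwidth_gbs(transfers_mt_s, bus_width_bits):
    """Theoretical peak bandwidth in GB/s.

    transfers_mt_s: transfer rate in MT/s (4000 for DDR4 4000)
    bus_width_bits: total memory bus width in bits
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


# DDR4 4000 on a hypothetical 512-bit bus, as described above.
print(peak_bandwidth_gbs(4000, 512))

# The same memory on a typical 128-bit dual-channel setup, for comparison.
print(peak_bandwidth_gbs(4000, 128))
```

The 128-bit case works out to 64GB/s peak, the same ballpark as the dual-channel figure quoted above.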
Specifying some of AMD's improvements is irrelevant as long as you don't also specify what nVidia has achieved. A lot of the engineering work goes into improving the performance and power efficiency by switching from third-party cell libraries to custom IC designs for a particular process node. That's something nVidia has obviously spent a lot more resources on than AMD, and it's something which doesn't show up as a new feature in marketing material.
I am aware. Not my first rodeo.
I spend much of my working time analyzing GPU frame-traces, identifying bottlenecks and how to work around them. Every GPU architecture has bottlenecks, that's nothing new, it's just a matter of what kind of workload you throw at them. I have full access to all the performance counters of the GCN architecture, both in a numerical and a visual form. For instance, I can see the number of wavefronts executing on each individual SIMD of each CU at any given time during the trace, the issue rate of VALU, SALU, VMEM, EXP, branch instructions, wait cycles due to accessing the K$ cache, exporting pixels or fetching instructions, stalls due to texture rate or texture memory accesses, number of read/write accesses to the color or depth caches, the number of drawn quads, the number of context rolls, the number of processed primitives and percentage of culled primitives, stalls in the rasterizer due to the SPI (Shader Processor Input) or the PA (Primitive Assembly), number of indices processed and reused by the VGT (Vertex Geometry Tessellator), the number of commands parsed/processed by the CPG/CPC, stalls in the CPG/CPC, number of L2 read/writes, L2 hit/miss rate. That's just a few of the available performance counters I've access to. In addition to that I have full documentation of the GCN architecture and I've developed several released games targeting it. Based on that I've a pretty good picture of the strengths/weaknesses of the architecture and I'm interested in hearing if you perhaps have some insight that I lack.
I am unable to verify any of that, nor does it take precedence over my own knowledge or qualifications. In short, it's irrelevant.
The geometry rate isn't really a bottleneck for GCN. Even if it was, the geometry processing parallelizes quite well and could be solved by increasing the number of VGTs. It won't be a problem in the future either for two reasons. 1) The pixel rate will always be the limiting factor. 2) Primitive/mesh shaders give the graphics programmer the option to use the CU's compute power to process geometry.
It's always been a bottleneck in AMD's hardware even going back to Terascale.
I asked you to specify the inherent flaws and bottlenecks in the GCN architecture that you claim prevent the PS5 from using more than 64CUs, not AMD's marketing material about their GPUs. So again, can you please specify the "multitude of bottlenecks"?
Bottlenecks (like geometry) have always been an Achilles heel of AMD GPU architectures, even back in the Terascale days.
nVidia was always on the ball once they introduced their Polymorph engines.
But feel free to enlighten me on why AMD's GPUs fall short despite their overwhelming advantage in single precision floating point operations relative to their nVidia counterparts.
I'm not sure what you mean. It clearly says the area reduction is 70% and the reduction in power consumption is 60%. Pretty much in line with what I wrote. An area reduction of 70% would yield a density increase of 3.3x. Probably just a rounding issue.
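For reference, a quick sketch of that conversion. It assumes the same logic shrinks to (1 - reduction) of its original area, so density scales with the inverse; the function name is only for illustration.

```python
def density_gain(area_reduction):
    """Density multiplier implied by a fractional area reduction.

    Assumption: the same circuit occupies (1 - area_reduction) of its old area.
    """
    return 1.0 / (1.0 - area_reduction)


# A 70% area reduction, as quoted above.
print(round(density_gain(0.70), 1))
```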
Here are the links to TSMC's own numbers.
I was neither disagreeing nor agreeing with your claims, I just wanted evidence, for my own curiosity, so I could take your claim seriously.
Density at any given node is always changing. Intel is on what... its 3rd or 4th iteration of 14nm? And each time density has changed. Hence why it's important to do apples-to-apples comparisons.
As for TSMC's 10nm and 7nm comparisons... I would not be surprised if TSMC's 10nm process actually leveraged a 14nm BEOL... TSMC, Samsung, GlobalFoundries etc. don't tend to do full node (FEOL+BEOL) shrinks at the same time like Intel does.
The 7nm process likely leverages 10nm design rules...
But even then, TSMC's information on their 7nm process is likely optimized for SRAM at the moment, whereas their 10nm process is not in the links you provided, which ultimately skews things in 7nm's favour, as you are likely to need less patterning... And you can optimize for the SRAM cells' relatively simple structures compared to more complex logic.