Basically there are a few principles to consider when doing this analysis:
The larger the chip => the more expensive it is, all else equal. This is because you can cut fewer chips from each wafer AND because each chip is more likely to contain a defect, resulting in fewer usable chips.
The "smaller" the process node => the more costly the wafer, but not necessarily the chips produced by it.
This is because:
The "smaller" the process node => the higher the transistor density => the smaller the die for a given design => the more chips that can be cut from that wafer.
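To put rough numbers on the die size and process node points above, here is a minimal sketch. It uses the common dies-per-wafer approximation and a simple Poisson yield model, which are my choice of models and not the only ones used. The wafer costs, die areas and defect density are made-up placeholders, not real foundry figures.

```python
import math

# All inputs below are illustrative placeholders, not real foundry figures.

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Rough gross die count: wafer area / die area, minus an edge-loss term."""
    gross = math.pi * (wafer_diameter_mm / 2.0) ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(gross - edge_loss)

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Poisson yield model: probability that a die has zero defects."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

def cost_per_good_die(wafer_cost, die_area_mm2, defects_per_cm2):
    """Wafer cost spread over the dies expected to work."""
    good_dies = dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2)
    return wafer_cost / good_dies

# Same node, same wafer cost, die twice as large.
small = cost_per_good_die(wafer_cost=1.0, die_area_mm2=100.0, defects_per_cm2=0.1)
large = cost_per_good_die(wafer_cost=1.0, die_area_mm2=200.0, defects_per_cm2=0.1)
print(f"2x die area costs {large / small:.2f}x per good die")

# Hypothetical denser node: wafer 60% pricier, same design at half the area.
old_node = cost_per_good_die(wafer_cost=1.0, die_area_mm2=200.0, defects_per_cm2=0.1)
new_node = cost_per_good_die(wafer_cost=1.6, die_area_mm2=100.0, defects_per_cm2=0.1)
print(f"new node / old node cost per good die: {new_node / old_node:.2f}")
```

With these placeholder inputs, doubling the die area more than doubles the cost per good die, and the pricier wafer still comes out cheaper per chip once the die shrinks. How that plays out for a real chip depends entirely on the real wafer price, density gain and defect rate.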
Voltage (and therefore power) is loosely proportional to clock speed until you bring it so low that you approach a sort of "minimum voltage" (below which you need to shut down cores instead.)
GPUs scale well in parallel workloads, so having more cores can in most cases easily make up for a lower voltage (and clock), but having more cores increases die size and therefore cost (for the reasons mentioned earlier.)
There is an optimal voltage/core count for a given power profile on a given process node.
Because core clock can be varied later, but core count is fixed once the chip is designed, it is important to get the core count right early.
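The voltage and core count trade-off above can be sketched with a toy model. It assumes dynamic power per core goes as C * V^2 * f (the usual approximation), that voltage rises linearly with clock above a floor, that each powered core leaks a fixed amount, and that throughput scales perfectly with cores times clock. Every constant is made up for illustration, and "cores" here stand in for SMs.

```python
# Toy model with made-up constants; "cores" stand in for SMs.
V_MIN = 0.6       # assumed voltage floor (V)
K = 0.6           # assumed volts needed per GHz above the floor
C_EFF = 1.0       # assumed switching constant, W / (V^2 * GHz) per core
P_STATIC = 0.05   # assumed leakage per powered core (W)

def clock_for_budget(cores, budget_w):
    """Highest clock (GHz) each core can run when the budget is split evenly."""
    dynamic = budget_w / cores - P_STATIC  # watts left for switching, per core
    if dynamic <= 0:
        return 0.0
    f_floor = V_MIN / K  # clock at which voltage reaches the floor
    if dynamic >= C_EFF * V_MIN ** 2 * f_floor:
        # Above the floor: V = K * f, so power = C_EFF * K^2 * f^3.
        return (dynamic / (C_EFF * K ** 2)) ** (1.0 / 3.0)
    # At the floor: V is stuck at V_MIN, so power = C_EFF * V_MIN^2 * f.
    return dynamic / (C_EFF * V_MIN ** 2)

def throughput(cores, budget_w):
    """Relative throughput, assuming perfect parallel scaling (cores * clock)."""
    return cores * clock_for_budget(cores, budget_w)

def best_core_count(budget_w, max_cores=16):
    """Core count in 1..max_cores with the highest modeled throughput."""
    return max(range(1, max_cores + 1), key=lambda n: throughput(n, budget_w))

# Sweep core counts at a fixed 3 W budget.
for n in (1, 2, 4, 8, 12, 16):
    print(n, round(clock_for_budget(n, 3.0), 2), round(throughput(n, 3.0), 2))
print("best core count at 3 W in this toy model:", best_core_count(3.0))
```

In this model, adding cores helps as long as it lets you lower the voltage. Once the voltage floor is reached, extra cores only add leakage and throughput falls off, so the sweet spot sits near the floor and moves up with the power budget. Where it lands for a real chip depends on the node's actual voltage/frequency curve, which this sketch does not try to capture.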
12 SMs is nowhere near the optimal core count for a Samsung 8N chip at 3W. It might be doable on a TSMC 4N chip.
Last edited by sc94597 - on 15 September 2023