haxxiy said:
Soundwave said:

The problem that I think arises is there's really no need for 1536 CUDA cores to get that performance. You could get the same performance from 1024 CUDA cores and just clock them higher (which has no effect on the cost) and the chip would be cheaper and have better yields. Having a massive chip like that for no reason just doesn't make sense: your yields will be worse, making production more expensive, and you're paying for a more complex chip for no reason.

There is, if you're taking power consumption into account. It scales linearly with frequency but quadratically with voltage, which needs to be higher at higher clocks. A smaller chip at higher clocks would consume more power even if it performs the same.

I mean this analysis was done already.

12SM (1536 cores), assuming T239, just doesn't make sense on Samsung 8nm from a performance-per-watt perspective. This is because the voltage-frequency curve flattens at very low frequencies (in this case below roughly 470MHz), so clocking any lower than that no longer buys you a voltage reduction.

https://famiboards.com/threads/future-nintendo-hardware-technology-speculation-discussion-st-read-the-staff-posts-before-commenting.55/page-1142#post-683773

There are two things to take from the above. First, as a general point, every chip on a given manufacturing process has a peak efficiency clock, below which you lose power efficiency by reducing clocks. Secondly, we have the data from Orin to know pretty well where this point is for a GPU very similar to T239's on a Samsung 8nm process, which is around 470MHz.
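To make the "peak efficiency clock" idea concrete, here is a rough toy model in Python. It is only a sketch: the voltage floor, the voltage slope, and the split between static and dynamic power are all made-up assumptions, not Orin or T239 measurements. The only number taken from the thread is the roughly 470MHz knee. It assumes dynamic power scales with f × V², plus a static (leakage) term that doesn't go away when you lower the clock.

```python
# Toy model only: every constant except the 470MHz knee is a hypothetical
# value picked for illustration, not measured Orin or T239 data.
F_KNEE_MHZ = 470                        # clock below which voltage stops dropping
V_MIN = 0.60                            # hypothetical voltage floor (volts)
V_SLOPE = 0.0003                        # hypothetical extra volts per MHz above the knee
LEAK_W_PER_V = 0.1 / V_MIN              # hypothetical static power: 0.1 W per SM at V_MIN
K_DYN = 0.4 / (V_MIN**2 * F_KNEE_MHZ)   # hypothetical dynamic power: 0.4 W per SM at the knee


def voltage(f_mhz):
    """Voltage needed at a given clock; flat below the knee, rising above it."""
    return max(V_MIN, V_MIN + V_SLOPE * (f_mhz - F_KNEE_MHZ))


def power_w(f_mhz):
    """Per-SM power: static term plus dynamic term proportional to f * V^2."""
    v = voltage(f_mhz)
    return LEAK_W_PER_V * v + K_DYN * v**2 * f_mhz


def efficiency(f_mhz):
    """Relative performance per watt, taking performance as proportional to clock."""
    return f_mhz / power_w(f_mhz)


clocks = range(200, 1001, 10)
peak_clock = max(clocks, key=efficiency)

for f in (300, 470, 640, 900):
    print(f"{f} MHz: {power_w(f):.2f} W per SM, efficiency {efficiency(f):.0f} MHz/W")
print(f"Peak efficiency clock in this toy model: {peak_clock} MHz")
```

In this sketch, efficiency falls on both sides of the knee: below it the static power is spread over less work, and above it the rising voltage makes each extra MHz more expensive. Real silicon won't follow these exact curves, but the shape is the point.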

---

That is, if the power budget is 3W for the GPU, and the peak efficiency clock is 470MHz, and the power consumption per SM at 470MHz is 0.5W, then the best possible GPU they could include would be a 6 SM GPU running at 470MHz. Using a smaller GPU would mean higher clocks, and efficiency would drop, but using a larger GPU with lower clocks would also mean efficiency would drop, because we're already at the peak efficiency clock.

In reality, it's rare to see a chip designed to run at exactly that peak efficiency clock, because there's always a financial budget as well as the power budget. Running a smaller GPU at higher clocks means you save money, so the design is going to be a tradeoff between a desire to get as close as possible to the peak efficiency clock, which maximises performance within a fixed power budget, and as small a GPU as possible, which minimises cost. Taking the same example, another option would be to use 4 SMs and clock them at around 640MHz. This would also consume 3W, but would provide around 10% less performance. It would, however, result in a cheaper chip, and many people would view 10% performance as a worthwhile trade-off when reducing the number of SMs by 33%.
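The arithmetic in the example above can be checked with a few lines of Python. This is just a sketch using the example's own numbers. Two things are assumptions on my part: throughput is taken to scale with SM count × clock, and the 0.75W per SM at 640MHz is inferred from the statement that 4 SMs at that clock also draw 3W. The 128 cores per SM comes from the thread's "12SM (1536 cores)".

```python
CORES_PER_SM = 128      # implied by "12SM (1536 cores)"
POWER_BUDGET_W = 3.0    # GPU power budget used in the example

# name -> (SM count, clock in MHz, watts per SM at that clock)
configs = {
    "6 SM @ 470 MHz": (6, 470, 0.50),
    "4 SM @ 640 MHz": (4, 640, 0.75),  # 0.75 W/SM inferred from the 3 W total
}


def summarize(sm_count, clock_mhz, watts_per_sm):
    """Core count, total power, and relative throughput for one configuration."""
    return {
        "cores": sm_count * CORES_PER_SM,
        "power_w": sm_count * watts_per_sm,
        # Relative units; assumes performance scales with SMs x clock.
        "throughput": sm_count * clock_mhz,
    }


results = {name: summarize(*cfg) for name, cfg in configs.items()}
baseline = results["6 SM @ 470 MHz"]["throughput"]

for name, r in results.items():
    rel = r["throughput"] / baseline
    within = r["power_w"] <= POWER_BUDGET_W
    print(f"{name}: {r['cores']} cores, {r['power_w']:.1f} W "
          f"(within budget: {within}), {rel:.1%} of 6 SM throughput")
```

Under those assumptions the 4 SM option lands at a bit over 90% of the 6 SM throughput for the same 3W, which matches the "around 10% less performance" figure.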

Basically, 512 cores (4SM) or 768 cores (6SM) would give you better performance for the same power target, at lower cost, than 1536 cores (12SM), if the GPU is a T239 and if it is on Samsung 8nm.

Last edited by sc94597 - on 21 September 2023