
Forums - Nintendo Discussion - VGC: Switch 2 Was Shown At Gamescom Running Matrix Awakens UE5 Demo

Oneeee-Chan!!! said:

zeldaring said:

It's 8nm. The leaker got his info from a credible source.

You mean Kopite?

He's one of the most credible and well-known Nvidia leakers, and he's saying 8nm without hesitation. He clearly knows what he's talking about, since he has leaked many Nvidia GPUs.

Last edited by zeldaring - on 15 September 2023


This post explains why 8nm doesn't make sense from a "Nintendo goes cheap and low power" perspective. 

Either our assumptions about power consumption are wrong or the process node (8nm) is wrong. Which one seems more likely?

https://famiboards.com/threads/future-nintendo-hardware-technology-speculation-discussion-st-read-the-staff-posts-before-commenting.55/post-683773

"However, while it's reasonable to design a chip with intent to clock it at the peak efficiency clock, or to clock it above the peak efficiency clock, what you're not going to see is a chip that's intentionally designed to run at a fixed clock speed that's below the peak efficiency clock. The reason for this is pretty straight-forward; if you have a design with a large number of SMs that's intended to run at a clock below the peak efficiency clock, you could just remove some SMs and increase the clock speed and you would get both better performance within your power budget and it would cost less." 
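To illustrate the quoted argument, here is a toy sweep: for each SM count, find the highest clock that keeps total GPU power within a 3 W budget (the figure the post goes on to assume), then score each option by SMs x clock. This is a minimal sketch under stated assumptions: the voltage-floor power model and every constant in it are invented for illustration, not Ampere or Samsung 8nm measurements.

```python
# Hypothetical per-SM power model with a voltage floor (all constants made up).
V_MIN, KNEE_MHZ, V_SLOPE = 0.5, 500, 0.0005  # volts, MHz, volts per MHz above knee
LEAK, C_DYN = 0.2, 0.0032                    # static W per volt; dynamic W per (V^2 * MHz)
BUDGET_W = 3.0                               # handheld GPU budget assumed in the post
EPS = 1e-9                                   # absorbs floating-point round-off at the budget edge


def per_sm_power(mhz: float) -> float:
    """Static (LEAK * V) plus dynamic (C_DYN * V^2 * f) power for one SM."""
    v = V_MIN + V_SLOPE * max(0.0, mhz - KNEE_MHZ)
    return LEAK * v + C_DYN * v * v * mhz


def best_clock(sms: int, clocks=range(100, 1501, 10)):
    """Highest clock at which `sms` cores stay within the budget, or None."""
    fitting = [f for f in clocks if sms * per_sm_power(f) <= BUDGET_W + EPS]
    return max(fitting) if fitting else None


def sweep(max_sms: int = 16):
    """Map SM count -> (clock, throughput proxy = SMs x MHz) under the budget."""
    results = {}
    for sms in range(1, max_sms + 1):
        f = best_clock(sms)
        if f is not None:
            results[sms] = (f, sms * f)
    return results


results = sweep()
best_sms = max(results, key=lambda n: results[n][1])
best = (best_sms, results[best_sms][0])
print("best (SMs, MHz):", best, "| 12 SMs (MHz, score):", results[12])
```

With these made-up constants the sweep picks 6 SMs at the 500 MHz knee, while 12 SMs only fit at 180 MHz and score lower, which is the shape of the argument in the quote. The specific numbers mean nothing outside this toy model.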

"The above section wasn't theoretical. Nvidia and Nintendo did sit in a room (or have a series of calls) to design a chip for a new Nintendo console, and what they came out with is T239. We know that the result of those discussions was to use a 12 SM Ampere GPU. We also know the power curve, and peak efficiency clock for a very similar Ampere GPU on 8nm.


The GPU in the TX1 used in the original Switch units consumed around 3W in portable mode, as far as I can tell. In later models with the die-shrunk Mariko chip, it would have been lower still. Therefore, I would expect 3W to be a reasonable upper limit to the power budget Nintendo would allocate for the GPU in portable mode when designing the T239.

With a 3W power budget and a peak efficiency clock of 470MHz, then the (again, not theoretical) numbers above tell us the best possible performance would be achieved by a 6 SM GPU operating at 470MHz, and that you'd be able to get 90% of that performance with a 4 SM GPU operating at 640MHz. Note that neither of these say 12 SMs. A 12 SM GPU on Samsung 8nm would be an awful design for a 3W power budget. It would be twice the size and cost of a 6 SM GPU while offering much less performance, if it's even possible to run within 3W at any clock.

There's no world where Nintendo and Nvidia went into that room with an 8nm SoC in mind and a 3W power budget for the GPU in handheld mode, and came out with a 12 SM GPU. That means either the manufacturing process, or the power consumption must be wrong (or both). I'm basing my power consumption estimates on the assumption that this is a device around the same size as the Switch and with battery life that falls somewhere between TX1 and Mariko units. This seems to be the same assumption almost everyone here is making, and while it could be wrong, I think them sticking with the Switch form-factor and battery life is a pretty safe bet, which leaves the manufacturing process."

"So, what manufacturing process can give a 2.5x improvement in efficiency over Samsung 8nm? The only reasonable answer I can think of is TSMC's 5nm/4nm processes, including 4N, which just happens to be the process Nvidia is using for every other product (outside of acquired Mellanox products) from this point onwards. In Nvidia's Ada white paper (an architecture very similar to Ampere), they claim a 2x improvement in performance per Watt, which appears to come almost exclusively from the move to TSMC's 4N process, plus some memory changes."

With 4N just about stretching to the 2.5x improvement in efficiency required for a 12 SM GPU to make sense, I don't think the chances for any other process are good. We don't have direct examples for other processes like we have for Ada, but from everything we know, TSMC's 5nm class processes are significantly more efficient than either their 6nm or Samsung's 5nm/4nm processes. If it's a squeeze for 12 SMs to work on 4N, then I can't see how it would make sense on anything less efficient than 4N.
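A rough back-of-envelope check of the quoted figures, assuming (as simplifications not stated in the post) that throughput scales with SM count x clock and that GPU power scales linearly with SM count at a fixed clock and voltage:

```python
# Assumptions (hypothetical simplifications, not from the quoted post):
#  1. throughput ~ SM count x clock (ignores memory bandwidth, etc.)
#  2. GPU power ~ linear in SM count at a fixed clock and voltage
BUDGET_W = 3.0       # handheld GPU budget assumed in the quoted post
PEAK_EFF_MHZ = 470   # quoted peak efficiency clock (Ampere on Samsung 8nm)


def throughput(sms: int, mhz: float) -> float:
    return sms * mhz


# Quoted claim: 4 SMs @ 640 MHz reach about 90% of 6 SMs @ 470 MHz.
ratio = throughput(4, 640) / throughput(6, PEAK_EFF_MHZ)

# If 6 SMs @ 470 MHz roughly fill the budget, each SM costs ~0.5 W there,
# so 12 SMs at the same clock would need about twice the budget.
watts_per_sm = BUDGET_W / 6
power_12sm = 12 * watts_per_sm
needed_gain = power_12sm / BUDGET_W

print(f"4 SM @ 640 MHz vs 6 SM @ 470 MHz: {ratio:.1%}")
print(f"12 SM @ {PEAK_EFF_MHZ} MHz: ~{power_12sm:.1f} W, needs ~{needed_gain:.1f}x efficiency")
```

The first ratio comes out at about 90.8%, in line with the quoted "90%". The second figure is only the gain needed to run 12 SMs at the same clock; this crude estimate does not attempt to reproduce the post's 2.5x figure.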

Last edited by sc94597 - on 15 September 2023

sc94597 said:

This post explains why 8nm doesn't make sense from a "Nintendo goes cheap and low power" perspective. 

Either our assumptions about power consumption are wrong or the process node (8nm) is wrong. Which one seems more likely?


The issue is that we are speculating on rumors.



Chrkeller said:

The issue is that we are speculating on rumors.

There is some hard evidence, though. The Nvidia leak (and the 12 SM GPU, T239, whose existence it proved) isn't a rumor.

Nintendo and Nvidia could have changed their mind since then, but at one point they intended to go with a 12SM GPU. 

Basically if Switch 2 = T239, then either process node =/= 8nm or our power consumption assumptions are wrong. 



sc94597 said:
Chrkeller said:

We are speculating on rumors is the issue.  

There is some hard evidence, though. The Nvidia leak (and the 12 SM GPU, T239, whose existence it proved) isn't a rumor.

Nintendo and Nvidia could have changed their mind since then, but at one point they intended to go with a 12SM GPU. 

Basically if Switch 2 = T239, then either process node =/= 8nm or our power consumption assumptions are wrong. 

Leaks are assumptions. They aren't hard evidence.

Wind Waker on the Switch was leaked by a 'reliable' source...

What Nintendo will or will not use is speculation, especially since underclocking is a real possibility, as with the current model.



Chrkeller said:
sc94597 said:

There is some hard evidence, though. The Nvidia leak (and the 12 SM GPU, T239, whose existence it proved) isn't a rumor.

Nintendo and Nvidia could have changed their mind since then, but at one point they intended to go with a 12SM GPU. 

Basically if Switch 2 = T239, then either process node =/= 8nm or our power consumption assumptions are wrong. 

Leaks are assumptions. They aren't hard evidence.

Wind Waker on the Switch was leaked by a 'reliable' source...

What Nintendo will or will not use is speculation, especially since underclocking is a real possibility, as with the current model.

Sure, if we throw out the assumption that the Switch 2 is using T239, then anything is possible. I was addressing the point that Switch 2 could be T239 AND 8nm. 

The post I shared explains why underclocking might not even be an option. There is a point where you can't reduce power consumption anymore by reducing clocks. 

Because power consumption is mostly related to voltage, not clock speed, when you reduce clocks but keep the voltage the same, you don't really save much power. A large part of the power consumption called "static power" stays exactly the same, while the other part, "dynamic power", does fall off a bit. What you end up with is much less performance, but only slightly less power consumption. That is, power efficiency gets worse.

So that kink in the efficiency graph, between 420MHz and 522MHz, is the point at which you can't reduce the voltage any more. Any clocks below that point will all operate at the same voltage, and without being able to reduce the voltage, power efficiency gets worse instead of better below that point. The clock speed at that point can be called the "peak efficiency clock", as it offers higher power efficiency than any other clock speed.
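A toy model can make the "peak efficiency clock" idea concrete. This is a minimal sketch, not Ampere data: it assumes per-SM power = static leakage (proportional to voltage) + dynamic power (C x V^2 x f), with voltage pinned at a floor V_MIN below a knee clock. V_MIN, KNEE_MHZ, and the other constants are invented for illustration.

```python
# Toy power model for ONE GPU core (SM); every constant is hypothetical.
V_MIN = 0.5        # lowest stable voltage (V)
KNEE_MHZ = 500     # clock at which voltage bottoms out at V_MIN
V_SLOPE = 0.0005   # extra volts needed per MHz above the knee
LEAK = 0.2         # static (leakage) power, modeled as LEAK * V watts
C_DYN = 0.0032     # dynamic power, modeled as C_DYN * V^2 * MHz watts


def voltage(mhz: float) -> float:
    """Voltage falls with clock until it hits the floor, then stays flat."""
    return V_MIN + V_SLOPE * max(0.0, mhz - KNEE_MHZ)


def per_sm_power(mhz: float) -> float:
    v = voltage(mhz)
    static = LEAK * v              # constant once v is pinned at V_MIN
    dynamic = C_DYN * v * v * mhz  # below the knee, only this term shrinks with clock
    return static + dynamic


def efficiency(mhz: float) -> float:
    """Performance (proxied by clock) per watt."""
    return mhz / per_sm_power(mhz)


clocks = range(100, 1501, 10)
best = max(clocks, key=efficiency)
for mhz in (250, best, 700):
    print(f"{mhz:4d} MHz: {per_sm_power(mhz):.3f} W, {efficiency(mhz):.0f} MHz/W")
```

With these made-up constants the sweep lands exactly on the 500 MHz knee: dropping to 250 MHz halves performance but only cuts per-SM power from 0.5 W to 0.3 W, while going up to 700 MHz raises the voltage and also loses efficiency. The real kink described above sits somewhere between 420 MHz and 522 MHz; 500 MHz here is an arbitrary stand-in.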

Last edited by sc94597 - on 15 September 2023

sc94597 said:
Chrkeller said:

Leaks are assumptions. They aren't hard evidence.

Wind Waker on the Switch was leaked by a 'reliable' source...

What Nintendo will or will not use is speculation, especially since underclocking is a real possibility, as with the current model.

Sure, if we throw out the assumption that the Switch 2 is using T239, then anything is possible. I was addressing the point that Switch 2 could be T239 AND 8nm. 

The post I shared explains why underclocking might not even be an option. There is a point where you can't reduce power consumption anymore by reducing clocks. 

Because power consumption is mostly related to voltage, not clock speed, when you reduce clocks but keep the voltage the same, you don't really save much power. A large part of the power consumption called "static power" stays exactly the same, while the other part, "dynamic power", does fall off a bit. What you end up with is much less performance, but only slightly less power consumption. That is, power efficiency gets worse.

So that kink in the efficiency graph, between 420MHz and 522MHz, is the point at which you can't reduce the voltage any more. Any clocks below that point will all operate at the same voltage, and without being able to reduce the voltage, power efficiency gets worse instead of better below that point. The clock speed at that point can be called the "peak efficiency clock", as it offers higher power efficiency than any other clock speed.

I'll admit most of that is over my head.  I'll take your word for it.



Basically there are a few principles to consider when doing this analysis:

The larger the chip => the more expensive it is, all else equal. This is because you get fewer chips per wafer AND each chip is more likely to contain a defect, resulting in fewer usable chips.

The "smaller" the process node => the more costly the wafer, but not necessarily the chips produced by it. 

This is because:

The "smaller" the process node => the higher the transistor density => the smaller the same design becomes => the more chips can be cut from each wafer.

Voltage (and therefore power) is loosely proportional to clock speed until you bring it so low that you approach a sort of "minimum voltage" (below which you would need to shut down cores).

GPUs are well suited to parallel workloads, so having more cores can in most cases easily make up for lower voltage (and therefore lower clocks), but more cores increase die size and therefore cost (for the reasons mentioned earlier).

There is an optimal voltage/core count for a given power profile on a given process node.

Because core clock can vary but core count is fixed, it is important to get the core count right early.

12 SMs is nowhere near the optimal core count for a Samsung 8N chip at 3W. It might be doable on a TSMC 4N chip.
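The cost principles above can be sketched with two textbook approximations: a first-order dies-per-wafer estimate for a 300 mm wafer and a simple Poisson yield model (yield = e^(-area x defect density)). The die sizes, relative wafer costs, and defect density below are hypothetical placeholders, not real Samsung or TSMC figures.

```python
import math

WAFER_DIAMETER_MM = 300.0


def dies_per_wafer(die_area_mm2: float) -> int:
    """First-order estimate: wafer area / die area, minus partial dies lost at the edge."""
    d = WAFER_DIAMETER_MM
    gross = math.pi * (d / 2) ** 2 / die_area_mm2
    edge_loss = math.pi * d / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Probability that a die has zero defects (simple Poisson model; 100 mm^2 = 1 cm^2)."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def cost_per_good_die(die_area_mm2: float, wafer_cost: float, defects_per_cm2: float) -> float:
    good_dies = dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2)
    return wafer_cost / good_dies


# Hypothetical scenario (illustrative numbers, not real foundry data):
# the same design is 200 mm^2 on an older node and 100 mm^2 on a denser node
# whose wafers cost 1.7x as much; defect density is assumed equal (0.1 per cm^2).
old_node = cost_per_good_die(200, wafer_cost=1.0, defects_per_cm2=0.1)
new_node = cost_per_good_die(100, wafer_cost=1.7, defects_per_cm2=0.1)
print(f"old node: {old_node:.5f}, new node: {new_node:.5f} (wafer-cost units per good die)")
```

With these placeholders, doubling the die area from 100 mm² to 200 mm² cuts the candidate dies from 640 to 306 and lowers yield, so the cost per good die more than doubles; the denser node still comes out cheaper per good die despite its pricier wafer. Real outcomes depend on actual wafer prices, defect densities, and how well a given design shrinks.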

Last edited by sc94597 - on 15 September 2023

I replied to the wrong person.

Last edited by Oneeee-Chan!!! - on 15 September 2023

sc94597 said:

Basically there are a few principles to consider when doing this analysis:

The larger the chip => the more expensive it is, all else equal. This is because you get fewer chips per wafer AND each chip is more likely to contain a defect, resulting in fewer usable chips.

The "smaller" the process node => the more costly the wafer, but not necessarily the chips produced by it. 

This is because:

The "smaller" the process node => the higher the transistor density => the smaller the same design becomes => the more chips can be cut from each wafer.

Voltage (and therefore power) is loosely proportional to clock speed until you bring it so low that you approach a sort of "minimum voltage" (below which you would need to shut down cores).

GPUs are well suited to parallel workloads, so having more cores can in most cases easily make up for lower voltage (and therefore lower clocks), but more cores increase die size and therefore cost (for the reasons mentioned earlier).

There is an optimal voltage/core count for a given power profile on a given process node.

Because voltage can vary but core count is fixed, it is important to get the core count right early.

12 SMs is nowhere near the optimal core count for a Samsung 8N chip at 3W. It might be doable on a TSMC 4N chip.

The guy posting the info is not some idiot; he understands all this. He even caught someone posting a fake Nvidia card video and called them out before the card was released. His info says 8nm, and people have replied to him with the same theory as yours, yet he didn't change his info, which suggests he has a very reliable source. Of course it's all a rumor and we will have to wait, but I would wager on him being right.

Last edited by zeldaring - on 15 September 2023