
Forums - Nintendo Discussion - Clarifying the 1.5TFLOPS of the SWITCH for those who just see the numbers.

vivster said:

Now the Switch's mystical number of 1.5TFLOPS refers to FP16. The FP32 performance is logically at 750GFLOPS. For comparison the PS4 sports 1.84TFLOPS in FP32. Its FP16 performance is naturally twice as fast.

the PS4 still used GCN 2.0, which afaik didn't have native FP16/INT16 support, meaning half-precision tasks won't run two at a time on that architecture

GCN 4.0 (Polaris) and Pascal, on the other hand, added native support for half-precision tasks, so that two can be done at a time
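
A rough sketch of the arithmetic in this exchange, for anyone who wants to see it spelled out. The function name and the `packed_fp16` flag are made up for illustration, and the true/false values below are just the claims made in this thread (Switch runs FP16 at double rate, PS4 doesn't), not verified hardware specs.

```python
def peak_fp16_tflops(fp32_tflops: float, packed_fp16: bool) -> float:
    """Theoretical FP16 peak derived from the FP32 peak.

    packed_fp16 is a hypothetical flag: True means the GPU can issue two
    FP16 operations in the slot of one FP32 operation, False means FP16
    work runs at the plain FP32 rate.
    """
    return fp32_tflops * 2 if packed_fp16 else fp32_tflops


# Figures quoted in the thread; the flags are the thread's claims.
switch_fp16 = peak_fp16_tflops(0.75, packed_fp16=True)   # 1.5
ps4_fp16 = peak_fp16_tflops(1.84, packed_fp16=False)     # 1.84

print(f"Switch FP16 peak: {switch_fp16} TFLOPS")
print(f"PS4 FP16 peak:    {ps4_fp16} TFLOPS")
```

So under those assumptions the doubled number only applies to the chip that can actually pair up half-precision work.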



Lafiel said:
vivster said:

Now the Switch's mystical number of 1.5TFLOPS refers to FP16. The FP32 performance is logically at 750GFLOPS. For comparison the PS4 sports 1.84TFLOPS in FP32. Its FP16 performance is naturally twice as fast.

the PS4 still used GCN 2.0, which afaik didn't have native FP16/INT16 support, meaning half-precision tasks won't run two at a time on that architecture

GCN 4.0 (Polaris) and Pascal, on the other hand, added native support for half-precision tasks, so that two can be done at a time

I thought he was joking in that case but who knows...



vivster said:
Volterra_90 said:
I don't really know anything about hardware specs tbh. I know about good games, and that's what I really want... Switch will succeed with a good catalogue. Imo it also has to be cheap to succeed, so I really don't mind if the console is underpowered in the end.

I think so too. 750GFLOPS for the Switch is plenty for the kinds of games it will run. It doesn't seem to strive for parity with the bigger twins.

I just can't stand when people claim it's close to their performance.

As I said, I don't know too much about hardware specs, but I believe that a portable with close to PS4/One power would be extremely expensive and really power consuming. Correct me if I'm wrong xD. And it would be pretty much absurd to compete with PS4/One, it's a lost battle imo. They're better off doing their own thing. And that is a cheap, Nintendo-based console.



Lafiel said:
vivster said:

Now the Switch's mystical number of 1.5TFLOPS refers to FP16. The FP32 performance is logically at 750GFLOPS. For comparison the PS4 sports 1.84TFLOPS in FP32. Its FP16 performance is naturally twice as fast.

the PS4 still used GCN 2.0, which afaik didn't have native FP16/INT16 support, meaning half-precision tasks won't run two at a time on that architecture

GCN 4.0 (Polaris) and Pascal, on the other hand, added native support for half-precision tasks, so that two can be done at a time

Didn't actually know that. Stupid me for overestimating AMD again. Removed it from the OP.



If you demand respect or gratitude for your volunteer work, you're doing volunteering wrong.

The thing that is really starting to irk me is that people are trying to spin it in a way to show that their console is more powerful than it really is. But it is soooo stupid cause FP16 can apply to all current gen hardware. So Switch is 1.5TF, Pro is 8.4TF and Scorpio is 12 TF, and so on once you take PC into account. So it ends up being the same in relative performance, just with bigger numbers.

Sighh

Granted it's not exactly like that, but come on



                  

PC Specs: CPU: 7800X3D || GPU: Strix 4090 || RAM: 32GB DDR5 6000 || Main SSD: WD 2TB SN850

vivster said:
BlkPaladin said:

Just a correction: you are not entirely correct about this. Precision is not how accurately the chip does the calculation but how big the instruction is, at least when you are dealing with programming. FP16, floating point 16, are floating-point instructions that are 16 bits wide at most, and FP32 are instructions up to 32 bits wide. For some procedures you only need FP16, and if you optimize correctly you can run two instructions concurrently, depending on the chip; from how nVidia is advertising this chip, that seems to be the case. So in some cases, if you don't need 32-bit instructions, you can process instructions faster this way.

This is one of the reasons why you don't want to look at just FLOPS, or anything else, in isolation. I've seen a person saying that the Xbox One runs at twice the speed if you do it in FP16, which may not be the case since the chip might not be able to do two FP16 instructions at the same time, though it may allow them.

Well, the length of a floating point number IS its precision. Like 3.14159265359 is more precise than 3.14. It's used like that in physics where precision is important and as such you will use the most precise number possible. You can use smaller numbers but the end product while correct will not be as precise.

Precision is just a fancy word for longer numbers.

I covered that in my answer, which I was editing at the time to add more information. But the way you originally worded it made it sound like the calculations may not come out correctly, and you don't always want to be precise, because in a lot of calculations needless precision can throw off your results.

In programming, which is what chips deal with, you may not need to run instructions in FP32 and can do them in FP16 instead, which speeds up calculations, especially when the chip allows two FP16 instructions to run concurrently. If I remember correctly, it comes down to how many registers there are to run an instruction. Some chips use two 16-bit registers to run a 32-bit instruction and can switch to doing two 16-bit instructions on the fly when there is optimization for it at the machine level. This allows some sections of code to run faster. On the other hand, some registers are 32-bit only, so even if you put 16-bit instructions through them they can only do one instruction at a time.
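
To make the "two 16-bit operations in one 32-bit slot" idea a bit more concrete, here is a minimal sketch. It uses plain unsigned integers instead of floats to keep it short, and it is only an illustration of the packing idea, not how any particular GPU implements FP16. The helper names (`pack2`, `unpack2`, `packed_add16`) are invented for this example.

```python
LANE_MASK = 0xFFFF          # one 16-bit lane
LOW15_BOTH = 0x7FFF7FFF     # low 15 bits of each lane
TOP_BOTH = 0x80008000       # top bit of each lane


def pack2(lo: int, hi: int) -> int:
    """Pack two 16-bit values into one 32-bit word."""
    return ((hi & LANE_MASK) << 16) | (lo & LANE_MASK)


def unpack2(word: int) -> tuple[int, int]:
    """Split a 32-bit word back into its (lo, hi) 16-bit lanes."""
    return word & LANE_MASK, (word >> 16) & LANE_MASK


def packed_add16(a: int, b: int) -> int:
    """Add both 16-bit lanes with a single 32-bit addition.

    The top bit of each lane is masked off first so a carry can never
    spill into the neighbouring lane, then restored with XOR.
    Each lane wraps around modulo 65536 on overflow.
    """
    partial = (a & LOW15_BOTH) + (b & LOW15_BOTH)
    return partial ^ ((a ^ b) & TOP_BOTH)


a = pack2(1000, 65535)
b = pack2(2000, 1)
print(unpack2(packed_add16(a, b)))  # (3000, 0): the high lane wrapped around
```

One 32-bit add produced two independent 16-bit results, which is the kind of doubling described above. A unit that only treats the word as a single 32-bit value gets no such benefit.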



I knew exactly none of this. Nice post. Good explanation. Thread title seems familiar......



- "If you have the heart of a true winner, you can always get more pissed off than some other asshole."

vivster said:
Lafiel said:

the PS4 still used GCN 2.0, which afaik didn't have native FP16/INT16 support, meaning half-precision tasks won't run two at a time on that architecture

GCN 4.0 (Polaris) and Pascal, on the other hand, added native support for half-precision tasks, so that two can be done at a time

Didn't actually know that. Stupid me for overestimating AMD again. Removed it from the OP.

the websites I found mentioning that aspect of Polaris seemed to say Nvidia only just added that (to their consumer cards) with Pascal as well - I imagine professional cards have offered native support for longer



BlkPaladin said:
vivster said:

Well, the length of a floating point number IS its precision. Like 3.14159265359 is more precise than 3.14. It's used like that in physics where precision is important and as such you will use the most precise number possible. You can use smaller numbers but the end product while correct will not be as precise.

Precision is just a fancy word for longer numbers.

I covered that in my answer, which I was editing at the time to add more information. But the way you originally worded it made it sound like the calculations may not come out correctly, and you don't always want to be precise, because in a lot of calculations needless precision can throw off your results.

In programming, which is what chips deal with, you may not need to run instructions in FP32 and can do them in FP16 instead, which speeds up calculations, especially when the chip allows two FP16 instructions to run concurrently. If I remember correctly, it comes down to how many registers there are to run an instruction. Some chips use two 16-bit registers to run a 32-bit instruction and can switch to doing two 16-bit instructions on the fly when there is optimization for it at the machine level. This allows some sections of code to run faster. On the other hand, some registers are 32-bit only, so even if you put 16-bit instructions through them they can only do one instruction at a time.

Absolutely correct. Though that's already too technical I think. I went at it from a calculation and math perspective rather than from programming. And the smaller numbers might as well be imprecise, which wouldn't matter since higher precision isn't needed.
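
For the math side of it, a small sketch of what "less precise" looks like in practice: the same value stored as a 16-bit and a 32-bit float. This assumes the standard IEEE 754 half and single formats, which Python's `struct` module exposes as the `e` and `f` format codes; the helper names are made up for the example.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


pi16 = to_half(math.pi)    # 3.140625
pi32 = to_single(math.pi)  # roughly 3.1415927

print("FP16 pi:", pi16, "error:", abs(pi16 - math.pi))
print("FP32 pi:", pi32, "error:", abs(pi32 - math.pi))
```

FP16 keeps about three significant decimal digits here, which is fine for plenty of graphics work and not fine for calculations that need more.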



If you demand respect or gratitude for your volunteer work, you're doing volunteering wrong.

Captain_Yuri said:
The thing that is really starting to irk me is that people are trying to spin it in a way to show that their console is more powerful than it really is. But it is soooo stupid cause FP16 can apply to all current gen hardware. So Switch is 1.5TF, x1 is 2.6TF, ps4 is 3.6TF, Pro is 8.4TF and Scorpio is 12 TF, and so on once you take PC into account. So it ends up being the same in relative performance, just with bigger numbers.
Sighh

Actually not; if you look at the post above yours and at my last post, it depends on the chip. You cannot just magically make a chip run faster at half precision. Depending on how it is made, it may double the performance of FP16 instructions or it may run them at the same speed. I use registers in my answer because that is how deep my knowledge of these things goes; I'm sure there are other ways to speed up FP16 and FP32 instructions. But a register, for all intents and purposes of this explanation, can only run one instruction at a time. And how the chip is made to run FP32 instructions can influence whether it experiences a "speed boost" running things at half precision. For example, some 32-bit instructions are run on two 16-bit registers. So, if it is optimized to do so, when you put a 16-bit instruction into one register you can put another instruction into the other register at the same time, and thus get "twice" the speed in this case. But there are 32-bit registers that will only do one instruction at a time no matter how small the instruction is. So just looking at FLOPS and full precision/half precision doesn't tell the entire story.

FLOPS, like Hertz before it, is just an advertising go-to word that really has limited real-world impact.