
Why is everyone quoting 176 Gflops (0.176Tflops) for the Wii U? 

The Latte GPU found in the Wii U is widely believed to be derived from the RV770 with 320 stream processors (ALUs) clocked at 550MHz. AMD's HD4000/5000/6000 graphics series can perform 2 floating-point operations per ALU per clock cycle (a multiply-add). 320 ALUs x 550MHz x 2 ops/cycle = 352 Gflops, or 0.352 Tflops. This was covered in detail a long time ago: 

http://www.eurogamer.net/articles/df-hardware-wii-u-graphics-power-finally-revealed
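
For reference, here is a quick back-of-the-envelope sketch (Python, purely illustrative) of where both figures come from; the only real point of disagreement is the ALU count, since the 550MHz clock and 2 ops/cycle are the same either way:

def peak_gflops(alus, clock_mhz, ops_per_cycle=2):
    # Theoretical peak = ALUs x clock x FLOPs per ALU per cycle (MHz -> Gflops)
    return alus * clock_mhz * ops_per_cycle / 1000.0

print(peak_gflops(160, 550))  # 176.0 -- the ALU count behind the 176 Gflops figure
print(peak_gflops(320, 550))  # 352.0 -- the ALU count argued for here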

Was there new information that proves the Wii U's GPU only has 160 ALUs?! The real issue with the Wii U's GPU is that it uses a VLIW-5 architecture, which predates the scalar GCN and Maxwell/Pascal architectures. The problem with VLIW-4/5 was that game engines had to be coded specifically to keep all the wavefronts scheduled for maximum utilization of the GPU. That proved too costly and time consuming, and it's why 3rd party multiplats were doomed from day 1 on the Wii U.

Since GCN has 2 asynchronous compute engines in the Xbox One and 8 (!) in the PS4, the driver and the dynamic scheduler of the XB1/PS4 were far better at handling the unpredictable game code of next-gen titles. Because it was too costly to make separate game engine optimizations for the Wii U's VLIW-5, Latte's stream processors sat idle for most of that console's life. An indication of just how difficult it became to optimize for the HD4000-6000 VLIW architectures is that AMD ended driver support for all of those lines not long after GCN HD7000 launched in 2012. It's also part of why load power usage on the Wii U in 3rd party games was so low: the graphics card was underutilized. That is why the Wii U's 0.352 Tflops would rarely translate into real-world performance, and why comparing the Wii U's Tflops to the Tegra X1's is completely misrepresentative. 
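
To make the underutilization point concrete, here is a rough sketch with a hypothetical slot-occupancy number (Latte's 320 ALUs are organized as 64 VLIW-5 units, and any slot the shader compiler cannot fill in a cycle is wasted throughput):

def effective_gflops(peak_gflops, avg_slots_filled, slots_per_unit=5):
    # Effective throughput scales with how many of the 5 VLIW slots
    # actually get filled on an average cycle
    return peak_gflops * avg_slots_filled / slots_per_unit

# 3.5 of 5 slots filled is a hypothetical, middle-of-the-road figure,
# not a measured number for any particular game
print(effective_gflops(352.0, 3.5))  # ~246 Gflops of the 352 on paper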

It's not correct to compare aggregate graphical horsepower between completely different GPU architectures using Tflops, since that assumes both architectures are 100% ALU limited. The only reason the Tflops comparison works for Xbox One vs. PS4 is that the HD7790 (XB1) and HD7850/7870 (PS4) are the same GCN 1.0 architecture, and even then the comparison lining up is largely a coincidence. IIRC, the XB1 has 16 ROPs, 48 TMUs and 768 stream processors against the PS4's 32 ROPs, 72 TMUs and 1152 stream processors. The massive raster output unit and texture mapping unit advantage of the Pitcairn GPU in the PS4 is what allows it to run games at 1080p when the XB1 is forced to drop down to 720-900p. Conversely, an RX480 is 50-60% faster in modern games than the HD7970GHz, but the RX480's Tflops advantage is only 36% [2304 SPs x 1266MHz / (2048 SPs x 1050MHz)]. 
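
The RX480 vs. HD7970GHz math above works out as follows (the 50-60% real-world gap comes from game benchmarks, not from this formula):

def tflops(stream_processors, clock_mhz, ops_per_cycle=2):
    return stream_processors * clock_mhz * ops_per_cycle / 1e6

rx480 = tflops(2304, 1266)      # ~5.83 Tflops
hd7970ghz = tflops(2048, 1050)  # ~4.30 Tflops
print(rx480 / hd7970ghz - 1)    # ~0.36 -> only a 36% advantage on paper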

Moral of the story: you often cannot even directly compare two AMD Graphics Core Next cards based on Tflops, so how can you compare NV vs. AMD? You cannot! 

Here are more examples:

Flat-out comparisons of the 1.58 Tflops Fermi GTX580 against the 3.2 Tflops Kepler GTX680, which is only 35-40% faster in games, long ago proved that comparing different GPU architectures purely on arithmetic logic unit throughput is deeply flawed. The same pattern holds today: the GTX1080, with nearly 9 Tflops, is only 21-23% faster than the 6.5 Tflops GTX1070.
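
Putting those quoted figures side by side makes the mismatch obvious (the in-game deltas are the rough numbers cited above, not fresh benchmarks):

# (pair, Tflops ratio, rough in-game gain quoted above)
pairs = [
    ("GTX680 vs GTX580", 3.2 / 1.58, "35-40%"),
    ("GTX1080 vs GTX1070", 9.0 / 6.5, "21-23%"),
]
for name, ratio, gain in pairs:
    print(f"{name}: {ratio:.2f}x the Tflops, but only ~{gain} faster in games")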

I am not making any excuses for the supposed 256 CUDA core, 16 TMU, 16 ROP, 25.6 GB/sec Switch specs, but a lot of you are missing the forest for the trees: graphical capability is NOT necessarily only arithmetic bound. Besides shaders/ALUs/stream processors, there are rasterization units, geometry units, ROPs, TMUs, delta color compression, L2 cache, static vs. dynamic compute scheduling, asynchronous compute, access to lower level APIs such as Vulkan, etc. 

NV and AMD architectures absolutely CANNOT be compared accurately strictly from a Tflops perspective. The ~9 Tflops GTX1080 is almost 50% faster in games than the 8.6 Tflops Fury X. Again, I am in no way, shape or form defending the Switch's specs; I am simply pointing out that direct comparisons of the PS4/Xbone as XYZ times more powerful than the Switch absolutely cannot be drawn solely from Tflops figures. It's even trickier since GCN has a dynamic compute scheduler and 2-8 Asynchronous Compute Engines (think about why Uncharted 4 looks so good), while the Tegra X1's Maxwell architecture has neither of these features. [Maxwell has a static scheduler, and its async compute capability is only exposed in the CUDA framework/eco-system -- it is disabled in games on all consumer Maxwell videocards]

The biggest obstacles to 3rd party support will be the small initial install base and launch timing. Even if the Switch were 5X more powerful, many of the games scheduled to launch in 2017-2018 started their development with no intention of ever showing up on the Switch, regardless of its hardware capabilities. Developers and publishers have limited human capital and financial resources, so they cannot easily add a dedicated separate team just for the Switch when we don't even know if the console will top 5-10M unit sales in 2017.