Pemalite said:
EpicRandy said:

A great comparison, I think, is the Series S versus the 6800U. The 6800U has a peak GPU (680M) performance of 3.68 TFLOPS, which is very close to the Series S's 4 TFLOPS. However, the 680M is limited to 12 CUs, which forces it to run at higher clocks with added inefficiency. The 680M is also RDNA 2, and the video suggests that such a device from MS would use RDNA 4 with better efficiency. The 6800U is also used in devices similar to what is suggested for a Series S handheld, like the AYANEO 2.

Teraflops is a garbage metric. Don't use it.

Fewer CUs and higher clocks aren't inherently inefficient; that balance can actually result in better performance per watt.

Let's take the Vega integrated graphics on AMD's APUs as an example...

My old laptop with a Ryzen 2700U @ 25 W TDP versus my other old laptop with a 4500U @ 25 W TDP.
They are both based on Vega graphics.

The 4500U
* Vega 6 CUs @ 1,500 MHz - 1.15 TFLOPS.

The 2700U
* Vega 10 CUs @ 1,300 MHz - 1.66 TFLOPS.

On paper, the 2700U should own in gaming performance: same graphics architecture, more CUs at a lower clock, same TDP.

Yet the 4500U will always win in real-world gaming. Why? It's a balancing act: CUs consume power, clock speeds consume power, and every processing architecture has an inherent efficiency curve where you get the most performance per watt at a given clock rate.

AMD, through several generations of trial and error, determined that higher clock speeds can provide more performance even with fewer CUs... provided other bottlenecks, like bandwidth limitations, are also removed.

I would even pick something like the 5500U over the 4700U: same bandwidth, same CU count, same TDP, but the 5500U has far better GPU performance thanks to the higher clocks alone.

I get what you're saying, and it's decent advice in general, but it's a little more complicated than that. Teraflops measure the limit of the compute power a chip can deliver at its corresponding clocks; as a shorthand for a chip's capacity, it is actually quite precise. However, that is only one part of the story. The rest can be summed up as what percentage of that capacity you can actually use in any given scenario, or in other words, how starved the chip actually is. It can be starved by an insufficient memory pool, insufficient memory bandwidth, or insufficient power delivery.
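For reference, those teraflop figures fall straight out of CU count and clock, assuming the usual 64 shaders per CU and 2 FLOPs per clock (FMA) of GCN/Vega and RDNA parts; a quick sketch:

```python
# Peak FP32 throughput = CUs * 64 shaders/CU * 2 FLOPs/clock (FMA) * clock speed.
def peak_fp32_tflops(cus: int, clock_mhz: float) -> float:
    return cus * 64 * 2 * clock_mhz / 1_000_000

print(peak_fp32_tflops(6, 1500))   # Vega 6  in the 4500U -> ~1.15 TFLOPS
print(peak_fp32_tflops(10, 1300))  # Vega 10 in the 2700U -> ~1.66 TFLOPS
print(peak_fp32_tflops(12, 2400))  # 680M   in the 6800U  -> ~3.69 TFLOPS
```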

Another aspect to consider is the amount and variety of hardware acceleration on offer, which may let software bypass the CUs in scenarios where a chip without that acceleration would have to use them.

In the example you gave, the 2700U is actually very starved by its smaller memory pool and lower bandwidth: the 4500U features 50% more L2 cache, twice the L3 cache, and supports higher memory frequencies. The CPU side is also more power-hungry on the 2700U than on the 4500U, leaving the 4500U's GPU more leeway within the same 25 W TDP.
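A rough, roofline-style way to picture that starvation; the bandwidth figures below roughly match dual-channel DDR4-2400 vs DDR4-3200, and the 25-FLOPs-per-byte workload is purely illustrative:

```python
# Roofline-style sketch: achievable throughput is capped by whichever is lower,
# the compute ceiling or what the memory system can actually feed.
def achievable_tflops(peak_tflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    bandwidth_cap = bandwidth_gbs * flops_per_byte / 1000  # GFLOP/s -> TFLOPS
    return min(peak_tflops, bandwidth_cap)

# Illustrative workload needing ~25 FLOPs per byte of memory traffic.
print(achievable_tflops(1.66, 38.4, 25))  # 2700U-ish: wider GPU, slower RAM  -> 0.96 (starved)
print(achievable_tflops(1.15, 51.2, 25))  # 4500U-ish: narrower GPU, faster RAM -> 1.15 (compute-bound)
```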

However, if you were to use the Vega 10 of the 2700U @ 1,300 MHz and the Vega 6 of the 4500U @ 1,500 MHz in a context with no other bottlenecks, the Vega 10 would deliver better performance than the Vega 6, as their respective maximum capacities would be the actual bottleneck.

For the 5500U vs the 4700U, the sole difference is that the CPU side is less power-hungry on the 5500U, allowing for higher clocks on the GPU; but make no mistake, if you were to isolate the GPU power consumption and compare the two, the 4700U would actually be more efficient per watt. Even the more recent RDNA 2 is most efficient at around 1,300 to 1,400 MHz according to this source.

The Vega architecture had a much lower most-efficient clock speed. I could not find a source for this, but I remember that at the time of the Vega announcement AMD was using clocks of ~850 MHz in its presentations to portray the efficiency increase over the older architecture. That was prior to the reveal of the Vega 56 and 64, however, so it may have been measured on engineering samples. This may also have shifted with node shrinks, but I could not find anything on that either; I still really doubt 1,800 MHz would be more efficient per watt than 1,500 MHz on the Vega architecture.
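To make the "most efficient clock" idea concrete, here is a toy model only, using the usual dynamic-power approximation (power roughly proportional to CUs × frequency × voltage²); the voltage/frequency points are invented for illustration, not Vega's real V/F curve:

```python
# Toy model: performance scales ~linearly with clock, but dynamic power scales
# roughly with frequency * voltage^2, and voltage must rise to hold higher clocks.
# The voltage/frequency pairs below are invented for illustration only.
def relative_perf_per_watt(cus: int, clock_mhz: float, volts: float) -> float:
    perf = cus * clock_mhz                # ~ proportional to peak TFLOPS
    power = cus * clock_mhz * volts ** 2  # ~ dynamic power, arbitrary units
    return perf / power                   # note: collapses to 1 / volts^2

for clock_mhz, volts in [(850, 0.75), (1300, 0.90), (1500, 1.00), (1800, 1.15)]:
    print(clock_mhz, "MHz ->", round(relative_perf_per_watt(10, clock_mhz, volts), 2))
# The trend is downward: the higher you clock, the worse the perf per watt,
# which is why wide-and-slow tends to win on efficiency once nothing else is the bottleneck.
```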

All that said, with access to a semi-custom architecture designed for a gaming-first application, MS would make sure the GPU compute (TFLOPS) is actually the bottleneck in the vast majority of scenarios, so as not to starve the GPU unnecessarily. In such a scenario, where GPU compute is the bottleneck and the architecture is the same, it is relevant and accurate to a relatively high degree to use that value as a point of comparison. If they were to switch to RDNA 3 or RDNA 4 instead of just shrinking the fab node of their semi-custom RDNA 2, it would just be icing on the cake, since it would offer more hardware-acceleration possibilities.

Last edited by EpicRandy - on 14 April 2023