
Forums - Gaming Discussion - Could the Xbox Series S be MS entry into a Handheld Device

Machiavellian said:
curl-6 said:

I'm no expert, but I'm pretty sure it's not as simple as just die shrinking a chipset designed for a home console and stuffing it into a handheld. 

The demands of a portable system are quite different than a big box plugged into the mains. You can't just use the same RAM, for instance. You'd have to change the hardware to the point where it wouldn't be "the same device" any more.

Review the video I posted as the person goes into great detail on that subject. 

The device he describes wouldn't be 1:1 with a Series S as he admits changes would be necessary.

It would also have to compete against the Switch 2, and assuming a late 2024 release as he claims, it would have missed half the generation by the time it launches and would get maybe 4 years before Xbox moves on to the next generation.

It's definitely possible, don't get me wrong, but I'm still not entirely sold. Plus the guy honestly came off as rather full of himself.

Last edited by curl-6 - on 19 April 2023

Norion said:

Easily conveying a power difference to people is useful, and teraflops can do that in the right circumstances; while there is inaccuracy, it still gives a general idea of the power gap between the two. The inaccuracy does still make it a bad metric, though, so what metric would you suggest be used instead?

Framerate tends to be a good one.

EpicRandy said:

Tflops is *not* actually a measurement of anything.

Yes it is. It's not some kind of metric obtained with a dice roll when a new GPU enters the market: each shader core (AMD) or CUDA core (Nvidia), which are responsible for general floating-point operations on their respective GPUs, can do up to 2 FLOPs per clock for 32-bit floating-point operations. That's the physical limit of those cores. Saying TFLOPS is worth nothing is akin to saying HP means nothing for cars, or wattage means nothing for electric motors. Every metric for anything, simple or complex, is only as good as the context you give it.

No. It's actually not.

It's a bunch of numbers multiplied together. - It's theoretical, not real world.

Again, no GPU or CPU will ever achieve their "hypothetical teraflops" in the real world.

EpicRandy said:

It's a theoretical number based on a number of hardware attributes and not a measurement of capability. - It is a number that is impossible to achieve in the real world.

Yes, like I explained earlier, it is theoretical because you could never task those cores 100%. So it can be viewed either as a theoretical limit or as a physical barrier you can never exceed, or even attain, at the reference clock, but it is still very much a measurement of capability.

But by that same vein, I could grab a 1 teraflop "rated" GPU and assert it's theoretically capable of "1.2 Teraflops" based on any number of factors.

It's meaningless, because it's unachievable in any real world scenario.

EpicRandy said:

A Radeon 5870 is a 2.72 Teraflop GPU with 2GB of RAM @ 153GB/s of bandwidth.
A Radeon 7850 is a 1.76 Teraflop GPU with 2GB of RAM @ 153GB/s of bandwidth.

So the only real difference is almost 1 Teraflops of compute, right? It's accurate according to you right? So the Radeon 5870 should win right?

Then if it's such an accurate measure of compute, why is the 7850 faster in everything, including compute where in some single precision floating point tasks, the 7850 is sometimes more than twice as fast?
(But don't take my word for it)
https://www.anandtech.com/bench/product/1062?vs=1076

There is another very significant difference between the two. When I made my list of what could starve GPU cores (insufficient memory pool, insufficient memory bandwidth, and insufficient power delivery) I did not mean it to be exhaustive. Here it's the flaws of the TeraScale architecture that starve the cores of the high-end 5870. No matter the gen, architecture, or revision, the high-end/enthusiast segments are meant to push limits by sacrificing efficiency to get the last drops of performance, so they should always be viewed with high diminishing returns in mind. The 7850 is a mid-range GPU using the better GCN architecture, which resulted in significantly less starvation of its cores.

What is starving the 5870 to have less real-world teraflops than the 7850?

Explain it. I'll wait.

EpicRandy said:

You are just confirming my point, that the number of CU's is not the be-all, end-all.

It's confirming my point too, because I never claimed the opposite, and using TFLOPS in certain scenarios does not mean I view it as a be-all, end-all either. In fact, all my statements point to carefulness when using TFLOPS, and using CUs would only be worse, so I don't know why you try to claim the opposite as my position.

Using CU's alone is just as irrelevant as Teraflops. And I would never condone or support such a thing.

Both are bullshit.

EpicRandy said:

Nah. Isolating GPU power consumption doesn't result in higher GPU power consumption.

Remember binning is actually a thing and as a process matures you can obtain higher clockspeeds without a corresponding increase to power consumption, sometimes... You can achieve higher clocks -and- lower power consumption as processes mature.

I think you misunderstood what I was trying to say. The default TDP is for the whole APU, so the 5500U having a less power-hungry CPU means the GPU has more power available, hence the 200MHz higher clock frequency. The RDNA2 architecture (just for comparison, as I did not find an equivalent chart for Vega) shows a 25% increase in watts at 1800MHz compared to 1600MHz, which is a 12.5% increase in performance. No doubt the Vega architecture is not as good as that, so the ratio may be even worse. Binning is a thing, but the best bins would be reserved for the higher tiers with better profit margins, like the 4800U and 4980U, and it has only a marginal impact, nothing of the sort needed to bridge a 12.5% performance-per-watt gap.

And binning does not always end up being used for power saving; look at the RX 400 series vs the RX 500 series. The difference was only better bins, but they used it to get higher clocks (about 6%), and since they pushed the architecture to the max it actually resulted in worse performance per watt, with the 580 using about 23% more watts than the 480.

Anyway, this whole conversation is weird because your points do not disprove the initial context in which I used the TFLOPS figure. I said that AMD/MS needs to attain the 4 TFLOPS envelope of the RDNA2 architecture at a mobile-like TDP, and I pointed to another RDNA2 chip (which I mistakenly wrote as RDNA3 in my previous post, sorry if that's the source of the debate) with closely matched TFLOPS. I did not claim their performance was equivalent or comparable based on this; it was meant to show that AMD has already successfully reduced the TDP envelope of the RDNA2 architecture with the change from 7nm to 6nm. To add more context, the 680M has a max 50W TDP but retains 80%+ of its performance at 25W, see benchmark here. This is promising when you consider that the semi-custom design of the Series S is even more efficient, using more cores at lower clock speeds. So it only adds to the plausibility of the video in the OP. RDNA2 at 4nm or even 3nm should be more than enough to push a 4 TFLOPS RDNA2 package under 25W, and even have a shot at a 15W APU target.

TFLOPS is also very useful in this context because consoles must keep this metric when doing a die shrink. Look at the PS5: it now uses the 6nm Oberon Plus chip and shaved off 20W to 30W but kept the same TFLOPS target (the same clock speed and the same number of shader cores), same memory, same bandwidth, same everything, just shrunk down. They have to do it this way to keep changes invisible to developers, and that's basically what I anticipate Xbox will do with the Series consoles, whether or not they want a revised S in a Switch-like format. If MS were to use a different architecture like RDNA 3 or 4, it is unlikely they would use a different TFLOPS target either if they want to keep things invisible to devs (it may not even be possible here, but the more you keep the same, the easier it should be for a dev to create a new build from the S version).

My argument is that using TFLOPS to determine the capability of a GPU is bullshit.

Nor are higher clocks always a detriment to power consumption... It's a balancing act, as all CPUs and GPUs have an efficiency curve.

For example... You can buy a CPU, undervolt it... Then overclock it... And end up with a CPU that uses less power, but offers higher performance due to its higher clockrate.
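For that undervolt-and-overclock point, here's a minimal sketch of the usual dynamic-power approximation (power scales roughly with voltage squared times frequency); the voltages and clocks below are made-up illustrative values, not any particular CPU's stock settings:

```python
# Dynamic power ~ capacitance x voltage^2 x frequency, so a voltage drop can
# outweigh a clock bump.
def dynamic_power(voltage: float, freq_ghz: float, capacitance: float = 1.0) -> float:
    return capacitance * voltage ** 2 * freq_ghz

stock       = dynamic_power(voltage=1.25, freq_ghz=4.0)
undervolted = dynamic_power(voltage=1.10, freq_ghz=4.3)  # lower volts, higher clock

print(f"relative power: {undervolted / stock:.2f}x at {4.3 / 4.0:.0%} of the stock clock")
# ~0.83x the power for ~7.5% more clock -- possible because stock voltage is set
# with margin for the worst silicon that ships under that SKU.
```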



--::{PC Gaming Master Race}::--

There is no 100% reliable way to determine a system's true capability. Personally, I let the real-world results decide this (framerate, resolution, settings, etc.), but a lot of people would automatically blame "optimization" when their platform of choice underperforms, and some would assume platform holders are paying or forcing 3rd party developers to not optimize for the rival platform lol.



Pemalite said:
Norion said:

Easily conveying a power difference to people is useful, and teraflops can do that in the right circumstances; while there is inaccuracy, it still gives a general idea of the power gap between the two. The inaccuracy does still make it a bad metric, though, so what metric would you suggest be used instead?

Framerate tends to be a good one.

That works well enough for PC, but the issue with that metric for consoles is that uninformed people could assume that if a game is locked to 30fps on the last-gen systems and locked to 60fps on the current-gen ones, the new consoles are twice as powerful, when the gap is far bigger than that. There's also how the Series S can run a game at the same fps as the Series X, so the additional metric of resolution would have to be used as well, complicating things further.

Unless it's simple enough that someone with little to no technical knowledge can understand it as easily as 4 vs 12, it's unfortunately not really viable for stuff like the showcases produced by Microsoft and Sony. They don't want to potentially confuse a large segment of the people watching.
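A crude way to see why the fps jump alone understates the gap when resolution also changes; this is just an illustrative sketch, and the 1080p30 vs 4K60 pairing is a made-up example rather than any specific game:

```python
# Pixels pushed per second = frames per second x pixels per frame.
def pixel_rate(fps: int, width: int, height: int) -> int:
    return fps * width * height

last_gen    = pixel_rate(30, 1920, 1080)  # hypothetical 1080p30 last-gen version
current_gen = pixel_rate(60, 3840, 2160)  # hypothetical 4K60 current-gen version

print(f"fps ratio:        {60 / 30:.0f}x")
print(f"pixel-rate ratio: {current_gen / last_gen:.0f}x")
# The naive "twice the fps" reading misses the 4x resolution jump: the crude
# pixel-rate ratio is 8x, and that still ignores settings, RT, and so on.
```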



Pemalite said:
Norion said:

Easily conveying a power difference to people is useful, and teraflops can do that in the right circumstances; while there is inaccuracy, it still gives a general idea of the power gap between the two. The inaccuracy does still make it a bad metric, though, so what metric would you suggest be used instead?

Framerate tends to be a good one.

Well yeah, I agree, but there are still many flaws with this.

  1. FPS can only measure cards that are already available and serves no purpose in evaluating future ones.
  2. It still needs contextualization: what game was running, what the target resolution was, what post-processing effects were enabled, what API was used, and what engine was used.
  3. Two GPUs may perform almost exactly the same on one title yet vastly differently on another.
  4. It is very susceptible to manipulation:
    1. Nvidia, through their partner programs with devs, make certain they use sometimes unnecessary features, or excessive levels of a feature, that hurt performance on AMD GPUs, e.g. The Witcher 3's use of 64x tessellation with HairWorks, designed to cripple performance on AMD GPUs with no image fidelity gain (beyond 16x).
    2. AnandTech: "Let's start with the obvious. NVIDIA is more aggressive than AMD with trying to get review sites to use certain games and even make certain GPU comparisons."
    3. Not too long ago, Nvidia driver revisions had a tendency to decrease GPU performance while AMD revisions increased performance over the lifetime of their GPUs. When new gens were announced, Nvidia compared the latest drivers' performance on the old gen to the new gen, showing skewed comparisons.
    4. AMD was caught using blatantly wrong numbers when showing gen-over-gen FPS improvements for the RX 7900 XTX.
  5. When gaming, many kinds of workloads are computed at once, some with better GPU utilization than others, but the performance shown in fps will only rise to that of the worst-performing one.
  6. For certain workloads, average FPS would literally be a trash figure while TFLOPS depicts things more accurately.

Pemalite said:
EpicRandy said:

Tflops is *not* actually a measurement of anything.

Yes it is. It's not some kind of metric obtained with a dice roll when a new GPU enters the market: each shader core (AMD) or CUDA core (Nvidia), which are responsible for general floating-point operations on their respective GPUs, can do up to 2 FLOPs per clock for 32-bit floating-point operations. That's the physical limit of those cores. Saying TFLOPS is worth nothing is akin to saying HP means nothing for cars, or wattage means nothing for electric motors. Every metric for anything, simple or complex, is only as good as the context you give it.

No. It's actually not.

It's a bunch of numbers multiplied together. - It's theoretical, not real world.

Again, no GPU or CPU will ever achieve their "hypothetical teraflops" in the real world.

Those multiplications literally represent how a stream processor works; they can perform 2 FLOPs per cycle by design.
That much is not theoretical: processors require a high and low signal, aka a clock cycle, to operate, and stream processors are designed to run 2 FP32 operations per clock. That's not a theory, that's how they work. Some workloads, such as scientific simulations, machine learning, and data analytics, have better utilization and sometimes get close to 100%.
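To put rough numbers on that, here's a minimal back-of-the-envelope sketch (Python); the shader counts and clocks are the commonly published specs for these parts, and the result is a ceiling, not a measured throughput:

```python
# Theoretical FP32 peak: shader cores x 2 FLOPs per clock x clock speed (GHz).
def peak_tflops(shader_cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    return shader_cores * flops_per_clock * clock_ghz / 1000.0

gpus = {
    "Radeon HD 5870": (1600, 0.850),  # TeraScale 2
    "Radeon HD 7850": (1024, 0.860),  # GCN
    "Xbox Series S":  (1280, 1.565),  # 20 CUs x 64 shaders
    "Xbox Series X":  (3328, 1.825),  # 52 CUs x 64 shaders
}

for name, (cores, clock) in gpus.items():
    print(f"{name}: {peak_tflops(cores, clock):.2f} TFLOPS theoretical peak")
# The formula is a hard ceiling set by the silicon; how close you get to it
# depends entirely on how well a workload keeps every core fed.
```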

Pemalite said:
EpicRandy said:

It's a theoretical number based on a number of hardware attributes and not a measurement of capability. - It is a number that is impossible to achieve in the real world.

Yes, like I explained earlier, it is theoretical because you could never task those cores 100%. So it can be viewed either as a theoretical limit or as a physical barrier you can never exceed, or even attain, at the reference clock, but it is still very much a measurement of capability.

But by that same vein, I could grab a 1 teraflop "rated" GPU and assert it's theoretically capable of "1.2 Teraflops" based on any number of factors.

It's meaningless, because it's unachievable in any real world scenario.

No, I literally wrote "a physical barrier you could never exceed or even attain". So unless you are able to overclock it by 20%, that's not possible.

Some real-world scenarios get really close to 100%, just not gaming in general, though it's already better on consoles due to static hardware and specific optimization.
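As an illustration of how such near-peak utilization is usually estimated, here's a rough sketch for a dense matrix multiply (about 2·n³ floating-point operations over the measured time); the 4 TFLOPS peak and the timing are placeholder values, not a real benchmark:

```python
# Achieved throughput for an n x n matrix multiply: ~2*n^3 FLOPs / elapsed seconds.
def achieved_tflops(n: int, seconds: float) -> float:
    return 2 * n ** 3 / seconds / 1e12

peak_tflops = 4.0                                 # e.g. a ~4 TFLOPS part
achieved = achieved_tflops(n=8192, seconds=0.31)  # hypothetical measured time

print(f"achieved: {achieved:.2f} TFLOPS ({achieved / peak_tflops:.0%} of peak)")
# Well-tuned dense GEMM can sit in the 80-90%+ range of peak; a game frame mixes
# many different workloads and lands much lower, varying from scene to scene.
```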

Pemalite said:
EpicRandy said:

A Radeon 5870 is a 2.72 Teraflop GPU with 2GB of RAM @ 153GB/s of bandwidth.
A Radeon 7850 is a 1.76 Teraflop GPU with 2GB of RAM @ 153GB/s of bandwidth.

So the only real difference is almost 1 Teraflops of compute, right? It's accurate according to you right? So the Radeon 5870 should win right?

Then if it's such an accurate measure of compute, why is the 7850 faster in everything, including compute where in some single precision floating point tasks, the 7850 is sometimes more than twice as fast?
(But don't take my word for it)
https://www.anandtech.com/bench/product/1062?vs=1076

There is another very significant difference between the two. When I made my list of what could starve GPU cores (insufficient memory pool, insufficient memory bandwidth, and insufficient power delivery) I did not mean it to be exhaustive. Here it's the flaws of the TeraScale architecture that starve the cores of the high-end 5870. No matter the gen, architecture, or revision, the high-end/enthusiast segments are meant to push limits by sacrificing efficiency to get the last drops of performance, so they should always be viewed with high diminishing returns in mind. The 7850 is a mid-range GPU using the better GCN architecture, which resulted in significantly less starvation of its cores.

What is starving the 5870 to have less real-world teraflops than the 7850?

Explain it. I'll wait.

Sure, here's a very good read on the utilization of the TeraScale architecture:

Utilization remains a big concern though, for both the SPUs and the SPs within them: not only must the compiler do its best to identify 5 independent datapoints for each VLIW thread, but so must 64 VLIW threads be packed together within each wavefront. Further, the 64 items in a wavefront should all execute against the same instruction; imagine a scenario wherein one thread executes against an entirely different instruction from the other 63! Opportunities for additional clock cycles & poor utilization thus abound and the compiler must do it’s best to schedule around them.

With 5 SPs in each SPU, attaining 100% utilization necessitates five datapoints per VLIW thread. That’s the best case; in the worst case an entire thread is comprised of just a single datapoint resulting in an abysmal 20% utilization as 4 SPs simply engage in idle chit-chat. Extremities aside, AMD noted an average utilization of 68% or 3.4 SPs per clock cycle. A diagram from AnandTech’s GCN preview article depicts this scenario, and it’s a good time to borrow it here:

The HD 6900 series would serve as the last of the flagship TeraScale GPUs, even as TeraScale based cards continued to release until October of 2013. As compute applications began to take center-stage for GPU acceleration, games too evolved. The next generation of graphics API’s such as DirectX 10 brought along complex shaders that made the VLIW-centric design of TeraScale ever more inefficient and impractically difficult to schedule for. The Radeon HD 7000 series would accordingly usher in the GCN architecture, TeraScale’s inevitable successor that would abandon VLIW and ILP entirely and in doing so cement AMD’s focus on GPU compute going forward.
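To tie that 68% figure back to the 5870 vs 7850 comparison, here's a rough sketch of effective throughput; the TeraScale number is the average utilization cited above, while the GCN number is a hypothetical placeholder just to show the direction of the effect:

```python
# Effective throughput = theoretical peak x average utilization.
def effective_tflops(peak: float, utilization: float) -> float:
    return peak * utilization

hd5870_peak, hd7850_peak = 2.72, 1.76
terascale_util = 0.68  # average VLIW5 utilization quoted above (3.4 of 5 SPs)
gcn_util = 0.90        # hypothetical: GCN schedules much closer to its peak

print(f"HD 5870 effective: {effective_tflops(hd5870_peak, terascale_util):.2f} TFLOPS")
print(f"HD 7850 effective: {effective_tflops(hd7850_peak, gcn_util):.2f} TFLOPS")
# The ~1.5x paper gap shrinks to roughly 1.85 vs 1.58 on utilization alone; the
# rest of the real-world difference comes from scheduling stalls and the other
# architectural limits discussed above.
```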

Pemalite said:
EpicRandy said:

You are just confirming my point, that the number of CU's is not the be-all, end-all.

It's confirming my point too, because I never claimed the opposite, and using TFLOPS in certain scenarios does not mean I view it as a be-all, end-all either. In fact, all my statements point to carefulness when using TFLOPS, and using CUs would only be worse, so I don't know why you try to claim the opposite as my position.

Using CU's alone is just as irrelevant as Teraflops. And I would never condone or support such a thing.

Both are bullshit.

CUs are just a name for a complex of cores/controllers/L1 caches, etc. Those truly don't mean anything unless you specify the architecture and revision, as they are built differently from one architecture to another and from one revision to another. Teraflops represent the same thing regardless of the architecture/revision.
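As a small illustration of why a raw CU count needs even more context than a TFLOPS figure, here's a quick sketch with the published PS5 and Series X configurations (both RDNA2, 64 shaders per CU):

```python
# Within one architecture, TFLOPS folds CU count and clock into a single number.
def rdna2_tflops(cus: int, clock_ghz: float, shaders_per_cu: int = 64) -> float:
    return cus * shaders_per_cu * 2 * clock_ghz / 1000.0

print(f"PS5:      36 CUs @ 2.23 GHz  -> {rdna2_tflops(36, 2.23):.2f} TFLOPS")
print(f"Series X: 52 CUs @ 1.825 GHz -> {rdna2_tflops(52, 1.825):.2f} TFLOPS")
# ~44% more CUs but only ~18% more TFLOPS, because the clocks differ; comparing
# raw CU counts across different architectures or revisions says even less.
```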

Pemalite said:
EpicRandy said:

Nah. Isolating GPU power consumption doesn't result in higher GPU power consumption.

Remember binning is actually a thing and as a process matures you can obtain higher clockspeeds without a corresponding increase to power consumption, sometimes... You can achieve higher clocks -and- lower power consumption as processes mature.

I think you misunderstood what I was trying to say. The default TDP is for the whole APU, so the 5500U having a less power-hungry CPU means the GPU has more power available, hence the 200MHz higher clock frequency. The RDNA2 architecture (just for comparison, as I did not find an equivalent chart for Vega) shows a 25% increase in watts at 1800MHz compared to 1600MHz, which is a 12.5% increase in performance. No doubt the Vega architecture is not as good as that, so the ratio may be even worse. Binning is a thing, but the best bins would be reserved for the higher tiers with better profit margins, like the 4800U and 4980U, and it has only a marginal impact, nothing of the sort needed to bridge a 12.5% performance-per-watt gap.

And binning does not always end up being used for power saving; look at the RX 400 series vs the RX 500 series. The difference was only better bins, but they used it to get higher clocks (about 6%), and since they pushed the architecture to the max it actually resulted in worse performance per watt, with the 580 using about 23% more watts than the 480.

Anyway, this whole conversation is weird because your points do not disprove the initial context in which I used the TFLOPS figure. I said that AMD/MS needs to attain the 4 TFLOPS envelope of the RDNA2 architecture at a mobile-like TDP, and I pointed to another RDNA2 chip (which I mistakenly wrote as RDNA3 in my previous post, sorry if that's the source of the debate) with closely matched TFLOPS. I did not claim their performance was equivalent or comparable based on this; it was meant to show that AMD has already successfully reduced the TDP envelope of the RDNA2 architecture with the change from 7nm to 6nm. To add more context, the 680M has a max 50W TDP but retains 80%+ of its performance at 25W, see benchmark here. This is promising when you consider that the semi-custom design of the Series S is even more efficient, using more cores at lower clock speeds. So it only adds to the plausibility of the video in the OP. RDNA2 at 4nm or even 3nm should be more than enough to push a 4 TFLOPS RDNA2 package under 25W, and even have a shot at a 15W APU target.

TFLOPS is also very useful in this context because consoles must keep this metric when doing a die shrink. Look at the PS5: it now uses the 6nm Oberon Plus chip and shaved off 20W to 30W but kept the same TFLOPS target (the same clock speed and the same number of shader cores), same memory, same bandwidth, same everything, just shrunk down. They have to do it this way to keep changes invisible to developers, and that's basically what I anticipate Xbox will do with the Series consoles, whether or not they want a revised S in a Switch-like format. If MS were to use a different architecture like RDNA 3 or 4, it is unlikely they would use a different TFLOPS target either if they want to keep things invisible to devs (it may not even be possible here, but the more you keep the same, the easier it should be for a dev to create a new build from the S version).

My argument is that using TFLOPS to determine the capability of a GPU is bullshit.

Nor are higher clocks always a detriment to power consumption... It's a balancing act, as all CPUs and GPUs have an efficiency curve.

For example... You can buy a CPU, undervolt it... Then overclock it... And end up with a CPU that uses less power, but offers higher performance due to its higher clockrate.

Yes, I know all that, but the efficiency curve gets exponentially worse past a certain clock speed, and GPUs' default clocks are always already past that point; binning only has a marginal impact here. You're able to undervolt some GPUs only because their stock configurations are designed to cater to the worst bins of a particular SKU, with some headroom to spare.
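To put the efficiency-curve point in numbers, here's a trivial sketch using the figures already quoted in this exchange (RDNA2 drawing ~25% more power at 1800MHz for ~12.5% more performance, and the RX 580 using ~23% more power for ~6% higher clocks than the 480):

```python
# Change in performance-per-watt when a clock bump costs more power than it
# returns in performance.
def perf_per_watt_change(perf_gain: float, power_gain: float) -> float:
    return (1 + perf_gain) / (1 + power_gain) - 1

cases = {
    "RDNA2 1600MHz -> 1800MHz": (0.125, 0.25),  # +12.5% perf for +25% power
    "RX 480 -> RX 580":         (0.06, 0.23),   # +6% clock for +23% power
}

for name, (perf, power) in cases.items():
    print(f"{name}: {perf_per_watt_change(perf, power):+.1%} perf/W")
# Both land around -10% to -14% perf/W: past the knee of the efficiency curve,
# the last few percent of performance cost disproportionate power.
```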

Last edited by EpicRandy - on 19 April 2023


What I think is that MS has had a hard time against Sony on consoles in every generation, even after gaining all this experience, so how would they hope to compete with Nintendo on their first try?



duduspace11 "Well, since we are estimating costs, Pokemon Red/Blue did cost Nintendo about $50m to make back in 1996"

http://gamrconnect.vgchartz.com/post.php?id=8808363

Mr Puggsly: "Hehe, I said good profit. You said big profit. Frankly, not losing money is what I meant by good. Don't get hung up on semantics"

http://gamrconnect.vgchartz.com/post.php?id=9008994

Azzanation: "PS5 wouldn't sold out at launch without scalpers."

DonFerrari said:

What I think is that MS has had a hard time against Sony on consoles in every generation, even after gaining all this experience, so how would they hope to compete with Nintendo on their first try?

They are already in competition with Nintendo. Adding a Switch-like experience to their lineup would not change anything in that regard. The real question is: would it add anything significant to their offering? For me the answer is yes, as it would also synergize well with their services.



DonFerrari said:

What I think is that MS has had a hard time against Sony on consoles in every generation, even after gaining all this experience, so how would they hope to compete with Nintendo on their first try?

It's not about competing with the Switch, it's about offering a mobile device to fill a gap in their product lineup. If you are going to get a Series S for whatever reason, having it also work as a mobile device could seal the deal and improve sales of the device.



EpicRandy said:
DonFerrari said:

What I think is that MS has had a hard time against Sony on consoles in every generation, even after gaining all this experience, so how would they hope to compete with Nintendo on their first try?

They are already in competition with Nintendo. Adding a Switch-like experience to their lineup would not change anything in that regard. The real question is: would it add anything significant to their offering? For me the answer is yes, as it would also synergize well with their services.

Actually nope. Neither MS (which has said since this gen that Sony and Nintendo aren't their real competitors) nor Nintendo (which has said since the Wii that they aren't competing with Sony and MS) recognizes the other as a competitor, and MS has never competed in portable consoles; it isn't really just "owww, make the same system from your table something you hold in your hand".

Machiavellian said:
DonFerrari said:

What I think is that MS has had a hard time against Sony on consoles in every generation, even after gaining all this experience, so how would they hope to compete with Nintendo on their first try?

It's not about competing with the Switch, it's about offering a mobile device to fill a gap in their product lineup. If you are going to get a Series S for whatever reason, having it also work as a mobile device could seal the deal and improve sales of the device.

Considering their games are already available portably on the Steam Deck and other devices, and that they don't even seem to care that much about improving console hardware sales, I really don't see much reason for them to try a portable as a new revenue stream.



duduspace11 "Well, since we are estimating costs, Pokemon Red/Blue did cost Nintendo about $50m to make back in 1996"

http://gamrconnect.vgchartz.com/post.php?id=8808363

Mr Puggsly: "Hehe, I said good profit. You said big profit. Frankly, not losing money is what I meant by good. Don't get hung up on semantics"

http://gamrconnect.vgchartz.com/post.php?id=9008994

Azzanation: "PS5 wouldn't sold out at launch without scalpers."

EpicRandy said:

*snip*

I would say your analogy with cars already gets the idea across.

Of course, one car having 1000hp and another having 900hp doesn't mean much when there are several other design elements that will impact that car's performance, from a simple 0-100km/h (0-60mph) time to a lap time (which can even be affected by the driver: put two drivers in exactly the same car and conditions, or the same driver over multiple laps, and you'll get different times).

So does that mean measuring HP is totally useless? Absolutely not =p



duduspace11 "Well, since we are estimating costs, Pokemon Red/Blue did cost Nintendo about $50m to make back in 1996"

http://gamrconnect.vgchartz.com/post.php?id=8808363

Mr Puggsly: "Hehe, I said good profit. You said big profit. Frankly, not losing money is what I meant by good. Don't get hung up on semantics"

http://gamrconnect.vgchartz.com/post.php?id=9008994

Azzanation: "PS5 wouldn't sold out at launch without scalpers."