Bofferbrauer2 said:

It doesn't matter much between FP16, FP32 or FP64, as ARM is built with integer in mind, hence why all the official benchmarks were Dhrystone. FP was thought to be the (almost exclusive) domain of coprocessors until the Cortex line and still isn't used more prominently, as it needs much more complex chips and instructions (which in turn would also strongly increase power consumption).


CPUs in general are built primarily with integers at the forefront.
You can see it going back to the old Cyrix M2 chips, or in something more recent like AMD's Bulldozer architecture, where each two-core module shared one floating point unit but had two integer units.

Bofferbrauer2 said:

I agree, on a per-watt basis, ARM should trump x86. However, the architecture runs into a TDP wall around 2.5 GHz, meaning that at 3 GHz or more an x86 chip would probably consume less power than an ARM processor.

The TDP wall is also a manufacturing limitation.
ARM chip manufacturers tend to opt for transistors with better power characteristics at the expense of clock speed, which is fair enough.

Bofferbrauer2 said:

To get past those 3 GHz they would need to lengthen the pipeline, which risks costing some IPC if they have to lengthen it too much (essentially what happened with the Pentium 4).


You don't need to lengthen the pipeline.
NetBurst is probably a bad example considering what we have now anyway.
Willamette had a 20 stage pipeline and never went past 2 GHz, Coffee Lake has a 19 stage pipeline and can clock to 5 GHz.

Prescott ended up lengthening the pipeline to 31 stages, yet on the IPC front was just as good as, if not better than, the Willamette Pentium 4. - Why is that? Because pipeline length isn't everything.
How it works: when instructions are traveling down the pipeline and something stalls or has to be thrown away, the turnaround is much quicker the shorter the pipeline is. So a 10 stage pipeline should, in theory, recover 3x faster than a 30 stage pipeline when there is a stall and the work has to be fetched again.
But it's never that simple.
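
To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. It assumes (a big simplification, not how any real chip behaves) that a flush costs one cycle per pipeline stage. The function names, the 20% branch fraction and the mispredict rates are all made up for illustration.

```python
def flush_penalty_cycles(stages):
    """Assumed: refilling the pipeline after a flush costs one cycle per stage."""
    return stages


def effective_cpi(base_cpi, stages, branch_fraction, mispredict_rate):
    """Average cycles per instruction once mispredicted branches are counted."""
    flushes_per_instruction = branch_fraction * mispredict_rate
    return base_cpi + flushes_per_instruction * flush_penalty_cycles(stages)


if __name__ == "__main__":
    # Hypothetical workload: 20% of instructions are branches.
    for rate in (0.10, 0.02):
        short = effective_cpi(1.0, 10, 0.20, rate)
        deep = effective_cpi(1.0, 30, 0.20, rate)
        print(f"mispredict rate {rate:.0%}: 10-stage CPI {short:.3f}, "
              f"30-stage CPI {deep:.3f}, ratio {deep / short:.2f}x")
```

Under these assumptions each individual flush is 3x more expensive on the 30 stage design, but the overall gap is far smaller than 3x and shrinks further as the mispredict rate drops.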

If that 10 stage pipeline doesn't have the data in cache, then it has to spend an inordinate number of cycles fetching that data from RAM. It doesn't matter how many stages you have, it's going to be terrible.
So large and fast caches are vital.
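
As a rough illustration of why, here is the usual "average memory access time" estimate (hit time plus miss rate times miss penalty) as a small Python sketch. The 4 cycle hit time and 200 cycle miss penalty are assumed round numbers, not measurements from any particular chip.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average cycles per memory access: hit time plus the expected miss cost."""
    return hit_cycles + miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    # Assumed values: 4 cycles for a cache hit, 200 extra cycles to go out to RAM.
    for miss_rate in (0.01, 0.05, 0.20):
        cycles = average_access_cycles(4, miss_rate, 200)
        print(f"miss rate {miss_rate:.0%}: about {cycles:.1f} cycles per access")
```

Even a small rise in miss rate swamps the hit time, which is the sense in which pipeline length stops mattering once the data isn't in cache.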

Another aspect is of course branch prediction, where the chip guesses which way a branch will go so it can keep fetching and executing instructions ahead of time instead of waiting (prefetchers do a similar job for data, getting it ready in the caches). How effective this is depends on how good your predictor is. Intel tends to have the advantage on this front compared to most others in the industry.

Same with Hyper-Threading. Not every part of the pipeline is actually being utilized at all times, so firing up a second thread that can use the resources sitting idle helps bolster performance.

...And so much more.
In fact, a large share of the die space on a chip isn't actually dedicated to processing, it's dedicated to keeping the processor fed.

Bofferbrauer2 said:

While on those tests ARM had to emulate x86, costing some performance, it still can only compete with the Atom N3450, which is also clocked slower than the Snapdragon 835 (Atom: 1.1 GHz base, 2.2 GHz turbo; Snapdragon: 1.9 GHz LITTLE, 2.45 GHz big, and the two clusters can potentially work together for 8 threads total against only 4 in the Atom), and gets trounced by a Core m3 6Y30 (900 MHz base, 2.2 GHz max. single-core turbo, 3.8W TDP), even in the native, non-emulated tests.

In other words, ARM still has a long way to go until it can keep up with x86 in processing power. But let's see how the new Cortex A75 and especially A76 cores will perform. The Kryo 280 cores in the Snapdragon 835 are still based on the by now slightly outdated Cortex A73.

Of course there is going to be a performance penalty.
But it is what it is.

Project Denver was actually going to be both an x86 and ARM chip, but the licensing couldn't be obtained. The way it worked was that the chip was going to take the instructions from either x86 or ARM and translate them into its own internal instructions for processing.

Still... Cortex A75/A73 etc. aren't the fastest ARM cores anyway, Apple's actually beat them.



--::{PC Gaming Master Race}::--