bonzobanana said:
1. The 360's CPU is well documented. I remembered it at 20,000 MIPS, but it's slightly below that at 19,200 MIPS, certainly well beyond your estimates.
https://en.wikipedia.org/wiki/Instructions_per_second
2. Remember that's 6 threads at 3.2 GHz, so even if the individual cores are weaker they run very fast.
The 360 also has excellent memory bandwidth: 256 GB/s for the 10 MB of video memory and 25 GB/s for main memory. Bandwidth starvation is one of the factors that can slow down multicore CPUs. One thread is pretty much dedicated to the background operating system, I believe.
3. The PS3 is only 10,200 MIPS for its dual-thread PPC core, but of course it is supported by the seven Cell SPEs, which all run at 3.2 GHz and boost performance 3-4x when properly programmed. They are also extremely efficient at parallel tasks like enhancing the GPU feature set, decoding sound, etc.
4. There is no way the Switch matches the CPU performance of those two consoles, especially the PS3, unless Nintendo unlocks higher clocks in firmware, which may not be possible for thermal and battery reasons.
5. ARM A57s are good, but surely just three of them at 1 GHz is not competitive, and they may be required to do additional processing like Wi-Fi, Bluetooth, etc.
6. In CPU terms I would put the Switch between the Wii U and the 360. It's probably a 50% boost over the Wii U, which is a huge difference, but still noticeably weaker than the 360 and PS3. Mobile chipsets are also prone to thermal throttling, which you don't get on the 360, PS3 or Wii U, although I think Nintendo has capped the CPUs at 1 GHz precisely to prevent any thermal throttling; that would seem like a likely reason for the cap.
|
1. You say it is well documented, but you didn't provide said documentation. The Wikipedia page has no source for its Xenon and Cell statistics. Plus, one must distinguish DMIPS (results from an actual benchmark called Dhrystone) from MIPS (a pretty useless measurement across architectures, as Permalite noted earlier).
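For what it's worth, the DMIPS part is easy to illustrate: a Dhrystone score is normalized against the VAX 11/780 reference machine, whose 1757 Dhrystones/second define 1 DMIPS. A quick Python sketch, with a made-up score for illustration:

```python
# Sketch of why DMIPS differs from raw MIPS (my illustration, not from the thread).
# DMIPS normalizes a measured Dhrystone score against the VAX 11/780 baseline,
# so it reflects benchmark work completed, not instructions issued.

VAX_11_780_DHRYSTONES_PER_SEC = 1757  # historical 1-DMIPS reference machine

def dmips(dhrystones_per_second: float) -> float:
    """Convert a measured Dhrystone throughput into DMIPS."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SEC

def dmips_per_mhz(dhrystones_per_second: float, clock_mhz: float) -> float:
    """Per-clock figure, the usual way cores are compared across clock speeds."""
    return dmips(dhrystones_per_second) / clock_mhz

# Hypothetical core scoring 8,000,000 Dhrystones/s at 2000 MHz:
print(round(dmips(8_000_000)))                    # → 4553 DMIPS
print(round(dmips_per_mhz(8_000_000, 2000), 2))   # → 2.28 DMIPS/MHz
```

Two chips can have identical raw MIPS yet very different DMIPS, because a RISC and a CISC core need different instruction counts for the same work.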
2. Only if you can take advantage of six threads, and very few games do. There is only so much you can parallelize, and the more threads we're talking about, the lower the returns. This is gaming, not video editing.
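The diminishing-returns point is just Amdahl's law: speedup is capped by whatever fraction of the work stays serial. A sketch, where the 70% parallel fraction is a number I made up for illustration, not a measurement of any real game:

```python
# Amdahl's law: speedup from n threads when only a fraction p of the work
# parallelizes. The p = 0.7 figure below is purely illustrative.

def amdahl_speedup(p: float, n_threads: int) -> float:
    """Ideal speedup when fraction p of the workload is parallelizable."""
    return 1.0 / ((1.0 - p) + p / n_threads)

for n in (1, 2, 3, 6):
    # With 70% parallel work: ~1.5x at 2 threads, but only ~2.4x at 6
    print(n, round(amdahl_speedup(0.7, n), 2))
```

So sixfold hardware parallelism yields well under half the ideal 6x here, which is why "6 threads at 3.2 GHz" overstates practical game performance.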
3. Okay, but when we're talking about such a large difference in GPU power, why even care about the SPEs? Those computations can just be done on the GPU. The SPEs were a hassle for PS3 development anyway, and many games suffered because of it.
4. I agree: there is no way. The Switch's CPU is definitely better for gaming.
5. This is mostly speculative/overly assertive on your part.
6. The Switch's dock is there to prevent throttling; if throttling were a thing, we'd notice when our Switches got hot. There is likely downclocking to reduce power draw during less intensive games, but that is entirely dependent on the requirements of the game.
It might be valuable to read the quote that quickrick provided from a developer:
"Cell and Xenon are good in highly optimized SIMD code. Xenon = 3 cores at 3.2 GHz, four multiply-adds per cycle (76.8 GFLOP/s). That's significantly higher theoretical peak than the 4x ARM cores on Switch can achieve. But obviously it can never reach this peak. You can't assume that multiply-add is the most common instruction (see Broadwell vs Ryzen SIMD benchmarks for further proof). Also Xenon vector pipelines were very long, so you had to unroll huge loops to reach good perf with it. Branching and indexing based on vector math results was horrible (~40 cycle stall to move data between register files). ARM NEON is a much better instruction set and OoO and data prefetch helps even in SIMD code.
If you compare them in standard C/C++ game code, ARM and Jaguar both stomp over the old PPC cores. I remember that it was common consensus that the IPC in generic code was around 0.2. So both Jaguar and ARM should be 5x+ faster per clock than those PPC cores (IIRC Jaguar average IPC was around 1.0 in some real life code benchmark, this ARM core should be close). However you can also write low level optimized game code for PPC, so it all depends on how much resources you had to optimize and rewrite the code. Luckily those days are a thing of the past. I don't want to remember all those ugly hacks we had around the code base to make the code run "well enough". The most painful thing was that CPU didn't have a data prefetcher. So you had to know around 2000 cycles in advance which memory regions your future code is going to access, and prefetch that data to cache. If you didn't do this, you would get 600 cycle stalls on memory loads. Those PPC cores couldn't even prefetch linear arrays."
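As a sanity check, the 76.8 GFLOP/s peak in that quote is straightforward arithmetic, counting each multiply-add as two floating-point operations:

```python
# Verifying the theoretical peak quoted for Xenon:
# 3 cores x 3.2 GHz x 4 multiply-adds/cycle x 2 FLOPs per multiply-add.

cores = 3
clock_hz = 3.2e9
madds_per_cycle = 4
flops_per_madd = 2  # a fused multiply-add counts as two floating-point ops

peak_gflops = cores * clock_hz * madds_per_cycle * flops_per_madd / 1e9
print(peak_gflops)  # → 76.8
```

Which matches the quote exactly, and underlines the dev's point: that figure assumes every cycle issues a full-width multiply-add, a workload mix no real game code sustains.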
Last edited by sc94597 - on 28 January 2018