sc94597 said:
quickrick said:

1. It's impossible to know how many things a game is doing under the hood. It's hard to compare games until they're both on the same system, but just using common sense, making a living, breathing, realistic city with traffic is going to be way more demanding than an open-world cel-shaded game in a forest.

2. As for the developer, he says there is way more traffic in the next-gen versions,

3. which he implies would be too taxing for the Switch's CPU.

4. He also compares the 360/PS3 CPUs to the Switch's. It would be highly dependent on the code they are running.


 "Cell and Xenon are good in highly optimized SIMD code. Xenon = 3 cores at 3.2 GHz, four multiply-adds per cycle (76.8 GFLOP/s). That's significantly higher theoretical peak than the 4x ARM cores on Switch can achieve. But obviously it can never reach this peak. You can't assume that multiply-add is the most common instruction (see Broadwell vs Ryzen SIMD benchmarks for further proof). Also Xenon vector pipelines were very long, so you had to unroll huge loops to reach good perf with it. Branching and indexing based on vector math results was horrible (~40 cycle stall to move data between register files). ARM NEON is a much better instruction set and OoO and data prefetch helps even in SIMD code.

If you compare them on standard C/C++ game code, ARM and Jaguar both stomp all over the old PPC cores. I remember the common consensus being that IPC in generic code was around 0.2. So both Jaguar and ARM should be 5x+ faster per clock than those PPC cores (IIRC Jaguar's average IPC was around 1.0 in some real-life code benchmarks, and this ARM core should be close). However, you could also write low-level optimized game code for PPC, so it all depends on how many resources you had to optimize and rewrite the code. Luckily those days are a thing of the past. I don't want to remember all the ugly hacks we had around the code base to make the code run "well enough". The most painful thing was that the CPU didn't have a data prefetcher. So you had to know around 2,000 cycles in advance which memory regions your future code was going to access, and prefetch that data into cache. If you didn't do this, you would get 600-cycle stalls on memory loads. Those PPC cores couldn't even prefetch linear arrays."
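To make that last point concrete, here's a rough sketch of what software prefetching looks like. This is my own illustration, not actual 360-era code; I'm using GCC/Clang's __builtin_prefetch as a portable stand-in for the platform's dcbt-style intrinsics, and the look-ahead distance is a made-up tuning constant:

```cpp
#include <cstddef>

// On an in-order PPC core with no hardware prefetcher, every loop that
// streams through memory needed prefetches issued far enough ahead to
// hide a ~600-cycle miss on a cold load.
constexpr std::size_t PREFETCH_AHEAD = 16; // elements ahead; tuned per platform

float sum_array(const float* xs, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&xs[i + PREFETCH_AHEAD]); // warm the cache early
        sum += xs[i];
    }
    return sum;
}
```

On Switch-class (and Jaguar-class) CPUs the hardware prefetcher handles a linear walk like this on its own, so the explicit prefetch line becomes unnecessary, which is exactly the quality-of-life difference he's describing.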

1. This is not necessarily clear, to be honest. In some ways it might be more demanding, in others it might not be, and we need to consider the whole picture, not just "traffic", which is the point I was trying to get across. There are points in BOTW that are understandably demanding, because there are many simulations going on at once, just like in GTA V. The "city" thing is what is tricking you here. It's not as if GTA V is simulating the whole city and then popping you into the middle of it. It runs the city on a very deterministic path, only modifying that path as you interact with it (see the toy sketch below point 4).

2. Okay, even if that is true, it's irrelevant to the discussion, and it actually tells us how scalable GTA V is as a game. I'd also like to point out that this strongly hints that GTA V's traffic "simulation" isn't all that complex. If they can increase the traffic count without too many unforeseen consequences, that implies a somewhat simple and deterministic model.

3. Sure, nobody expects the Switch to run the PS4/XBO versions 1:1, but the fact that they increased the traffic count shows that a hypothetical Switch version might be scaled somewhere between the PS360 and PS4/XBO versions, which is where its CPU places it.

4. Everything he described is correct, but I'd like to contextualize it. The Xbox 360 and PS3, when running very specific hand-tuned code, can outperform the ARM cores, and in the Cell's case even the Jaguars found in the PS4 and XBO, at very specific tasks. It is why you always get people randomly saying "the Cell is better than the PS4's CPU!" because of benchmarks comparing their relative floating-point performance. Really, it is only better at that particular type of code: highly parallel, vectorized floating-point operations. When you create a game, though, you aren't just doing floating-point operations, and you're definitely not running code that is always easily parallelized/vectorized. Of course, if you rewrote things in a certain way with assembly or C-level code (rather than standard C++ or C#), you could maximize the amount of code that exploits these operations, but why bother? It's a waste of time and money for multiplatform releases, and you're not even sure your programmers would want to do this; it would be very hard and tedious work. GPUs can perform these operations much better than any CPU can, so if the Switch's or PS4/XBO's CPU really needed that much more performance for these operations, one could just offload the work to the GPU and still get better results without having to get into low-level programming.
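To illustrate what "that particular type of code" looks like, here's a minimal NEON sketch (function and buffer names are mine; this is the narrow, branch-free multiply-add pattern where peak FLOPS figures actually matter):

```cpp
#include <arm_neon.h>
#include <cstddef>

// The one workload where the old PPC vector units shine: a long, branch-free,
// vectorizable multiply-add loop. vmlaq_f32 computes acc + a * b across four
// float lanes at once. Most gameplay code (branchy, pointer-chasing logic)
// never looks like this, which is why generic IPC dominates in practice.
void multiply_accumulate(float* acc, const float* a, const float* b,
                         std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4_t va   = vld1q_f32(a + i);
        float32x4_t vb   = vld1q_f32(b + i);
        float32x4_t vacc = vld1q_f32(acc + i);
        vst1q_f32(acc + i, vmlaq_f32(vacc, va, vb)); // acc += a * b, 4 lanes
    }
    for (; i < n; ++i) acc[i] += a[i] * b[i]; // scalar tail
}
```

And going back to the traffic point (1 and 2 above), here's a toy sketch of what a mostly-deterministic, density-scalable traffic model could look like. To be clear, this is purely my own illustration of the concept, not Rockstar's actual code:

```cpp
#include <vector>
#include <cstddef>

struct Vec2 { float x, y; };

// Each car just follows a pre-authored route of waypoints; the cheap,
// deterministic update only gives way to real AI once the player disturbs it.
struct Car {
    std::vector<Vec2> route;      // pre-authored path through the city
    std::size_t next = 0;         // index of the next waypoint
    Vec2 pos{0.0f, 0.0f};
    bool playerDisturbed = false; // collision, gunfire, etc. hands off to AI
};

void step(Car& c, float speed) {
    if (c.playerDisturbed || c.route.empty()) return; // full AI takes over
    Vec2 t = c.route[c.next];
    float dx = t.x - c.pos.x, dy = t.y - c.pos.y;
    if (dx * dx + dy * dy < 1.0f)                     // reached the waypoint
        c.next = (c.next + 1) % c.route.size();
    else {
        c.pos.x += dx * speed;                        // toy kinematics: move
        c.pos.y += dy * speed;                        // toward the waypoint
    }
}

// Under a model like this, "way more traffic on next-gen" can be as simple
// as turning one density knob up when spawning cars onto the same routes.
void spawn_traffic(std::vector<Car>& cars, const Car& prototype,
                   std::size_t baseCount, float densityScale) {
    std::size_t count = static_cast<std::size_t>(baseCount * densityScale);
    for (std::size_t i = 0; i < count; ++i)
        cars.push_back(prototype);
}
```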

Some people on NeoGAF have actually run some benchmarks and worked out an approximate idea of how the Switch's CPU compares to the PS4's in terms of real-world performance.

https://www.neogaf.com/threads/what-is-the-actual-power-of-the-nintendo-switch.1379817/page-9

"Blu and I did some math with some benchmarks. The Switch's CPU came up with being roughly 80% of the performance of the PS4 per core. When we consider the dimishing returns of splitting tasks among more than 3 cores and not knowing how much of the 4th core is available for Switch devs, we may be looking at something like 50% of the CPU performance of the PS4 at full utilization."

I don't know, we'll see if GTA V comes out on Switch, but L.A. Noire didn't fare too well; it had a worse frame rate than the PS3 version.