| mutantsushi said: Saying the GPU is overpowered compared to the CPU kind of loses any meaning when GPGPU w/ unified memory is in the picture, doesn't it? |
Kind of
PS4 supports a feature that Cerny calls "asynchronous fine-grain compute". On older GPUs you have a command processor that has a graphics pipeline for rendering and a compute pipeline for GPGPU. You can only do either rendering or GPGPU efficiently; doing both at the same time is pretty inefficient.
AMD's GCN GPUs have asynchronous compute engines (ACEs). Basically, these ACEs are like additional compute pipelines for GPGPU. You can use them instead of the compute pipeline in the command processor, which leaves the command processor free to focus on maximum rendering performance. An HD7970 has 2 of these ACEs, so it can take care of 2 compute jobs at the same time. PS4 has 8 ACEs and can take care of 8 compute jobs at the same time.
To minimize the overhead for graphics rendering, you want to split your compute load into as many small jobs as possible (fine-grain) and "feed" them to your shader cores whenever they're not fully utilized during rendering jobs (which happens all the time on modern GPUs). That's why each ACE has compute queues. An HD7970 only has 2 queues (one queue for each ACE), while PS4 has 64 compute queues (8 queues for each ACE). That means programmers can queue up to 64 different compute jobs at the same time on PS4. The 8 ACEs will choose the right jobs at the right time based on pre-defined dependencies. According to Cerny, you will not notice a penalty for graphics rendering, since the ACEs wait until some part of the GPU is underutilized.
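To make the idea a bit more concrete, here is a toy Python model of "fill the idle shader slots from the compute queues". It is only a sketch of the concept, not how the hardware or any SDK actually works: the function `run_frame`, the round-robin pick, and the idle-slot numbers are all made up for illustration, and dependencies between jobs are ignored. Only the 8 ACEs x 8 queues figure comes from the post above.

```python
from collections import deque

ACES = 8            # from the post: PS4 has 8 ACEs
QUEUES_PER_ACE = 8  # from the post: 8 queues per ACE, 64 in total


def run_frame(idle_slots_per_tick, queues):
    """Toy model (hypothetical): graphics always goes first, and compute
    jobs are only pulled from the queues to fill slots graphics left idle."""
    pending = [deque(q) for q in queues]
    schedule = []  # (tick, job) pairs in dispatch order
    nxt = 0        # round-robin cursor over the queues (an assumption)
    for tick, idle in enumerate(idle_slots_per_tick):
        for _ in range(idle):
            # Look for the next non-empty queue, starting at the cursor.
            for step in range(len(pending)):
                idx = (nxt + step) % len(pending)
                if pending[idx]:
                    schedule.append((tick, pending[idx].popleft()))
                    nxt = (idx + 1) % len(pending)
                    break
            else:
                break  # every queue is empty, nothing left to dispatch
    leftover = sum(len(q) for q in pending)
    return schedule, leftover


# One small job sitting in each of the 64 queues.
queues = [[f"job{i}"] for i in range(ACES * QUEUES_PER_ACE)]

# Made-up idle shader slots per tick: ticks 0 and 2 are fully busy with graphics.
schedule, leftover = run_frame([0, 3, 0, 5], queues)
```

In this model the compute jobs never take a slot away from rendering. They only run in ticks 1 and 3, where rendering left something idle, and the rest just stay queued until the next gap.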
PS4 has 8 ACEs/64 queues. AMD's HD8000 series GPUs have a maximum of 4 ACEs/32 queues. Xbox One has 2 ACEs with somewhere between 2 and 16 queues. I can't tell for sure since Microsoft doesn't go into detail on this topic.
The problem for Xbox One is that PS4's GPU has roughly 500 GFLOPS more processing power anyway. PS4 will outperform its competitor very easily when it comes to GPGPU.
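For anyone wondering where a gap of about 500 GFLOPS comes from, here is the back-of-the-envelope math as a small Python sketch. The CU counts and clocks are not from this post; they are the commonly published specs (18 CUs at 800 MHz for PS4, 12 CUs at 853 MHz for Xbox One), and the 64 lanes per CU with 2 FLOPs per lane per clock is the usual GCN peak assumption, so treat all of them as assumptions. `peak_gflops` is just a helper name made up for this example.

```python
def peak_gflops(compute_units, clock_ghz, lanes_per_cu=64, flops_per_lane_per_clock=2):
    """Theoretical peak, assuming every lane issues one fused multiply-add per clock."""
    return compute_units * lanes_per_cu * flops_per_lane_per_clock * clock_ghz


# Assumed specs (not stated in the post above).
ps4 = peak_gflops(compute_units=18, clock_ghz=0.800)
xbox_one = peak_gflops(compute_units=12, clock_ghz=0.853)

gap = ps4 - xbox_one
print(f"PS4: {ps4:.0f} GFLOPS, Xbox One: {xbox_one:.0f} GFLOPS, gap: {gap:.0f} GFLOPS")
```

These are theoretical peaks, so they say nothing about how much of that power a game can actually use.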