NJ5 said:
Not if you pipeline the operations (which you can do by dividing these 3 serial operations into further serial operations, the same way they passed from 2 to 3). Massive pipelining is the easy and scalable way of doing multi-core programs, but of course not good enough when response time is important. Your last sentence is very true, but parallelizing the steps is the hard part as you know. That's probably where future optimization efforts will be focused, along with low-level code optimizations.
|
I know. But look at slides 4 and 5. With only 1 CPU you have a 2-frame pipeline, but the GPU is starved for a good fraction of the time (depending on the ratio of CPU time to render time). In slide 5 you have 3 frames in flight but a potentially much higher framerate. So the response time, if you were constrained by your CPU, actually goes down.
The same can be said of further pipelining: you get diminishing returns as you add stages, but it can still be a win for response time if the CPUs are the main constraining factor. And you parallelize as much as you can within each stage, without adding new stages.
At a certain point, though, the GPU will become the bottleneck, even if you offload some of the rendering.
At that point only low-level optimizations can help you, indeed.
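
To put rough numbers on the argument above, here is a back-of-the-envelope sketch. It assumes a simple lockstep pipeline where every stage hands its frame over once per frame interval, so the interval is set by the slowest stage and the response time is the number of stages times that interval. The function name and all timings are made up for illustration; they are not taken from the slides.

```python
def pipeline_stats(stage_times_ms):
    """Return (frame_interval_ms, latency_ms) for a lockstep pipeline.

    Assumption: each stage works on a different frame and all stages
    advance together, so the slowest stage sets the frame interval.
    """
    interval = max(stage_times_ms)
    latency = len(stage_times_ms) * interval
    return interval, latency


# Hypothetical timings in milliseconds (CPU-bound case).
sim, draw, gpu = 10.0, 10.0, 8.0

# 1 CPU: simulation and draw submission share one stage, GPU is the second.
two_stage = pipeline_stats([sim + draw, gpu])

# 2 CPUs: simulation and draw submission become separate stages.
three_stage = pipeline_stats([sim, draw, gpu])

print("2 stages: interval %.0f ms, latency %.0f ms" % two_stage)
print("3 stages: interval %.0f ms, latency %.0f ms" % three_stage)

# Hypothetical GPU-bound case: the extra stage no longer helps.
gpu_bound_two = pipeline_stats([5.0 + 5.0, 20.0])
gpu_bound_three = pipeline_stats([5.0, 5.0, 20.0])
print("GPU-bound, 2 stages: latency %.0f ms" % gpu_bound_two[1])
print("GPU-bound, 3 stages: latency %.0f ms" % gpu_bound_three[1])
```

With these made-up numbers the CPU-bound case drops from 40 ms to 30 ms of latency despite the extra frame in flight, while in the GPU-bound case the extra stage raises latency from 40 ms to 60 ms with no framerate gain. A real engine does not run in perfect lockstep, so treat this only as a way to see the trend.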







