jlauro said:


@NJ5: Inlining them would help, but you would still need multiple cores for them to run in parallel with the called function. However, you could probably get by with fewer cores than having the hardware do it... That said, inlining would not give you the pipelining speedups of streaming data through multiple functions. A lot of the time you are just moving data and making minor alterations to it as it goes from one function to another, often feeding the output of one function to the next. If you let the next function work on the outgoing data of the previous function as it's being generated, it can process through both functions in just a little more time than it would take for the first function alone.
Consider:
char *c;
makeupper( c );
removewhitespace( c );
tokenize( c );

and let's say c points to a 2K stream of data.
Normally, each of those functions has to run through all 2K of data. If nothing else, it's going to take time just to read through it.

Now, obviously we have language issues in C, but...
pipeline *c
tokenize( removewhitespace( makeupper( c ) ) )

So, the standard example is going to take at least 12K cycles, being generous and saying it's only 1 cycle to read and 1 cycle to process each character in each of the three functions.

However, with it pipelined, it will not take much more than 4K cycles to process through all the functions, and the primary core could actually start doing other stuff after only a relatively few cycles. Now, it does use 4 cores, but no amount of inlining will give you that speedup. Being generous, you could rewrite it as one function, token-n-removewhite-n-makeupper, and give it 1 cycle for the read plus a cycle for each operation, or 8K cycles to process the full 2K. Less modular code, and it still takes twice as long, but it is a compromise.

I was assuming the parameters were just numbers. If they're bigger data structures that can be processed in a pipelined fashion, we can run the functions in parallel, but I assume that's not what CrashMan had in mind with his example.

"you would still need multiple cores for them to run in parallel with the called function"

Not necessarily, not if the CPU allows for out-of-order execution in parallel. In the end it depends on what the function is doing. Having multiple cores may help a lot if there are lots of parallelizable statements in the two functions, but then current architectures can perform fine using ordinary multithreading and either a refactoring of the functions or high-level optimizations in the compiler.

I am not saying your idea for an architecture is worthless; it just seems to me that the improvements you describe can already be realized today with smart programming, compilers, and/or existing CPU features. In your example, software pipelining can be used, and some CPUs would parallelize the operations to potentially 4 operations per cycle. Perhaps compilers don't do this today, but it may be simpler to implement in the compiler than to build a whole new architecture for it.

My Mario Kart Wii friend code: 2707-1866-0957