
@MisterBlonde: It's partly my own creation, but some of it is based on reading a bunch of stuff. I can't take credit for the idea of implementing a hardware scheduler, but unfortunately I don't recall where I read the paper on it. I'm trying to play with some of the ideas on an FPGA kit in my spare time, but I'm not very good with any HDLs yet (teaching myself), and can't fit many cores into a little $150 FPGA kit... It would probably cost a few thousand to get an FPGA large enough to create enough soft cores as a single CPU to do any real experiments...

@NJ5: Inlining them would help, but you would still need multiple cores for them to run in parallel with the calling function. However, you could probably get by with fewer cores than having the hardware do it... That said, inlining would not give you the pipelining speedups of streaming data through multiple functions. A lot of the time you are just moving data and making minor alterations to it as it goes from one function to another, often feeding the output of one function into the next. If you let the next function work on the outgoing data of the previous function as it's being generated, you can get through both functions in just a little more time than it would take the first function alone to process it.
Consider:
char *c;   /* assume c points at the data */
makeupper( c );
removewhitespace( c );
tokenize( c );

and let's say c is a 2k stream of data.
Normally, each of those functions has to run through all 2k of data. If nothing else, it takes time just to read through it.

Now, obviously C has no syntax for this, but imagine something like:
pipeline *c;
tokenize( removewhitespace( makeupper( c ) ) );

So the standard example is going to take at least 12k cycles, being generous and saying it's only 1 cycle to read and 1 cycle to process each character (2k characters x 3 functions x 2 cycles each).

However, with it pipelined, it will not take much more than 4k cycles to get through all the functions, and the primary core could actually start doing other work after only a relatively few cycles. Now, it does use 4 cores, but no amount of inlining will give you that speedup. Being generous, you could rewrite it as one fused function, token-n-removewhite-n-makeupper, and give it 1 cycle for the read plus a cycle for each of the three operations, or 8k cycles to process the full 2k. Less modular code, and it still takes twice as long, but it's a compromise.