smallflyingtaco said:
jlauro said:
With enough cores, you could design the CPU so that instead of working at the normal thread level, individual functions run inside their own cores. As one function calls another, the called function can literally be processing the values as they come in. With a single core (or only a few), you push all the values onto the stack, switch to the called function, have it pull them back off to operate on while the calling function is paused, and then return once the processing is done. With tons of cores, the called function can begin processing the data as it arrives while the calling function is still calculating the values to pass. Both cores run concurrently. Compilers could do some of this automatically, even for problems that don't naturally lend themselves to massive numbers of cores. When call chains are hundreds of levels deep, think of the speedups that are possible.
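A minimal sketch of the caller/callee overlap described above, assuming a bounded queue stands in for the core-to-core link. The names `caller`, `callee` and `pipelined_call` are hypothetical. Note that CPython threads share one interpreter lock, so this models the dataflow (the callee consumes values while the caller is still producing them), not the actual multi-core speedup.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the streamed arguments

def caller(out_q, n):
    """Hypothetical calling function: computes values and streams them out one by one."""
    for i in range(n):
        out_q.put(i * i)  # each value is handed over as soon as it is computed
    out_q.put(_DONE)

def callee(in_q, results):
    """Hypothetical called function: starts working as soon as the first value arrives."""
    total = 0
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        total += item
    results.append(total)

def pipelined_call(n):
    q = queue.Queue(maxsize=16)  # bounded buffer standing in for a link between two cores
    results = []
    producer = threading.Thread(target=caller, args=(q, n))
    consumer = threading.Thread(target=callee, args=(q, results))
    consumer.start()  # the callee is already waiting before the caller produces anything
    producer.start()
    producer.join()
    consumer.join()
    return results[0]

print(pipelined_call(10))  # sum of squares 0..9 = 285
```

In the conventional model, `callee` would only run after `caller` had built its whole argument list; here the two overlap, which is the effect the post is after.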

 The speedup you're talking about assumes that you're not getting any cache misses and that your functions can be called without relying on data from another function. Whenever those things happen, you're just going to leave one of your cores idling, and with that happening hundreds or thousands of times, you're really just going to have hundreds of billions of wasted cycles. You can also only add so many cores before the speed of light limits how well they can communicate, even with a 3D chip layout. In theory you can shrink the process size to get around this, but eventually you're going to run into Heisenberg (quantum) problems. This means you're going to hit a maximum number of cores per chip, at which point adding more cores slows the whole thing down.
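To put a number on the speed-of-light point, a quick calculation of how far a signal can travel in one clock cycle. The 3 GHz clock and the `velocity_factor` values are illustrative assumptions; real on-chip wires are RC-limited and much slower than light in vacuum, so these figures are an upper bound.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458

def distance_per_cycle_mm(clock_hz, velocity_factor=1.0):
    """Upper bound (in mm) on how far a signal can travel in one clock cycle.

    velocity_factor is the signal speed as a fraction of c (an assumed value;
    actual on-chip propagation is considerably slower).
    """
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor / clock_hz * 1000

print(round(distance_per_cycle_mm(3e9)))       # 100 mm in vacuum at 3 GHz
print(round(distance_per_cycle_mm(3e9, 0.5)))  # 50 mm if signals moved at c/2
```

A request to a distant core and its reply need a round trip, so the usable radius per cycle is at most half of these figures, and it shrinks further as clocks go up.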

When a function relies on data from another function, that is a plus, since those other functions can also be started on other cores. As you can see, this is where a huge number of cores comes in: those other functions will run on other cores. Local memory for each core (at least 4 KB, the more the better) is essential, as is a crossbar switch between all of the cores. As for the worry about all of the idle cores: make the crossbar connect the threads instead, and have a separate crossbar interconnecting threads and cores. When the on-chip memory available for threads is all used up, you can swap threads out to external RAM.
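The swap-out idea can be modeled as a fixed number of on-chip thread slots backed by external RAM. This is a toy sketch: the class name `ThreadContextStore` and the least-recently-used eviction policy are assumptions, since the post only says threads are swapped out when on-chip memory is full.

```python
from collections import OrderedDict

class ThreadContextStore:
    """Toy model: fixed on-chip thread slots, with overflow spilled to external RAM.

    LRU eviction is an assumed policy, not something specified in the post.
    """

    def __init__(self, on_chip_slots):
        self.on_chip_slots = on_chip_slots
        self.on_chip = OrderedDict()  # thread_id -> context, oldest first
        self.external_ram = {}        # thread_id -> spilled context

    def activate(self, thread_id, context=None):
        """Bring a thread on chip, spilling the least recently used one if all slots are full."""
        if thread_id in self.on_chip:
            self.on_chip.move_to_end(thread_id)  # mark as most recently used
            return
        if thread_id in self.external_ram:
            context = self.external_ram.pop(thread_id)  # restore the spilled context
        if len(self.on_chip) >= self.on_chip_slots:
            victim, victim_ctx = self.on_chip.popitem(last=False)  # oldest entry
            self.external_ram[victim] = victim_ctx
        self.on_chip[thread_id] = context

store = ThreadContextStore(on_chip_slots=2)
for tid in ["a", "b", "c"]:
    store.activate(tid, context={"pc": 0})
print(list(store.on_chip), list(store.external_ram))  # ['b', 'c'] ['a']
```

A real design would also have to decide which crossbar port a restored thread reconnects to; the sketch ignores the interconnect entirely.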

Of course, with hundreds of threads on the chip, the crossbar switches will take up as much space as the cores.
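The space concern follows from how a full crossbar scales: connecting N inputs to M outputs takes N × M crosspoints. A small calculation for the two-crossbar layout proposed above (thread-to-thread plus thread-to-core); the thread and core counts are made-up examples.

```python
def crosspoints(inputs, outputs):
    """A full crossbar connecting every input to every output needs inputs * outputs crosspoints."""
    return inputs * outputs

def interconnect_cost(threads, cores):
    """Layout from the post: a thread-to-thread crossbar plus a thread-to-core crossbar."""
    return crosspoints(threads, threads) + crosspoints(threads, cores)

# Illustrative sizes only: 4 threads per core at each scale.
for threads, cores in [(64, 16), (256, 64), (1024, 256)]:
    print(threads, cores, interconnect_cost(threads, cores))
# 64 16 5120
# 256 64 81920
# 1024 256 1310720
```

Each 4× step in threads and cores multiplies the crosspoint count by 16, while the core count only grows 4×, which is why the interconnect ends up competing with the cores for die area.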