I spent a bit of time in my Introduction to Parallel Programming class a while ago working on a “cache-conscious” version of Radix sort. “Cache-conscious” algorithms are ones that take into account the size of the CPU’s cache.
The algorithm is an implementation of a paper presented in class – available here.
Radix sort works by partitioning the input into “buckets” based on individual digits of each significant position.
The “cache-conscious” implementation of the algorithm performs a traditional radix sort if all of the numbers in the current bucket fit into the CPU’s cache – otherwise, the numbers are further partitioned.
The code is written in C, and compiles using GCC, Clang, or GCC with OpenMP. See the project here.