# The Many Quirks of Qsort

In 1973, Dennis Ritchie, the author of the C programming language, was such a fan of the quicksort algorithm that he decided to name the language’s sort function after it.

In this post, I’m going to dive into some of the most interesting (and bizarre) aspects of the qsort function. I’m going to be focusing on the GNU C library version of qsort—you can find the code here and here.

The m in qsort. What if I told you that the world’s most famous implementation of quicksort actually uses… mergesort?

It’s true. The GNU implementation of qsort directly calls mergesort as the default sorting algorithm. It’s hard to tell exactly when this change occurred—based on the version history of glibc, it looks like the modification happened sometime between 1992 and 1995.

But why? I haven’t been able to find any documentation describing the reason for the transition from quicksort to mergesort, but I do have a conjecture. Although quicksort is typically faster than mergesort in practice, mergesort has one advantage: it performs fewer total comparisons. In the C library, this ends up being important, because the sort function performs comparisons by dereferencing a pointer to a comparison function. This makes comparisons much more expensive than they should be, giving mergesort an advantage.

I compared the library implementations of quicksort and mergesort on my machine, running both on a random array of ten million 64-bit integers. Mergesort was faster, taking 1.4 seconds instead of quicksort’s 1.7 seconds.

Then I modified the implementations to perform comparisons inline (and to also move objects around in 64-bit chunks instead of 8-bit chunks). This completely flipped things. Now quicksort became faster, taking 0.9 seconds instead of mergesort’s 1.2 seconds. So the pointer dereferences really do seem to be key.
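To make the indirect-call cost concrete, here is a minimal sketch of how a caller hands qsort its comparison function, with a counter added so you can see how many comparisons a given sort performs. The names `cmp_calls`, `cmp_u64`, and `count_qsort_comparisons` are my own for illustration, and this is not the benchmark code behind the timings above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counter: incremented on every comparator call. */
static unsigned long long cmp_calls = 0;

/* Comparator in the shape qsort expects. qsort only ever reaches it
   through a function pointer, so each comparison is an indirect call. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    cmp_calls++;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* Sorts n pseudo-random 64-bit integers with the library qsort and
   returns the number of comparator calls (0 if allocation fails). */
static unsigned long long count_qsort_comparisons(size_t n, unsigned seed)
{
    assert(n > 0);
    uint64_t *a = malloc(n * sizeof *a);
    if (a == NULL)
        return 0;

    srand(seed);
    for (size_t i = 0; i < n; i++)
        a[i] = ((uint64_t)rand() << 31) ^ (uint64_t)rand();

    cmp_calls = 0;
    qsort(a, n, sizeof *a, cmp_u64);

    /* Sanity check: the output really is in nondecreasing order. */
    for (size_t i = 1; i < n; i++)
        assert(a[i - 1] <= a[i]);

    free(a);
    return cmp_calls;
}
```

Calling `count_qsort_comparisons(10000000, 1)` should give a rough comparison count for an array the size of the one in my experiment, though the exact number depends on your C library.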

If you really want quicksort, you can have it. Here’s a neat trick. If malloc fails, then qsort can’t allocate the memory needed for mergesort. In this case, qsort actually does use quicksort.

I wrote a function called malloc_all that allocates all of the memory in my system (this takes about half a second). If I call qsort after running malloc_all, then it runs the quicksort algorithm.

Wait, where’s the randomness? Here’s where things start to get interesting. If I call malloc_all and then I run qsort on the million-element array

$\displaystyle 0, 0, 0, \ldots, 0, 1, 2, 3, \ldots, 250000, 0, 1, 0, 2, 0, 3, \ldots, 0, 250000,$

then qsort takes a whopping 394 seconds to run. Wow!
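If you want to build this array yourself, the following sketch generates the same pattern for any block length `k` (with `k = 250000` giving the million-element version). The function name `build_quirky_input` and the choice of `int64_t` elements are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Builds the 4*k-element pattern
     0 (k times), 1, 2, ..., k, 0, 1, 0, 2, ..., 0, k.
   Returns a malloc'd buffer that the caller must free, or NULL if the
   allocation fails. */
static int64_t *build_quirky_input(size_t k)
{
    assert(k > 0);
    int64_t *a = malloc(4 * k * sizeof *a);
    if (a == NULL)
        return NULL;

    size_t pos = 0;
    for (size_t i = 0; i < k; i++)      /* k zeros */
        a[pos++] = 0;
    for (size_t i = 1; i <= k; i++)     /* 1, 2, ..., k */
        a[pos++] = (int64_t)i;
    for (size_t i = 1; i <= k; i++) {   /* 0, 1, 0, 2, ..., 0, k */
        a[pos++] = 0;
        a[pos++] = (int64_t)i;
    }
    assert(pos == 4 * k);
    return a;
}
```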

It turns out that qsort’s quicksort doesn’t use randomness to pick its pivot. As a consequence, there are lots of natural inputs that cause the algorithm to take quadratic time!

Unlike C++, which requires that its sort function runs in ${O(n \log n)}$ time, the C standard places no such requirement on qsort. So it’s not that qsort is broken. It’s just quirky.

But why not use randomness? Most modern library implementations of quicksort are based on Bentley and McIlroy’s 1993 paper, Engineering a Sort Function. In it, Bentley and McIlroy advocate against using randomness, saying that “a library sort has no business side-effecting the random number generator”. As far as I can tell, this single sentence decided the fate of most modern quicksort library implementations. I think this is a bit of a shame, since it only takes a few lines to implement a library-specific random number generator, and then we would have a sort function without any worst-case inputs… But I’m also a theoretician, so I’m very biased.
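To give a sense of what “a few lines” might look like, here is a rough sketch of a quicksort that draws its pivots from a private xorshift generator, so the caller’s `rand()` state is never touched. Everything here is hypothetical: the names `sort_rng_next` and `random_quicksort` are mine, the sort handles `int64_t` arrays rather than the generic qsort interface, and the static state makes it unsuitable for concurrent use as written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Private generator state (xorshift64). It must be nonzero; nothing
   outside this file can observe or disturb it. */
static uint64_t sort_rng_state = 0x9E3779B97F4A7C15ULL;

static uint64_t sort_rng_next(void)
{
    uint64_t x = sort_rng_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    sort_rng_state = x;
    return x;
}

static void swap_i64(int64_t *p, int64_t *q)
{
    int64_t t = *p;
    *p = *q;
    *q = t;
}

/* Quicksort with a randomly chosen pivot and a three-way partition,
   so runs of equal keys do not degrade performance. */
static void random_quicksort(int64_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return;

    /* The slight modulo bias is harmless for pivot selection. */
    int64_t pivot = a[sort_rng_next() % n];

    /* Invariant: a[0..lt) < pivot, a[lt..i) == pivot, a[gt..n) > pivot. */
    size_t lt = 0, i = 0, gt = n;
    while (i < gt) {
        if (a[i] < pivot) {
            swap_i64(&a[lt], &a[i]);
            lt++;
            i++;
        } else if (a[i] > pivot) {
            gt--;
            swap_i64(&a[i], &a[gt]);
        } else {
            i++;
        }
    }

    random_quicksort(a, lt);            /* keys smaller than the pivot */
    random_quicksort(a + gt, n - gt);   /* keys larger than the pivot */
}
```

With random pivots, the expected running time is ${O(n \log n)}$ on every input, which is the property the deterministic version gives up.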

The true namesake of qsort. I need to confess something: qsort isn’t actually named after quicksort. It’s named after quickersort, the variant of quicksort introduced by R.S. Scowen in 1965, three years after the introduction of quicksort.

There is one very neat way in which the two algorithms differ: quickersort implements its own recursion stack manually in a way that guarantees a stack depth of ${O(\log n)}$. The way that quickersort does this is simple but clever. Suppose quickersort is executing a subproblem ${A}$, which has two recursive child subproblems ${B}$ and ${C}$, where ${B}$ is the smaller of the two. After performing the partitioning step in ${A}$, quickersort executes ${B}$ recursively. Once ${B}$ is done, however, quickersort overwrites ${A}$’s stack frame with ${C}$’s stack frame, and performs ${C}$ at the same stack depth at which ${A}$ was performed. The result is that the stack depth only ever increases when the problem size decreases by at least a factor of two. Hence the bound of ${O(\log n)}$ on stack size.
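The smaller-subproblem-first idea can be sketched with an explicit stack of pending ranges. This is my own illustration rather than Scowen’s code or the glibc source: the names `struct range`, `STACK_SLOTS`, and `stack_quicksort` are hypothetical, the pivot is simply the middle element, and the function returns the deepest the stack ever got so the bound can be checked.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* One pending subproblem: the half-open index range [lo, hi). */
struct range { size_t lo, hi; };

/* One slot per bit of size_t is enough, since the depth never exceeds
   log2(n). */
#define STACK_SLOTS (CHAR_BIT * sizeof(size_t))

static void swap_keys(int64_t *p, int64_t *q)
{
    int64_t t = *p;
    *p = *q;
    *q = t;
}

/* Sorts a[0..n) and returns the maximum number of ranges that were
   pending on the explicit stack at any one time. */
static size_t stack_quicksort(int64_t *a, size_t n)
{
    struct range stack[STACK_SLOTS];
    size_t top = 0, max_depth = 0;
    size_t lo = 0, hi = n;

    for (;;) {
        while (hi - lo > 1) {
            /* Three-way partition around the middle element. */
            int64_t pivot = a[lo + (hi - lo) / 2];
            size_t lt = lo, i = lo, gt = hi;
            while (i < gt) {
                if (a[i] < pivot) {
                    swap_keys(&a[lt], &a[i]);
                    lt++;
                    i++;
                } else if (a[i] > pivot) {
                    gt--;
                    swap_keys(&a[i], &a[gt]);
                } else {
                    i++;
                }
            }

            /* Children are [lo, lt) and [gt, hi). Defer the larger one
               and keep working on the smaller one, so each push at
               least halves the current problem size. */
            assert(top < STACK_SLOTS);
            if (lt - lo < hi - gt) {
                stack[top].lo = gt;
                stack[top].hi = hi;
                hi = lt;
            } else {
                stack[top].lo = lo;
                stack[top].hi = lt;
                lo = gt;
            }
            top++;
            if (top > max_depth)
                max_depth = top;
        }

        if (top == 0)
            break;
        top--;                  /* resume a deferred (larger) range */
        lo = stack[top].lo;
        hi = stack[top].hi;
    }
    return max_depth;
}
```

Note that the depth bound holds no matter how bad the pivots are; a poor pivot choice costs time, but never stack space.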

A beautiful piece of code. At the end of the day, GNU libc qsort is one of my favorite pieces of code. I like the strange quirks. But there’s more to it than that. It’s simple and beautifully written, full of nuances and subtleties. If you have a few minutes, you might want to take a look.

You can also rerun the experiments in this blog post using this code.