The 100 Prisoners Problem is the single coolest algorithmic riddle I know. It’s a great example of a simple algorithm accomplishing something totally unexpected.
But the algorithm for the 100 Prisoners Problem is only half the story. The second, lesser known half, is a breathtakingly simple proof that no other algorithm can do better.
The Riddle: A sadistic prison warden runs a prison with 100 prisoners (numbered 1 through 100). He decides to force them to play the following game. The warden lines up 100 boxes in a room, and places a random permutation of the numbers 1 through 100 in the boxes. Each prisoner enters the room and is allowed to open 50 boxes. If the prisoner’s number is in one of the opened boxes, then the prisoner wins the game. Here’s the catch: In order for the prisoners to be set free, all of the prisoners have to win the game.
Before each prisoner enters the room, the boxes are reset to their initial configuration (so there’s no way for one person to leave a hint for the next person). Moreover, once the game has begun, no communication is permitted between the prisoners. They can, however, agree on a strategy beforehand for how to choose which boxes to open. The question is, what’s the optimal strategy?
The Algorithm: If the prisoners each independently open a random subset of the boxes, then the probability of all the prisoners winning is , which is astronomically small when you consider that there are fewer than atoms in the universe.
Surprisingly, there is a strategy which allows the prisoners to do much better, raising the probability of freedom to roughly 30 percent.
The algorithm works like this: When the -th prisoner walks in the room, they begin by opening the -th box. They then look at the number which they just revealed, and use that to decide which box to open next, opening the -th box. They continue like this until they’ve opened 50 boxes, opening each successive box according to the number in the previous box.
The Intuition: If a box has some number in it, think of the box as having an arrow pointing from the box to the -th box. If you start at a given box, and follow the arrows, eventually you’ll come in a full circle and get back to the box where you started. In other words, each box is part of a cycle of arrows.
The -th prisoner starts by opening the -th box. They then continue to open each box in the cycle until they get to the box containing the number . Notice that the arrow from the box containing the number actually goes back to the -th box, the very first box the prisoner opened. In other words, the box containing is guaranteed to be the last box in the cycle. It follows that the -th prisoner will win the game if and only if the cycle containing the -th box is of length no more than fifty.
In order for any prisoner to lose, there has to be some cycle of length more than fifty. This fact alone is very powerful, because it ensures that if any prisoners lose, then actually at least half of the prisoners lose (namely, all the prisoners in the cycle of length more than fifty lose). The algorithm manages to make it so that whether or not various prisoners lose is now highly correlated.
To see how powerful this observation is, consider the following slightly easier variant of the game: Each prisoner opens boxes just as before, but now they are permitted to open 75 boxes instead of 50. Using our algorithm, whenever any prisoners lose, at least 75 prisoners lose. But, since each prisoner individually has a chance of losing, the average number of prisoners who will lose is 25. Given that the average is 25, the fraction of the time that more than 75 can lose is at most . The remaining of the time, all of the prisoners must win, resulting in them being set free.
Although this is a slick analysis, it doesn’t work when each prisoner is only allowed to open fifty boxes. Instead, we’ll do a more careful analysis, allowing us to calculate the exact probability of winning.
A Precise Analysis: Let’s start by counting the number of box configurations (i.e., permutations of numbers) which result in there being a cycle of length, say, 63. In order to construct such a permutation, we can begin by picking which boxes will be in the cycle. There are 100 ways to pick the first box in the cycle, 99 ways to then pick the second box in the cycle, and so on, ending with 38 ways to pick the final box in the cycle. But, of course, that’s not quite right, because what’s it mean for a box to be the “first” in the cycle? A cycle has no first element, meaning there are 63 ways to construct the same cycle. Dividing by 63, we get that the number of ways to construct a cycle of length 63 is
Once we’ve constructed the cycle, we still need to determine where the other numbers will go. Notice that 63 numbers already have their positions determined by the cycle which we constructed. We’re left with 37 remaining numbers and 37 positions to put them. Putting them in an arbitrary ordering, there are
ways to complete the construction of our permutation.
Multiplying the number of ways to pick the cycle by the number of ways to then complete the permutation, it follows that the number of permutations containing a cycle of length 63 is
But that’s just the total number of permutations divided by . In other words, if you pick a permutation at random, then the probability of there being a cycle of length is .
The same is true if we replace with any of the numbers . The probability of there being any cycle of length greater than 51, and the prisoners therefore losing the game, is exactly
At this point, we could just do out the summation by hand to get the answer. But there’s an easier way. In particular, we use the fact that the -th Harmonic number , which is defined to be
satisfies . Notice that the probability of the prisoners losing the game can be written as . Plugging in the approximation for , we get that the probability of some prisoner losing is roughly
Proving Optimality: A natural question is, can we do better? Surprisingly there’s a very simple and elegant proof that we cannot. Moreover, the proof gives a deep intuition for why the algorithm does so well.
Consider the following variant of the game, which I call The Baby 100 Prisoner’s Problem. In this version of the game, whenever a prisoner opens a box, the box is left open for any prisoners who come later. When a prisoner comes in, if their number has already been revealed, then the prisoner wins for free. Otherwise, the prisoner is allowed to open up to 50 of the remaining boxes in search of their number. Once the prisoner finds their number, they are not permitted to open any more boxes.
The baby problem is clearly much easier than the original. In fact, it’s not even an interesting algorithmic problem anymore, since every strategy will perform exactly the same. In particular, when opening a box that has never been opened before, then because the numbers are randomly permuted in the boxes, it doesn’t really matter what your strategy is for picking which box to open.
So the baby problem is easier than the original, and has the property that every algorithm performs optimally on it. Our proof strategy for showing that the cycle algorithm for the 100 Prisoner’s Problem is optimal will be to show that the cycle algorithm performs exactly the same for the standard problem as it does for the baby version of the problem. In other words, by using the cycle algorithm, we make it so that the full problem is just as easy as the baby one!
Consider the cycle algorithm for the standard version of the problem. Suppose the -th prisoner comes into the room, and all the previous prisoners have won the game so far. The -th prisoner wishes to find the box containing the number . Notice that if that box has ever been opened before (by a previous prisoner), then the entire cycle for that box must have also been opened by the past prisoner. Since all the past prisoners won the game, the cycle must be length no greater than fifty. It follows that the -th prisoner will also win the game! In other words, if the box containing has been opened at some point in the past, then we’re guaranteed to successfully find it now. On the other hand, if the box hasn’t been opened yet, then all the boxes which we will open will be in a cycle which has not yet been explored. That is, we will be opening only boxes which have not yet been opened.
But that’s exactly what happens in the baby problem! If the box we’re looking for has been opened before then we get it for free, and otherwise, we open new boxes to search for it! The cycle algorithm allows us to simulate the baby problem while actually playing the standard problem. Since the algorithm manages to do as well on the standard problem as it does on the baby problem, and since no algorithm for the standard problem can do better than any algorithm for the baby problem, it follows that the cycle algorithm must be optimal.
Conclusion: The 100 Prisoners Problem implicitly relies on a number of beautiful ideas from permutation combinatorics. At the heart of the algorithm’s analysis is the theory of permutation cycles. The proof of optimality can even be seen as implicitly constructing what’s known as Foata’s Transition Lemma, a classic bijection between two canonical representations of permutations.
Sometimes seeing classic combinatorial ideas applied to a different and less intuitive setting makes you realize just how elegant and subtle those ideas actually are. For me that’s exactly what the 100 Prisoners Problem does. It reminds me that sometimes the key to fully appreciating mathematical beauty is to look at it the wrong way.