Sorting Phone Numbers

You are given a file of up to 10 million 7-digit phone numbers (they were 1-800 numbers, but the 1-800 part has been removed for you). Each phone number cannot appear twice or more. Sort them, and output the sorted list to another file. You can use only 1 megabyte of memory, though you may also use extra files and disk space as much as you wish. Running time: as fast as possible, preferably on the order of seconds.

This problem is taken from Dr. Dobb's Journal, December 1999, Algorithm Alley column, “Programming Pearls: Cracking the Oyster”, by Jon Bentley. Bentley in turn took it from his book Programming Pearls, second edition (link).

Solution

In memory, use a bit vector to remember which phone numbers are seen in the input file and which aren't; then outputting the sorted list simply means dumping seen numbers in order.

But since we only have about 8.4 million bits, we make do with two passes. In the first pass, process numbers starting with 0-4, and output to a file; in the second pass, process numbers starting with 5-9, and append output to the first-pass output file.

Alternatively, if we assume the convention that phone numbers don't start with 0 or 1, there are only up to 8 million numbers, not 10 million, and so the whole thing fits in memory and we just need one pass.