Finding primes: basic methods

Trial division

How do you recognize primes or test for primality? One obvious but slow way is by trial division: to see that 97 is a prime, verify that it is indivisible by 2, 3, 4, up to 96. With some thought, you may realize that you only need to go up to sqrt(97).

Lemma: If a natural number n is composite, then it has a non-unit divisor d with d <= sqrt(n); in fact it has a prime divisor p with p <= sqrt(n).

Proof. If all prime factors of n are greater than sqrt(n), then since n is composite, it is a product of at least two prime factors p and q (not necessarily distinct). Therefore n = p*q*... > p*q > sqrt(n)*sqrt(n) = n, so n>n, contradiction. So n has a prime divisor <= sqrt(n). Consequently it has a non-unit divisor <= sqrt(n).

Another proof. Since n is composite, n = d*e for some natural numbers d and e which are not zero and not units. If both of them are greater than sqrt(n), then n = d*e > sqrt(n)*sqrt(n) = n, so n>n, contradiction. So d or e is <= sqrt(n); a prime divisor of one of them, thus a prime divisor of n, is therefore also <= sqrt(n).

For example, 10 has two prime factors, 2 and 5; one of them, 2, is below sqrt(10). If your task is to prime-factorize 10, of course you need to find out both of them, and go as far as to verify that both are primes. However, if you only want to see if 10 is composite or prime, you only need to search for non-unit divisors of 10 below sqrt(10), and 2 does the job.

The sieve of Eratosthenes

At this point, trial division is the only method you know for recognizing primes and composites. In a later section I will introduce better methods. But at this point you can do better as well. If you have a large piece of paper for writing, or a computer with enough memory, you can use the sieve of Eratosthenes. This is best if you want to find all primes from 1 to 30, for example.

Draw a table or list, enumerating all natural numbers from 1 to 30 consecutively. Actually you can start with 2, since 1 is not a prime or a composite, and we are only interested in primes and composites.

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

We know that 2 is a prime, so leave it alone. But all larger multiples of 2 are composites, so cross them out. I cannot cross numbers out in HTML (actually I can, but the effect is not obvious), so I underline them. Because the list is consecutive and increasing, it is easy to spot all multiples of 2 once you see the first.

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Now 3 is the first number that comes after the previous prime 2 and is left alone. It is therefore a prime. All of its larger multiples are composites, and you cross them out (and I underline them). If a number is already crossed out, it remains crossed out. Again, once you see the first multiple, you can easily see the rest: just keep adding 3.

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Now 5 is the first number that comes after the previous prime 3 and is still left alone. It is therefore a prime. All of its larger multiples are composites, and you cross them out.

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Now 7 is the first number that comes after the previous prime 5 and is left alone. So it is a prime, but you cross out its larger multiples... But wait! Notice that, by virtue of the above lemma, you can stop when you go over sqrt(30), i.e., you can in fact stop when you hit 7 here.

The numbers crossed out from the table are composites, and the numbers left intact are primes. This is the sieve of Eratosthenes: you sieve away multiples of 2, then multiples of 3, then multiples of 5, ... what remains are the primes as they are not multiples of anybody else.

To recapitulate:

Write down all numbers from 2 to M in consecutive, increasing order; the order helps you spot multiples.
Start with 2: it is a prime. Cross out all larger multiples of 2.
The first number that is still left alone and comes after the previous prime is also a prime. Cross out all larger multiples of it.
Repeat the above step, until you go beyond sqrt(M).
Now the composites are all crossed out, and the remains are the primes.

This recognizes all primes and composites from 2 to M.

If your web browser can run Java applets, you can try it out.

This method is particularly nice when you computerize it. Make an array from 2 to M of booleans. Using ``true'' to mean primes and ``false'' to mean crossed-out composites (of course you can do the reverse), initialize the array to ``true'' throughout. Codify the above steps, noting that to cross out all multiples of p other than p itself, a for loop from 2p to M step p will do. Thus, the sieve of Eratosthenes only employs cheap additions, and never requires expensive multiplications and divisions. It is very fast.

This latter point cannot be overemphasized. Occasionally some discussions of primes appear on the newsgroups, and often some idiot will post a program for printing all primes from 2 to M, saying ``I have optimized it greatly; it can finish in five hours when M is 10^10'' or some such. Obviously, the idiot uses expensive trial division faithfully for each and every number between 2 and 10^10! You are encouraged to estimate the total number of divisions required in this task; you may assume that the idiot optimizes the process by skipping all even numbers and all multiples of 3. I will just say there are terribly many divisions involved. Each division takes the computer quite some time, probably 20 to 100 times more than an addition takes.

Anyway, once the idiot posts the stupid program, those who know better (not always me) will post a program implementing the sieve of Eratosthenes, and remark that it only takes a few seconds. You may defend the idiot by noting that very few books and teachers discuss the subject, and so the idiot is forgivable. What is unforgivable is that in all instances the idiot sounds as if his or her program is the fastest possible, as if the world is too small for anything better. In contrast, the person who posts the sieve of Eratosthenes usually just shows the program, states the facts, and leaves it at that.

The shortcomings of the sieve of Eratosthenes are that it takes memory - one entry per number in the range - and it may not be what you want if you only need to test one single number for primality.