Project Euler Problem 3: Largest prime factor

The prime factors of 13195 are 5, 7, 13 and 29.

What is the largest prime factor of the number 600851475143 ?

Version 1 - Inefficient Trial Division

First a function to generate all factors. We need only test candidate factors up to $n$.

In [1]:
from six.moves import filter, map, range, reduce
In [2]:
factors = lambda n: list(filter(lambda x: not n % x, range(1, n+1)))
In [3]:
factors(24)
Out[3]:
[1, 2, 3, 4, 6, 8, 12, 24]

This isn't really that helpful, since we are only interested in prime factors. While we can implement a primality predicate and filter out composites numbers, it isn't particularly efficient. Let's rethink our approach.

Instead of iterating over all integers up to $n$, we can just iterate over the prime numbers. But first we'll need a prime number generator. The simple Sieve of Eratosthenes (250s BCE) seems like a good place to start.

Version 2 - Trial Division with Prime Number Sieve

In [4]:
def _prime_sieve_eratosthenes(n):
    composites = set()
    for i in range(2, n):
        if i not in composites: 
            yield i
            composites.update(set(range(i*i, n, i)))
In [5]:
list(_prime_sieve_eratosthenes(47))
Out[5]:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43]
In [6]:
list(_prime_sieve_eratosthenes(15))
Out[6]:
[2, 3, 5, 7, 11, 13]

Note that this function returns a list of all prime numbers strictly below $n$, so even if $n$ is prime, it will not be included.

Of course, there are ways to optimize this implementation, but this will do for now.

In [7]:
prime_sieve = _prime_sieve_eratosthenes

Note that we need not inspect all primes up to $n$, but only those up to $\sqrt{n}$. To see why, let's assume $p \times q = n$ and that $p > \sqrt{n}$. It cannot be the case that $q > p$, since that would imply $n = p \times q > p^2 > n$. But if $q \leq p$ and $q$ divides $n$, then it would have been detected earlier anyways. Therefore, we need only consider $p \leq \sqrt{n} \Leftrightarrow p^2 \leq n$, where we have equality if $n$ is a perfect square.

In [8]:
from collections import Counter

def prime_factors(n):
    if n < 0: 
        raise ValueError('Requires positive integer') 
    elif n == 1:
        raise ValueError('1 is neither prime nor composite and has no prime factors')
    else:
        prime_factors_ = Counter()
        for p in prime_sieve(int(n**0.5)+1):
            while n % p == 0:
                prime_factors_[p] += 1
                n /= p
        if n > 1: # n has no further prime factors
            prime_factors_[n] = 1
        return prime_factors_

In accodance with the above, the last element of prime_sieve(int(n**0.5)+1) will be the largest prime $p$ such that $p < \lfloor \sqrt{n} \rfloor + 1 \Leftrightarrow p \leq \lfloor \sqrt{n} \rfloor \Leftrightarrow p \leq \sqrt{n}$ since $p$ must be an integer.

In [9]:
prime_factors(8)
Out[9]:
Counter({2: 3})
In [10]:
prime_factors(15)
Out[10]:
Counter({3: 1, 5.0: 1})
In [11]:
prime_factors(47)
Out[11]:
Counter({47: 1})
In [12]:
prime_factors(2)
Out[12]:
Counter({2: 1})
In [13]:
prime_factors(1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-521428771923> in <module>()
----> 1 prime_factors(1)

<ipython-input-8-2366c18fd863> in prime_factors(n)
      5         raise ValueError('Requires positive integer')
      6     elif n == 1:
----> 7         raise ValueError('1 is neither prime nor composite and has no prime factors')
      8     else:
      9         prime_factors_ = Counter()

ValueError: 1 is neither prime nor composite and has no prime factors
In [14]:
prime_factors(6)
Out[14]:
Counter({2: 1, 3.0: 1})
In [15]:
prime_factors(1883)
Out[15]:
Counter({7: 1, 269.0: 1})
In [16]:
prime_factors(13195)
Out[16]:
Counter({5: 1, 7: 1, 13: 1, 29: 1})
In [17]:
a = prime_factors(600851475143); a
Out[17]:
Counter({71: 1, 839: 1, 1471: 1, 6857: 1})
In [18]:
max(a.keys())
Out[18]:
6857

Note that we could have just used a list to keep track of the prime factors, or even a set if we don't care about the number of appearances of each prime factor. However, given that we do, the best option was to use a Counter from Python's collections library. Using it has benefits beyond cleaner code, as we shall see.

Recall that the least common multiple (LCM) and greatest common divisor (GCD) of integers can both be obtained through prime factorization of the integers. For the GCD, this is simply the "intersection" of the prime factors, while for LCM, this is simply the "union".

Thus, we can directly make use of a Counter's default mathematical operations it supports. Namely, the following:

In [19]:
c = Counter(a=3, b=1)
d = Counter(a=1, b=2)
In [20]:
c & d # intersection:  min(c[x], d[x])
Out[20]:
Counter({'a': 1, 'b': 1})
In [21]:
c | d # union:  max(c[x], d[x])
Out[21]:
Counter({'a': 3, 'b': 2})

Therefore, we can directly use prime_factor to implement the lcm and gcd functions.

In [22]:
# First lets create a function to reconstruct
# the original integer from its prime factors
reconstruct = lambda c: reduce(lambda x, y: x*y, map(lambda x: x**c[x], c))
In [23]:
prime_factors(48)
Out[23]:
Counter({2: 4, 3: 1})
In [24]:
reconstruct(prime_factors(48))
Out[24]:
48
In [25]:
reconstruct(prime_factors(165)) == 165
Out[25]:
True

Looking good. Now let's implement lcm and gcd.

In [26]:
lcm = lambda *ns: reconstruct(reduce(lambda n, m: n | m, map(prime_factors, ns)))
In [27]:
gcd = lambda *ns: reconstruct(reduce(lambda n, m: n & m, map(prime_factors, ns)))
In [28]:
lcm(48, 180)
Out[28]:
720
In [29]:
gcd(48, 180)
Out[29]:
12

To walk through what this is doing:

In [30]:
c = prime_factors(48); c
Out[30]:
Counter({2: 4, 3: 1})
In [31]:
d = prime_factors(180); d
Out[31]:
Counter({2: 2, 3: 2, 5: 1})
In [32]:
c | d
Out[32]:
Counter({2: 4, 3: 2, 5: 1})
In [33]:
c & d
Out[33]:
Counter({2: 2, 3: 1})
In [34]:
reconstruct(c | d)
Out[34]:
720
In [35]:
reconstruct(c & d)
Out[35]:
12

All done! We must note, however, that since our integer factorization isn't particularly efficient, and even worse, that there is no known general efficient algorithm for integer factorization, this is not the best way to compute the gcd and lcm for large numbers. I only implement it like this here to demonstrate further applications of prime_factors by taking advantage of Python's Counter. Note that the lcm can be obtained by having the gcd and there are many efficient algorithms for computing the gcd, such as the Euclidean algorithm. We shall implement them later.

Comments

Comments powered by Disqus