Saturday, August 13, 2011

Learning Python with Project Euler: Problem 3

Turns out my original implementation is severely flawed, I was just lucky that it was good enough to work for the number Project Euler asked for...  Stay tuned for another implementation...

So the first two problems were a piece of cake.  Problem 3 requires a little more thinking.  It's necessary to implement an efficient algorithm to determine the largest primes factor of a very large number.  This one took me longer than I am proud to admit.  I think my first implementation was correct, but it's not efficient and will not return an answer in any acceptable time.  My second implementation found the answer in a fraction of a second.  I'm ashamed I didn't get this one sooner!

So what were the challenges to this problem?
  • implement an algorithm to determine if a number n is prime
  • implement an efficient way to determine the largest prime number of an integer
What eventually solved the problem was when I determined where to start checking numbers to see if they are factors and if they are prime.

Bullet point 2 was the kicker for me.  My first implementation was stupid because I decided I would find ALL prime factors of x and select the max of them.  This works fine for small numbers, but not so much for large ones.

import math
import sys

# determine if a number is prime
# precondition: n > 2 and n is odd
def is_prime(n):
  m = 2
  
  # if n is composite, then at least 1 factor is less than sqrt(n)
  max_m = math.sqrt(n) # calculate this value only once
  while m <= max_m:
    if n % m == 0:
      # if m divides n then n is not prime, return false
      return False
    m += 1

  return True

# allow an argument to specify the value of x, otherwise default it to what
# project euler is asking for
x = int(sys.argv[1]) if len(sys.argv) > 1 else 600851475143

# according to Direct Search Factorization [1], numbers greater than the floor
# of the sqrt of x need not be checked (see the article for the reason why)
i = int(math.floor(math.sqrt(x)))

# if i is even, minus 1 so we start at an odd number
if i & 1 == 0: i -= 1

# this loop will break once a prime factor is found
found = None
while i > 1:
  # the order these are checked is VERY important.  if you try the
  # is_prime method first you'll lose efficiency by MANY orders of
  # magnitude (I was too impatient to see if it would even finish running)
  if x % i == 0 and is_prime(i):
    found = i
    break
  i -= 2 # add 2 to avoid even numbers

if found == None:
  print str(x) + ' is prime and has no other prime factors'
else:
  print "largest prime factor: " + str(i)

# [1] http://mathworld.wolfram.com/DirectSearchFactorization.html

3 comments:

  1. Hi Josh! I was working on this problem and figured out a possible solution using a different approach. I used the prime factorization method and to calculate all prime factors and printed the largest one. Here's a link to my solution: http://pastebin.com/duGVaeYn

    Anyway, I also tried your solution and found that while it works for the default trial value of '600851475143' it doesn't seem to work for certain inputs since you've limited the looping to the square root of the input.

    As an example, for the input 21, √21 = 4.582 so your code runs till 4 and reports the solution as 3. However, since 21 = 3 x 7, the answer should be 7 in this case. Similarly, in case of 20, √20 = 4.472 and your code runs till 4 again and reports that '20 is prime and has no other prime factors' since all the factors it found till that point were even numbers.

    ReplyDelete
  2. Wow, I was lucky to get the correct answer, given how flawed my solution is (doesn't even work with 21...). It appears I'm misunderstanding what wolfram.com says about Direct Search Factorization. Thanks for noticing this.

    I need to read your solution a few times more to fully understand what's going on, as it's not immediately clear to me how it works.

    ReplyDelete
  3. My solution uses the prime factorization method.

    For instance, with 20 as input, the variable 'pf' is initialized to 2. Since it divides 20, I set it to be equal to largest and divide 20 by 2 to get 10. Now, again, pf is initialized to 2 and since it divides 10, I proceed similarly and get 5. Since 2 doesn't divide 5, the loop increments it to 3 (all subsequent increments are by 2 to avoid even numbers). Since 3 doesn't divide 5 either, the code increments it to get 5 which is our solution.

    So basically my code basically factorizes the given number since 20 = 2 x 2 x 5.

    As you can see, the continuous increment and division by the factors ensures that the next factor we get is prime by default. The last factor, therefore, is always the largest.

    I guess my code is a bit unclear because of a lack of comments and the fact that my use of the variable 'largest' is a bit awkward.

    ReplyDelete