How can I optimize the Sieve of Eratosthenes in Python for larger limits?

Sieve of Eratosthenes - Finding Primes in Python

I’m working on a math application and decided to use a Sieve of Eratosthenes approach in Python to find prime numbers. However, my initial implementation runs very slowly, especially when trying to find all primes below 2 million: it ran for over 20 minutes before I had to stop it.

Here’s my original implementation:

def primes_sieve(limit):
    limitn = limit+1
    primes = list(range(2, limitn))  # needs to be a list so remove() works in Python 3

    for i in primes:
        factors = range(i, limitn, i)
        for f in factors[1:]:
            if f in primes:
                primes.remove(f)
    return primes

print(primes_sieve(2000))

Update:

I profiled the code and found that most of the time goes into removing elements from the list: each remove() call has to search the list and then shift everything after the removed element. So I switched from a list to a dictionary to speed things up. Here’s my updated implementation:

def primes_sieve1(limit):
    limitn = limit+1
    primes = dict()
    for i in range(2, limitn): primes[i] = True

    for i in primes:
        factors = range(i, limitn, i)
        for f in factors[1:]:
            primes[f] = False
    return [i for i in primes if primes[i]==True]

print(primes_sieve1(2000000))

How can I further optimize this implementation to find primes faster for larger limits?

I’ve spent a while optimizing the Sieve of Eratosthenes in Python. One of the easiest improvements for larger limits is switching from a dictionary to a flat boolean list, and, while you’re at it, only sieving with numbers up to sqrt(limit) and starting to mark multiples at i*i. These are straightforward changes, but they really reduce memory usage and boost performance. Here’s a basic example that works well for large limits:

def primes_sieve_optimized(limit):
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False  # 0 and 1 are not prime numbers

    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False

    return [i for i, is_prime in enumerate(sieve) if is_prime]

print(primes_sieve_optimized(2000000))

This version is much faster: it avoids per-element removals and dictionary overhead, only sieves with numbers up to the square root of the limit, and starts marking multiples at i*i, so it does far less work with a much more compact data structure.
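If you want to see the difference on your machine, here’s a rough timing sketch (it assumes both primes_sieve1 from the question and primes_sieve_optimized above are defined in the same script; exact numbers will vary):

import time

def rough_time(func, limit):
    # Quick wall-clock measurement; use timeit for more careful benchmarking
    start = time.perf_counter()
    func(limit)
    return time.perf_counter() - start

print("dict version:  %.2f s" % rough_time(primes_sieve1, 2000000))
print("boolean list:  %.2f s" % rough_time(primes_sieve_optimized, 2000000))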

I agree with @charity-majors’s approach, and I’d add that one way to make it even faster is wheel factorization: you skip candidates that are obviously divisible by small primes like 2, 3, and 5, so they never enter the sieving loop at all. It’s a neat trick for squeezing more speed out of the Sieve of Eratosthenes in Python.

def primes_sieve_with_wheel(limit):
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False

    # First cross off multiples of the wheel primes 2, 3 and 5
    for p in (2, 3, 5):
        for j in range(p * p, limit + 1, p):
            sieve[j] = False

    # Only numbers coprime to 30 = 2*3*5 remain as sieving candidates
    wheel = (1, 7, 11, 13, 17, 19, 23, 29)
    root = int(limit**0.5)
    for base in range(0, root + 1, 30):
        for offset in wheel:
            i = base + offset
            if i < 7 or i > root:
                continue  # 2, 3, 5 already handled; no need to sieve past sqrt(limit)
            if sieve[i]:
                for j in range(i * i, limit + 1, i):
                    sieve[j] = False

    return [i for i in range(2, limit + 1) if sieve[i]]

print(primes_sieve_with_wheel(2000000))

By skipping candidates that are multiples of 2, 3, or 5, only 8 out of every 30 numbers are ever considered as sieving candidates, which cuts down the work considerably for large limits. Give it a try on your sieve and see the improvement!
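If that 8-out-of-30 figure sounds surprising, here’s a quick sanity check of which residues mod 30 survive the wheel:

from math import gcd

# Residues mod 30 that are coprime to 2, 3 and 5 (i.e. coprime to 30)
coprime_residues = [r for r in range(30) if gcd(r, 30) == 1]
print(coprime_residues)            # [1, 7, 11, 13, 17, 19, 23, 29]
print(len(coprime_residues) / 30)  # ~0.27, so roughly 73% of candidates are skipped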

These optimizations are spot on! If you’re working with very large limits, I’d also recommend NumPy. I’ve used it a lot, and it gives a significant speedup because marking the multiples of each prime becomes a single vectorized slice assignment instead of a Python-level loop. It’s particularly useful for a large-scale Sieve of Eratosthenes in Python.

import numpy as np

def primes_sieve_numpy(limit):
    sieve = np.ones(limit + 1, dtype=bool)
    sieve[0] = sieve[1] = False  # 0 and 1 are not prime numbers

    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            sieve[i*i:limit+1:i] = False  # Mark multiples of i as non-prime

    return np.nonzero(sieve)[0]

print(primes_sieve_numpy(2000000))

NumPy’s slice assignment crosses off all multiples of i in one vectorized operation, which is where the bulk of the time goes, so marking non-primes is dramatically faster than with a pure-Python inner loop. If you’re handling very large limits, it’s well worth incorporating.
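If memory becomes the limiting factor, a further tweak is to sieve only odd numbers, which halves the array. Here’s a rough sketch of that idea (the name primes_sieve_numpy_odds is mine, and it assumes limit >= 2; index k stands for the odd number 2*k + 1):

import numpy as np

def primes_sieve_numpy_odds(limit):
    # Sieve over odd numbers only: index k represents the odd number 2*k + 1
    sieve = np.ones((limit + 1) // 2, dtype=bool)
    sieve[0] = False  # 1 is not prime

    for i in range(3, int(limit**0.5) + 1, 2):
        if sieve[i // 2]:
            # Odd multiples of i start at i*i and are i apart in index space
            sieve[i * i // 2::i] = False

    # Re-attach 2, then convert surviving indices back to odd numbers
    return np.concatenate(([2], 2 * np.nonzero(sieve)[0] + 1))

print(primes_sieve_numpy_odds(2000000))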