Is there a Python threadpool similar to the multiprocessing.Pool
for worker threads?
I like the simplicity of using multiprocessing.Pool
to parallelize tasks, for example:
import multiprocessing

def long_running_func(p):
    c_func_no_gil(p)

p = multiprocessing.Pool(4)
xs = p.map(long_running_func, range(100))
However, I want to achieve the same parallelization without the overhead of creating new processes.
I understand the Global Interpreter Lock (GIL), but in my use case, the function being called is an IO-bound C function, and the Python wrapper releases the GIL before invoking the actual function.
Do I need to implement my own threading pool for this, or is there an existing Python threadpool solution that can handle this scenario?
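For reference, here is roughly the kind of wrapper I have in mind (a sketch only; the library name and symbol are placeholders). Calls made through ctypes.CDLL release the GIL while the foreign function runs, which is why I expect threads to help:

import ctypes

# Placeholder shared library; ctypes releases the GIL around CDLL calls,
# so other threads can run while the C function blocks on IO.
lib = ctypes.CDLL("libmywork.so")

def long_running_func(p):
    lib.c_func_no_gil(p)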
Hey @MiroslavRalevic
May your code always be bug-free and your threads never deadlock!
Using concurrent.futures.ThreadPoolExecutor: The concurrent.futures module provides a high-level interface for asynchronously executing callables. You can use ThreadPoolExecutor to manage a pool of threads easily. This is especially handy for IO-bound functions, as it avoids the overhead of creating new processes. Here’s an example:
from concurrent.futures import ThreadPoolExecutor

def long_running_func(p):
    c_func_no_gil(p)

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(long_running_func, range(100)))
It’s a simple yet powerful way to implement a Python thread pool!
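If you want results as they finish rather than in input order, a small variation (sketched here with the same c_func_no_gil wrapper from the question) uses executor.submit together with as_completed:

from concurrent.futures import ThreadPoolExecutor, as_completed

def long_running_func(p):
    c_func_no_gil(p)

with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() returns a Future per task; as_completed() yields futures
    # in completion order rather than submission order.
    futures = [executor.submit(long_running_func, p) for p in range(100)]
    for future in as_completed(futures):
        future.result()  # re-raises any exception raised in the worker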
Hey Everyone!
Building on what @ian-partridge mentioned, another approach is using queue.Queue with worker threads. This method lets you manually manage the thread pool, giving you more control over how tasks are distributed among threads. Here’s an example:
import threading
import queue

def worker(task_queue):
    while True:
        p = task_queue.get()
        if p is None:  # stop sentinel
            break
        c_func_no_gil(p)

task_queue = queue.Queue()
threads = []

# Create worker threads
for _ in range(4):
    t = threading.Thread(target=worker, args=(task_queue,))
    t.start()
    threads.append(t)

# Add tasks to the queue
for p in range(100):
    task_queue.put(p)

# Stop workers: one sentinel per thread, then wait for them to finish
for _ in range(4):
    task_queue.put(None)
for t in threads:
    t.join()
This gives you the flexibility to tailor thread behavior exactly as needed!
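One refinement worth considering (just a sketch of one way to do it): pair task_done() with Queue.join() so the main thread can block until every queued item has actually been processed before sending the stop sentinels:

import threading
import queue

def worker(task_queue):
    while True:
        p = task_queue.get()
        if p is None:  # stop sentinel
            task_queue.task_done()
            break
        try:
            c_func_no_gil(p)
        finally:
            task_queue.task_done()  # mark this item as processed

task_queue = queue.Queue()
threads = [threading.Thread(target=worker, args=(task_queue,)) for _ in range(4)]
for t in threads:
    t.start()

for p in range(100):
    task_queue.put(p)

task_queue.join()  # blocks until task_done() has been called for every item

for _ in range(4):
    task_queue.put(None)
for t in threads:
    t.join()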
Adding to Toby’s detailed explanation, another efficient method is using multiprocessing.dummy.Pool. This version of Pool uses threads rather than processes, which reduces overhead while still allowing thread-based parallelism. Here’s how it works:
from multiprocessing.dummy import Pool as ThreadPool

def long_running_func(p):
    c_func_no_gil(p)

pool = ThreadPool(4)
results = pool.map(long_running_func, range(100))
pool.close()
pool.join()
It’s perfect if you want the simplicity of multiprocessing.Pool but with the lighter weight of threads!
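If you’re on Python 3.3 or later, the thread-backed pool also works as a context manager, so you don’t have to remember close() and join() yourself (the with-block calls terminate() on exit, which is safe here because map() blocks until all results are collected):

from multiprocessing.dummy import Pool as ThreadPool

def long_running_func(p):
    c_func_no_gil(p)

# The with-block tears the pool down on exit; map() has already returned
# all results by then, so no work is lost.
with ThreadPool(4) as pool:
    results = pool.map(long_running_func, range(100))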