I'm working on an algorithm, and the function `Multiprocess` is supposed to call another function, `CreateMatrixMp()`, in parallel as many times as there are CPUs available. Since I have never worked with multiprocessing before, I'm unsure whether to use `Pool` or `Process`.
By "efficient" I mean that `CreateMatrixMp()` will potentially be called thousands of times. I've read through the Python documentation on the `multiprocessing` module and have considered two possibilities:
Using the `Pool` class:

```python
def MatrixHelper(self, args):
    return self.CreateMatrixMp(*args)

def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print(f'Number of CPUs to process WM: {cpus}')
    poolCount = cpus * 2
    args = [(sigmaI, sigmaX, i) for i in range(self.numPixels)]
    pool = mp.Pool(processes=poolCount, maxtasksperchild=2)
    tempData = pool.map(self.MatrixHelper, args)
    pool.close()
    pool.join()
```
Using the `Process` class:

```python
def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print(f'Number of CPUs to process WM: {cpus}')
    processes = [mp.Process(target=self.CreateMatrixMp, args=(sigmaI, sigmaX, i))
                 for i in range(self.numPixels)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```
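As far as I can tell, a `Process` has no return value of its own, so collecting results would need something like a shared `multiprocessing.Queue`. Here is a small sketch of what I believe that pattern looks like (again with a placeholder `worker` instead of my real function):

```python
import multiprocessing as mp

def worker(q, i):
    # A Process cannot return a value directly; push the result
    # into a shared Queue instead, tagged with the task index so
    # the original order can be restored afterwards.
    q.put((i, i * i))  # i * i stands in for the real computation

if __name__ == "__main__":
    q = mp.Queue()
    procs = [mp.Process(target=worker, args=(q, i)) for i in range(4)]
    for p in procs:
        p.start()
    # Drain the queue before join() so a large result can't leave
    # a child blocked on the queue's feeder thread.
    results = [q.get() for _ in procs]
    for p in procs:
        p.join()
    # Processes finish in any order, so sort by task index.
    results = [val for _, val in sorted(results)]
    print(results)  # [0, 1, 4, 9]
```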
From my research, it seems like `Pool` would be the better choice because of its lower overhead and because it defaults to the number of CPUs on the machine. The only issue is that I keep running into errors with `Pool`, and every time I fix one, another pops up. `Process`, on the other hand, seems simpler to implement, and for all I know it might be the better choice.
Should I use `Pool` instead of `Process`, and is `map()` the right function to use if I want to preserve the order of the results? I need to collect the result of each task in a list. How does `Process` handle returned data, and is it a good fit for this case?
In short: when should I use `Pool` versus `Process`, and what is the best way to handle the data returned by each parallel task?