I’m working on an algorithm, and my Multiprocess method is supposed to call another function, CreateMatrixMp(), in parallel across as many CPUs as are available. Since I have never worked with multiprocessing before, I’m unsure whether to use Pool or Process.
By “efficient” I mean that CreateMatrixMp() will potentially be called thousands of times. I’ve read through the Python documentation on the multiprocessing module and have narrowed it down to two possibilities:
Using the Pool class:
import multiprocessing as mp

def MatrixHelper(self, args):
    # Unpack the argument tuple and forward it to CreateMatrixMp
    return self.CreateMatrixMp(*args)

def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print(f'Number of CPUs to process WM: {cpus}')
    poolCount = cpus * 2
    args = [(sigmaI, sigmaX, i) for i in range(self.numPixels)]
    pool = mp.Pool(processes=poolCount, maxtasksperchild=2)
    # map() blocks until every task finishes and returns results in input order
    tempData = pool.map(self.MatrixHelper, args)
    pool.close()
    pool.join()
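If it helps, here is a stripped-down, self-contained version of the Pool route I have been experimenting with. It is only a sketch: create_matrix_row is a placeholder standing in for CreateMatrixMp, the sigma values and pixel count are made up, and I have moved the worker to module level and added an if __name__ == '__main__': guard because that seems to be what the docs expect.

import multiprocessing as mp

def create_matrix_row(args):
    # Placeholder for CreateMatrixMp: unpack (sigmaI, sigmaX, i) and return one "row"
    sigmaI, sigmaX, i = args
    return [sigmaI * sigmaX * i]

if __name__ == '__main__':
    sigmaI, sigmaX, num_pixels = 0.5, 2.0, 8   # made-up test values
    args = [(sigmaI, sigmaX, i) for i in range(num_pixels)]
    # Pool() defaults its worker count to mp.cpu_count(); map() preserves input order
    with mp.Pool() as pool:
        temp_data = pool.map(create_matrix_row, args)
    print(temp_data)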
Using the Process class:
def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print(f'Number of CPUs to process WM: {cpus}')
    # One Process per pixel, so this starts self.numPixels processes at once
    processes = [mp.Process(target=self.CreateMatrixMp, args=(sigmaI, sigmaX, i))
                 for i in range(self.numPixels)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
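From what I have read, a Process target's return value is simply discarded, so to get results back I would apparently need something like a multiprocessing.Queue. This is just my understanding, sketched below with a placeholder worker and made-up values rather than my real code:

import multiprocessing as mp

def worker(sigmaI, sigmaX, i, queue):
    # Placeholder for CreateMatrixMp: put (index, result) so order can be restored later
    queue.put((i, sigmaI * sigmaX * i))

if __name__ == '__main__':
    sigmaI, sigmaX, num_pixels = 0.5, 2.0, 4   # made-up test values
    queue = mp.Queue()
    processes = [mp.Process(target=worker, args=(sigmaI, sigmaX, i, queue))
                 for i in range(num_pixels)]
    for p in processes:
        p.start()
    # Drain the queue before joining so a full queue cannot block the workers
    results = [queue.get() for _ in processes]
    for p in processes:
        p.join()
    # Completion order is not guaranteed, so sort by the index that was sent along
    temp_data = [value for _, value in sorted(results)]
    print(temp_data)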
From my research, Pool seems like the better choice because of its lower per-task overhead and because it defaults its worker count to the number of CPUs on the machine. The only issue is that I keep running into errors with Pool, and every time I fix one, another pops up. Process, on the other hand, seems simpler to implement, and for all I know it might be the better choice.
Should I use Pool instead of Process, and is map() the right function to use if I want to maintain the order of results? I need to collect the results of each process in a list. How does Process handle returned data, and is it a good fit for this case?
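As a sanity check on the ordering question, this toy test (not my real code) is what makes me think map() returns results in the same order as the input iterable, regardless of which worker finishes first:

import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    with mp.Pool(processes=4) as pool:
        print(pool.map(square, range(8)))  # prints [0, 1, 4, 9, 16, 25, 36, 49]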
Please advise on when to use multiprocessing Pool versus Process, and on the best approach for handling the data returned by each parallel task.