I need to perform calculations on a three-dimensional array; the most expensive parts are an FFT and a median search.

An attempt to introduce multi-threaded processing gave no speedup, even though the program does not even write out a result (which, presumably, could have slowed the threads down). Apparently a lot of CPU time is spent on the threading machinery itself.

Are there any critical errors in how the threads are organized in this program, or is it impossible to get any performance gain from threads for such a task at all?

    from queue import Queue
    from threading import Thread

    import numpy as np

    _size = 256
    # create an array of complex numbers of size _size x _size x _size
    arr = np.random.rand(_size, _size, _size) \
        + np.random.rand(_size, _size, _size) * 1j

    def single(arr):
        # function that runs in a single thread
        for fd in range(arr.shape[0]):
            for sd in range(arr.shape[1]):
                spec = np.fft.fftshift(abs(np.fft.fft(arr[fd, sd, :])))
                amax = spec.argmax()
                val = 20*np.log10(spec[amax]) - 20*np.log10(np.median(spec))

    # number of threads
    nwork = 4

    def multith(arr):
        # function that runs in nwork threads
        def selffun(arr):
            spec = np.fft.fftshift(abs(np.fft.fft(arr)))
            amax = spec.argmax()
            val = 20*np.log10(spec[amax]) - 20*np.log10(np.median(spec))

        def worker():
            while True:
                item = _queue.get()
                selffun(item)
                _queue.task_done()

        def source(arr):
            # task generator
            for fd in range(arr.shape[0]):
                for sd in range(arr.shape[1]):
                    yield arr[fd, sd, :]

        _queue = Queue()
        for i in range(nwork):
            th = Thread(target=worker)
            th.setDaemon(True)
            th.start()
        for item in source(arr):
            _queue.put(item)
        _queue.join()

Result:

    %timeit single(arr)
    1 loop, best of 3: 4.61 s per loop

    nwork = 4
    %timeit multith(arr)
    1 loop, best of 3: 7.45 s per loop

    nwork = 2
    %timeit multith(arr)
    1 loop, best of 3: 6.31 s per loop
  • look at the numexpr library. This can also be interesting. Can you show how your numpy is built, print(np.show_config())? If your numpy uses Intel's MKL, then, in my opinion, it should parallelize the calculations by itself (a minimal check snippet is shown after these comments). - MaxU
  • blas_mkl_info: NOT AVAILABLE. And numexpr has neither median nor a Fourier transform. - mkkik
  • if you are working on windows try these builds ? - MaxU
  • @MaxU using linux - mkkik
  • look at Anaconda ... here's another - MaxU
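
A minimal way to check which BLAS/LAPACK backend the installed numpy was built against, as suggested in the comment above; it only uses numpy's own np.show_config() and is not specific to this question:

    # print numpy's build information; an MKL-backed build lists library
    # details under blas_mkl_info instead of "NOT AVAILABLE"
    import numpy as np

    np.show_config()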

2 answers

Using threads will not speed up computational code, because Python has the GIL. To speed up the code, use Process from the multiprocessing module or ProcessPoolExecutor from the concurrent.futures module.
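
A sketch of what that replacement might look like for the per-row computation from the question; splitting the work by the first index and the names process_plane / multiproc are illustrative assumptions, not code from the question or the answer:

    # sketch: the per-row FFT/median computation farmed out to worker
    # processes; one task per 2-D plane keeps pickling overhead small
    from concurrent.futures import ProcessPoolExecutor

    import numpy as np

    _size = 256
    arr = np.random.rand(_size, _size, _size) \
        + np.random.rand(_size, _size, _size) * 1j

    def process_plane(plane):
        # plane is the 2-D slice arr[fd, :, :]; return one value per row
        out = np.empty(plane.shape[0])
        for sd in range(plane.shape[0]):
            spec = np.fft.fftshift(abs(np.fft.fft(plane[sd, :])))
            amax = spec.argmax()
            out[sd] = 20*np.log10(spec[amax]) - 20*np.log10(np.median(spec))
        return out

    def multiproc(arr, nwork=4):
        with ProcessPoolExecutor(max_workers=nwork) as ex:
            planes = (arr[fd] for fd in range(arr.shape[0]))
            return np.vstack(list(ex.map(process_plane, planes)))

    if __name__ == '__main__':
        multiproc(arr)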

  • Thanks. So threads in Python are needed only for tasks with input/output or web requests? - mkkik
  • No, you're using a queue. In the case of ProcessPoolExecutor it will return a list to you. If you need to write out a large amount of data, you create a process that reads the data from the queue and writes it to a file. - Avernial
  • @mkkik numpy can release the GIL, so some calculations (such as A=B+C) can be sped up by using multiple threads. Cython is also simple and convenient for this (the nogil construct). You can also avoid unnecessary copying by using a shared array ( multiprocessing.Array ), even when several processes are used (a small sketch of this idea follows these comments). Numpy can also use OpenMP (non-Python) threads, and you can set CPU affinity. If the goal is to speed up the code, ask about that directly (parallelization is not required to speed up code in the general case, regardless of the language used). - jfs
  • @jfs, yes, of course, the goal is to speed up the code. Can all of the above be read somewhere in a systematic form? - mkkik
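A minimal sketch of the shared-array idea from the comment above, using mp.RawArray (the lock-free variant of multiprocessing.Array); it assumes a fork-based start method (Linux), where child processes inherit the buffer, and the names shared / fill_row are illustrative:

    # sketch: a shared buffer that worker processes write into directly,
    # so nothing has to be pickled back to the parent
    import ctypes
    import multiprocessing as mp

    import numpy as np

    n = 8
    # unlocked shared memory for n*n doubles; with fork, children inherit it
    shared = mp.RawArray(ctypes.c_double, n * n)

    def fill_row(row):
        # each worker reinterprets the same buffer as an n x n array
        view = np.frombuffer(shared, dtype=np.float64).reshape(n, n)
        view[row, :] = row

    if __name__ == '__main__':
        procs = [mp.Process(target=fill_row, args=(r,)) for r in range(n)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(np.frombuffer(shared, dtype=np.float64).reshape(n, n))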

This is not an answer to the question; I just want to write down the results that were achieved thanks to the information in the comments to the question and to the answer. Perhaps it will be useful to someone.

  1. numpy with MKL

    The current versions of the Anaconda distribution include a numpy compiled with MKL support and the mkl-service package. However, the execution time of my function (median, FFT) without parallelization stayed the same as without MKL. I have seen articles describing how to build python + numpy + scipy specifically for MKL, but that requires the Intel compiler, which is paid.

  2. concurrent.futures and multiprocessing

    When the threads are replaced with ProcessPoolExecutor from concurrent.futures, without changing the rest of the program, the execution time with max_workers = 2 approaches the single-threaded result: 4.57 s. Increasing the number of processes made the execution time grow.

    With a similar replacement using multiprocessing.Pool:

    multiprocessing.Pool (2): 2.78 s

    multiprocessing.Pool (4): 1.74 s

    multiprocessing.Pool (8): 1.3 s

  3. multiprocessing.Process and shared multiprocessing.RawArray

    Based on this example.

     import ctypes, itertools
     import multiprocessing as mp

     import numpy as np

     _size = 256
     arr = np.random.rand(_size, _size, _size) \
         + np.random.rand(_size, _size, _size) * 1j

     def selffun(arr, sl, arrD):
         # view the shared buffer as a 2-D result array and fill one cell
         d = np.reshape(np.frombuffer(arrD), (_size, _size))
         spec = np.fft.fftshift(abs(np.fft.fft(arr[sl[0], sl[1], :])))
         amax = spec.argmax()
         d[sl[0], sl[1]] = 20*np.log10(spec[amax]) - 20*np.log10(np.median(spec))

     def worker(arr, q, arrD):
         while True:
             item = q.get()
             if item is None:
                 break
             selffun(arr, item, arrD)
             q.task_done()
         q.task_done()  # account for the None sentinel

     def main(arr):
         a, b = arr.shape[:-1]
         # shared output buffer: one double per (first, second) index pair
         arrD = mp.RawArray(ctypes.c_double, a*b)
         nCPU = mp.cpu_count()
         queue = mp.JoinableQueue()
         for item in itertools.product(range(a), range(b)):
             queue.put(item)
         for i in range(nCPU):
             queue.put(None)  # one stop sentinel per worker
         workers = []
         for i in range(nCPU):
             _worker = mp.Process(target=worker, args=(arr, queue, arrD))
             workers.append(_worker)
             _worker.start()
         queue.join()
         return np.reshape(np.frombuffer(arrD), (a, b))

     if __name__ == '__main__':
         main(arr)

Result: 1.83 s (nCPU = 8, Intel® Core™ i7-3770 CPU @ 3.40GHz × 8)

The best time was obtained with multiprocessing.Pool, but I have not yet figured out how to use Pool.map together with a shared array; a possible approach is sketched below.
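
One possible way to combine Pool.map with a shared array, sketched under two assumptions: the RawArray is handed to each worker through the Pool initializer, and a fork-based start method (Linux) lets the workers inherit the input array arr; the names _shared, init_worker and compute_cell are illustrative, not from the answer:

    import ctypes, itertools
    import multiprocessing as mp

    import numpy as np

    _size = 256
    arr = np.random.rand(_size, _size, _size) \
        + np.random.rand(_size, _size, _size) * 1j

    _shared = None  # set in every worker process by init_worker

    def init_worker(shared_buf):
        # runs once in each worker; stores the shared output buffer
        global _shared
        _shared = shared_buf

    def compute_cell(sl):
        # only the index pair sl is pickled; the result goes straight
        # into the shared buffer, arr is inherited via fork
        d = np.reshape(np.frombuffer(_shared), (_size, _size))
        spec = np.fft.fftshift(abs(np.fft.fft(arr[sl[0], sl[1], :])))
        amax = spec.argmax()
        d[sl[0], sl[1]] = 20*np.log10(spec[amax]) - 20*np.log10(np.median(spec))

    def main(arr):
        a, b = arr.shape[:-1]
        arrD = mp.RawArray(ctypes.c_double, a * b)
        with mp.Pool(processes=mp.cpu_count(),
                     initializer=init_worker, initargs=(arrD,)) as pool:
            pool.map(compute_cell, itertools.product(range(a), range(b)))
        return np.reshape(np.frombuffer(arrD), (a, b))

    if __name__ == '__main__':
        main(arr)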

  • great comparison! Surely it will come in handy for numpy users working with large data arrays ... - MaxU
  • In this case the data array is not large, and for other tasks this result may be useless. I wrote it up to make it clear where one could start trying. - mkkik