I need to perform calculations on a three-dimensional array, among them an FFT and a median search (the most expensive parts). An attempt to introduce multi-threaded processing gave no speedup at all, even though the program does not even write out results (which, presumably, could have slowed the threads down). Apparently a lot of CPU time is spent on the threading machinery itself.
Are there critical errors in how the threads are organized in this program, or is it impossible to get a performance boost from threads for a task like this at all?
```python
from queue import Queue
from threading import Thread
import numpy as np

_size = 256
# create a 3-D array of complex numbers, _size x _size x _size
arr = np.random.rand(_size, _size, _size) \
      + np.random.rand(_size, _size, _size) * 1j

def single(arr):
    # function executed in a single thread
    for fd in range(arr.shape[0]):
        for sd in range(arr.shape[1]):
            spec = np.fft.fftshift(abs(np.fft.fft(arr[fd, sd, :])))
            amax = spec.argmax()
            val = 20*np.log10(spec[amax]) - 20*np.log10(np.median(spec))

# number of threads
nwork = 4

def multith(arr):
    # function executed in nwork threads
    def selffun(arr):
        spec = np.fft.fftshift(abs(np.fft.fft(arr)))
        amax = spec.argmax()
        val = 20*np.log10(spec[amax]) - 20*np.log10(np.median(spec))

    def worker():
        while True:
            item = _queue.get()
            selffun(item)
            _queue.task_done()

    def source(arr):
        # task generator
        for fd in range(arr.shape[0]):
            for sd in range(arr.shape[1]):
                yield arr[fd, sd, :]

    _queue = Queue()
    for i in range(nwork):
        th = Thread(target=worker)
        th.daemon = True  # setDaemon() is deprecated
        th.start()
    for item in source(arr):
        _queue.put(item)
    _queue.join()
```

Result:
```
%timeit single(arr)
1 loop, best of 3: 4.61 s per loop

nwork = 4
%timeit multith(arr)
1 loop, best of 3: 7.45 s per loop

nwork = 2
%timeit multith(arr)
1 loop, best of 3: 6.31 s per loop
```
print(np.show_config())? If you use a numpy built against Intel MKL, then it should parallelize the calculations by itself, in my opinion. – MaxU