There is an array [-1,0,0,0,0,1,1,1,1,-1,-1,1,-1].

The following loop makes each element the sum of itself and all preceding elements (a running sum):

 for i in range(BESTRESULT.shape[0] - 1):
     BESTRESULT[i + 1] = BESTRESULT[i + 1] + BESTRESULT[i]

This loop alone roughly doubles the total execution time of the code, so it urgently needs to be replaced with something faster, preferably using NumPy.

  • What result should be obtained from this array, just to understand the logic? - slippyk

3 answers

If you have a NumPy array a:

 a = numpy.array([-1,0,0,0,0,1,1,1,1,-1,-1,1,-1]) 

To compute the cumulative sum, it is enough to call numpy.cumsum():

 >>> a.cumsum()
 array([-1, -1, -1, -1, -1, 0, 1, 2, 3, 2, 1, 2, 1])
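
Not part of the original answer, but worth noting: if modifying the original array is acceptable, cumsum() can also write its result in place via the out= argument, which avoids the extra copy measured in the timings below. A minimal sketch:

 import numpy

 a = numpy.array([-1, 0, 0, 0, 0, 1, 1, 1, 1, -1, -1, 1, -1])

 # Write the running sum back into `a` itself instead of allocating a new array.
 numpy.cumsum(a, out=a)
 print(a)  # [-1 -1 -1 -1 -1  0  1  2  3  2  1  2  1]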

A simple comparison on my machine (ipython) shows a ~70× speed difference compared with the code from the question:

 In [0]: import numpy

 In [1]: a = numpy.repeat(1, 1000000)

 In [2]: %timeit a.copy()
 1000 loops, best of 3: 937 µs per loop

 In [3]: %timeit a.cumsum()
 100 loops, best of 3: 2.39 ms per loop

 In [4]: %timeit a.copy().cumsum()
 100 loops, best of 3: 6.31 ms per loop

 In [5]: def cumsum_mihail(a):  # the code from the question
    ...:     for i in range(a.shape[0] - 1):
    ...:         a[i + 1] = a[i + 1] + a[i]
    ...:

 In [6]: %timeit cumsum_mihail(a.copy())
 1 loop, best of 3: 477 ms per loop
  • I was just doing the same comparison (timeit) for these two solutions... But since you have already answered, could you add a comparison for an array of ~1M elements to your answer? For me, the speed difference (compared with the solution by @Alexander) came out at 42.5 times. - MaxU
  • @MaxU It is clear that cumsum() is faster than an explicit loop for NumPy arrays (processing elements of a NumPy array one at a time is very slow). But to show actual figures, more information is needed; otherwise there are too many options to go through: 1) compute in place (out=a) or on a copy; 2) dtype: float or int; 3) where the source data comes from and how large it is (the case of 10 elements may differ from 1000, 1000_000, 1000_000_000); 4) which hardware, which CPU, which version of numpy? Some of these may not matter, but without measuring it is hard to say. It is easy to give an option that is probably faster than the code in the question, but... - jfs
  • ...a quality comparison is much harder: it is easy to get meaningless, misleading numbers. If you do not know the exact goal, it is better not to give figures at all (unless the limited scope of the results is clearly stated or obvious from the context). You can try the comparison if you wish. - jfs
  • I believe that comparing the performance on the same hardware for the same data (possibly for several arrays of different lengths) can't hurt... - MaxU
  • @MaxU If we present the results as "here is the code I ran on my machine for such-and-such data, and these are the results I got", and do not promise any wider applicability, that can work purely as an illustration (to make it clear that in at least one case one piece of code is faster than the other). The problem is that people will draw broader conclusions that are not sufficiently supported by the data (an illustration-only sketch of such a comparison follows below). - jfs
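
For illustration only, in the spirit of the caveats above, such a comparison could be scripted roughly as follows (the helper name loop_cumsum and the chosen sizes are arbitrary, and the absolute numbers depend entirely on the hardware and NumPy version):

 import timeit
 import numpy as np

 def loop_cumsum(a):
     # explicit Python loop, as in the question
     for i in range(a.shape[0] - 1):
         a[i + 1] = a[i + 1] + a[i]
     return a

 # Compare the loop and ndarray.cumsum() for several array sizes;
 # both variants include the cost of a.copy() so the comparison stays fair.
 for n in (10**3, 10**5, 10**6):
     a = np.random.choice([-1, 0, 1], n)
     t_loop = min(timeit.repeat(lambda: loop_cumsum(a.copy()), number=1, repeat=3))
     t_cumsum = min(timeit.repeat(lambda: a.copy().cumsum(), number=1, repeat=3))
     print("n=%d: loop %.4f s, cumsum %.6f s" % (n, t_loop, t_cumsum))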

If I understand correctly and you need the cumulative sum of the sequence, then it is better to use itertools.accumulate. In theory, it should work very quickly.

 from itertools import accumulate

 BESTRESULT = [-1,0,0,0,0,1,1,1,1,-1,-1,1,-1]
 BESTRESULT = accumulate(BESTRESULT)
 print(list(BESTRESULT))  # [-1, -1, -1, -1, -1, 0, 1, 2, 3, 2, 1, 2, 1]

UPD: I have now run a comparison. On my machine, my accumulate version processes a list of a million items in about 0.04 seconds, while your loop takes about 0.29 seconds for the same task. That is, the variant I propose gives a speed-up of more than seven times.
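
A small caveat worth adding here (not from the original answer): accumulate() returns an iterator of scalars, so if the surrounding code expects BESTRESULT to remain a NumPy array, the result has to be converted back. One way to do that, sketched with numpy.fromiter:

 import numpy as np
 from itertools import accumulate

 BESTRESULT = np.array([-1, 0, 0, 0, 0, 1, 1, 1, 1, -1, -1, 1, -1])

 # Rebuild an ndarray of the same dtype from the accumulate() iterator.
 BESTRESULT = np.fromiter(accumulate(BESTRESULT), dtype=BESTRESULT.dtype,
                          count=len(BESTRESULT))
 print(BESTRESULT)  # [-1 -1 -1 -1 -1  0  1  2  3  2  1  2  1]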

    Here is a speed comparison (on my hardware) for a NumPy array of 1 million randomly chosen elements, all of which belong to the set [-1, 0, 1]:

     import numpy as np
     from itertools import accumulate

     a = np.random.choice([-1,0,1], 10**6)

    Measurements for an array of 1 million items:

     In [5]: x = a.copy()

     In [6]: %%timeit
        ...: for i in range(x.shape[0] - 1):
        ...:     x[i + 1] = x[i + 1] + x[i]
        ...:
     1 loop, best of 3: 436 ms per loop

     In [7]: x = a.copy()

     In [8]: %timeit x.cumsum()
     100 loops, best of 3: 2.79 ms per loop

     In [9]: x = a.copy()

     In [10]: %timeit list(accumulate(x))
     10 loops, best of 3: 120 ms per loop

    Measurements for an array of 1 thousand items:

     In [11]: a = np.random.choice([-1,0,1], 10**3)

     In [12]: x = a.copy()

     In [13]: %%timeit
        ...: for i in range(x.shape[0] - 1):
        ...:     x[i + 1] = x[i + 1] + x[i]
        ...:
     1000 loops, best of 3: 1.01 ms per loop

     In [14]: x = a.copy()

     In [15]: %timeit x.cumsum()
     100000 loops, best of 3: 3.62 µs per loop

     In [16]: x = a.copy()

     In [17]: %timeit list(accumulate(x))
     10000 loops, best of 3: 99.4 µs per loop