For example, this code with just a billion iterations takes about an hour (60 minutes) on my machine. What am I doing wrong?

    import random

    # reshka: tails, orel: heads
    reshka = 0
    orel = 0
    i = 0
    while i < 1000000000:
        coin = random.randint(0, 1)
        if coin > 0:
            reshka += 1
        else:
            orel += 1
        i += 1
    print('reshek', reshka, 'orlov', orel)
    input()
  • On an i5-2500 under Windows 8 it runs at roughly a million iterations per 1.4-1.5 s. I suspect randint is simply slow for some reason. - etki
  • Keep in mind that the print call also takes time. - kal1sha
  • @kal1sha it's not inside the loop - andreymal
  • If you received an exhaustive answer, mark it as accepted (the checkmark next to the chosen answer). - Nicolas Chabanovsky

3 answers

You can run it unchanged on PyPy, which includes a JIT compiler (51 seconds):

    $ /usr/bin/time pypy .
    ('reshek', 499987397, 'orlov', 500012603)
    50.67user 0.00system 0:50.71elapsed 99%CPU (0avgtext+0avgdata 37004maxresident)k
    0inputs+0outputs (0major+3409minor)pagefaults 0swaps

Or, on the regular implementation (CPython), rewrite the code to request all billion random bits at once:

    #!/usr/bin/env python
    import random

    N = 10**9
    # one getrandbits() call returns all billion random bits as a single integer
    n = random.getrandbits(N)
    # count the one-bits (heads) in its binary representation
    popcount = bin(n).count("1")
    print("heads: %d tails: %d" % (popcount, (N - popcount)))

This option is noticeably faster (8 seconds):

    $ /usr/bin/time python heads-n-tails.py
    heads: 499999280 tails: 500000720
    7.26user 0.13system 0:07.40elapsed 99%CPU (0avgtext+0avgdata 1115464maxresident)k
    0inputs+0outputs (0major+3053minor)pagefaults 0swaps
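A billion-bit integer takes about 125 MB, and the bin() string built on top of it is roughly another gigabyte, which explains the ~1.1 GB peak resident size above. If memory is a concern, the same idea works in chunks; a minimal sketch, with the chunk size an arbitrary choice:

    #!/usr/bin/env python
    import random

    N = 10**9
    CHUNK = 10**6  # bits per getrandbits() call; an arbitrary choice
    heads = 0
    for _ in range(N // CHUNK):
        # each chunk is a CHUNK-bit integer; count its one-bits
        heads += bin(random.getrandbits(CHUNK)).count("1")
    print("heads: %d tails: %d" % (heads, N - heads))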

    Please also look at @jfs's answer; his solution is more efficient than mine.

    To find out what exactly is slowing your program down, you can use a profiler. One ships with Python; its documentation can be read here. Let's see what's wrong with your program:

     python -m cProfile -s time test.py 
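    The same data is also available from inside a script, if that is more convenient; a minimal sketch using the standard cProfile and pstats modules (the profiled statement here is only a placeholder):

     import cProfile
     import pstats

     # Profile a statement and dump the stats to a file.
     cProfile.run("sum(range(10**6))", "profile.out")
     # Load the stats and print the ten most expensive functions by internal time.
     pstats.Stats("profile.out").sort_stats("tottime").print_stats(10)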

    To speed up the check, I reduced the number of iterations to 1,000,000. The profiler will produce something like this:

      6002902 function calls (6002874 primitive calls) in 3.848 seconds

         Ordered by: internal time

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1000000    1.162    0.000    2.728    0.000 random.py:165(randrange)
        1000000    0.992    0.000    1.566    0.000 random.py:216(_randbelow)
              1    0.694    0.694    3.848    3.848 test.py:1(<module>)
        2002055    0.518    0.000    0.518    0.000 {method 'getrandbits' of '_random.Random' objects}
        1000000    0.422    0.000    3.150    0.000 random.py:210(randint)
        1000000    0.056    0.000    0.056    0.000 {method 'bit_length' of 'int' objects}
              1    0.001    0.001    0.001    0.001 {built-in method load_dynamic}
             16    0.001    0.000    0.001    0.000 {built-in method stat}
              1    0.000    0.000    0.001    0.001 {built-in method print}
              2    0.000    0.000    0.000    0.000 {built-in method loads}
      ...

    As expected, random number generation via the randint method consumes the most time. As the profiler output shows, much of it is spent checking that the number lies in the requested interval. The number of getrandbits calls also looks odd: twice as many as necessary. Generating a random integer from 0 to 1 can be sped up by calling getrandbits directly:

     coin = random.getrandbits(1) 

    Thus, we get rid of unnecessary calculations and get a noticeable performance boost:

      1000847 function calls (1000819 primitive calls) in 0.837 seconds

         Ordered by: internal time

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
              1    0.440    0.440    0.837    0.837 test.py:1(<module>)
        1000000    0.237    0.000    0.237    0.000 {method 'getrandbits' of '_random.Random' objects}
              2    0.093    0.046    0.094    0.047 <frozen importlib._bootstrap>:1031(get_data)
              1    0.062    0.062    0.062    0.062 {built-in method load_dynamic}
              2    0.001    0.001    0.001    0.001 {built-in method init_builtin}
              2    0.001    0.001    0.001    0.001 {method 'read' of '_io.FileIO' objects}
             16    0.001    0.000    0.001    0.000 {built-in method stat}
              1    0.001    0.001    0.001    0.001 {built-in method print}
              2    0.000    0.000    0.000    0.000 {built-in method loads}
      ...
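    The difference is also easy to see in isolation with timeit; a minimal sketch (absolute numbers depend on the machine, but the ratio between the two calls is what matters):

     import timeit

     # Micro-benchmark: randint(0, 1) versus getrandbits(1), one million calls each.
     t_randint = timeit.timeit("random.randint(0, 1)", setup="import random", number=10**6)
     t_bits = timeit.timeit("random.getrandbits(1)", setup="import random", number=10**6)
     print("randint:     %.3f s" % t_randint)
     print("getrandbits: %.3f s" % t_bits)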

    However, with a billion iterations the program will still run slowly. Initially I assumed that the bottleneck was Python's random number generator, so I rewrote the program to call the generator from the C runtime library instead:

     from ctypes import cdll

     libc = cdll.msvcrt  # the Microsoft C runtime (this code is Windows-specific)
     reshka = 0
     orel = 0
     for i in range(1000000):
         coin = libc.rand() % 2
         if coin > 0:
             reshka += 1
         else:
             orel += 1

    But it did not give any performance boost (each ctypes call carries overhead of its own, so the bottleneck simply moved into the foreign function call):

      1753 function calls (1703 primitive calls) in 0.753 seconds

         Ordered by: internal time

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
              1    0.744    0.744    0.753    0.753 test.py:1(<module>)
              2    0.002    0.001    0.002    0.001 {built-in method load_dynamic}
             32    0.002    0.000    0.002    0.000 {built-in method stat}
             37    0.001    0.000    0.001    0.000 {built-in method __build_class__}
              1    0.001    0.001    0.001    0.001 {built-in method print}
              4    0.001    0.000    0.001    0.000 {built-in method loads}
              4    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1031(get_data)
             12    0.000    0.000    0.000    0.000 {built-in method OpenKey}
             19    0.000    0.000    0.002    0.000 <frozen importlib._bootstrap>:1352(find_loader)
             71    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:74(_path_join)
              1    0.000    0.000    0.003    0.003 __init__.py:1(<module>)
      ...

    Then I sketched a similar C program and measured its speed:

     #include <stdio.h>
     #include <math.h>
     #include <stdlib.h>

     int main() {
         int heads = 0, tails = 0, i = 0;
         srand(0);
         for (i = 0; i < 1000000; i++) {
             if (rand() % 2) {
                 heads++;
             } else {
                 tails++;
             }
         }
         printf("Heads: %d Tails: %d", heads, tails);
         return 0;
     }

    Its execution took 0.045 seconds; with a billion iterations, 16.98 seconds. Hence the conclusion: in Python, the overhead of interpreting the program is too high. No matter how fast the critical sections execute, you still get a decent slowdown simply because Python is an interpreted language. Code execution could perhaps be accelerated with Cython or PyPy, but is that necessary? If so, read this article; it contains introductory information on optimizing Python programs.
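    The interpreter-overhead claim is easy to check directly: even a loop that does nothing but its own counter bookkeeping is slow at this scale. A minimal sketch (timings are machine-dependent):

     import time

     t1 = time.time()
     i = 0
     while i < 10**7:  # ten million empty iterations; a billion would be 100x this
         i += 1
     t2 = time.time()
     print("%.2f s for 10**7 empty iterations" % (t2 - t1))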

    • "CPython" - maybe meant Cython? - andreymal
    • @andreymal, CPython is the official name of the python implementation - etki
    • @Etki standard implementation used by default and just slowing down in this matter. It is illogical to advise her to speed up. - andreymal
    • @andreymal, yes, I meant Cython. It was sealed up. Thank you for your comment. - fori1ton
    • @andreymal sorry - etki

    You can use the NumPy module to generate random numbers and thus reduce the time to about 20 seconds without using additional compilers.

     import numpy, time

     def test(n):
         t1 = time.time()
         n2 = n // (10**7)  # if you don't have a hundred gigabytes of RAM, process in chunks
         reshka = 0
         orel = 0
         while n2 > 0:
             rnd = numpy.random.randint(0, 2, (10**7))
             nonzero = numpy.count_nonzero(rnd)
             orel += nonzero  # heads is 1, i.e. non-zero
             reshka += 10**7 - nonzero
             n2 -= 1
         t2 = time.time()
         print(reshka, orel, t2 - t1)

     test(10**9)

    Result:

     500004947 499995053 19.711090803146362 

    That is, about 20 seconds without any special tricks beyond NumPy. The execution time is comparable to the C result in the answer by @ton.
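    As a side note (not part of the original answer), on NumPy 1.17+ the same idea can be written with the newer Generator API, which is typically somewhat faster for bulk generation; a minimal sketch:

     import time
     import numpy as np

     def test(n, chunk=10**7):
         rng = np.random.default_rng()
         t1 = time.time()
         orel = 0
         for _ in range(n // chunk):
             # integers() excludes the upper bound, so values are 0 or 1
             orel += int(np.count_nonzero(rng.integers(0, 2, chunk)))
         t2 = time.time()
         print(n - orel, orel, t2 - t1)

     test(10**9)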

    • The code in my answer runs in 8 seconds without using additional compilers. - jfs
    • Sorry, I didn't notice. - ReinRaus