    Probabilities = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}

    def WeightedDie(Probabilities):
        import random
        r = random.uniform(0, 1)
        letter = ''
        for symbol in "ACGT":
            if r >= 0 and r <= Probabilities['A']:
                letter = 'A'
            elif r >= Probabilities['A'] and r <= (Probabilities['A'] + Probabilities['C']):
                letter = 'C'
            elif r >= (Probabilities['A'] + Probabilities['C']) and r <= (Probabilities['A'] + Probabilities['C'] + Probabilities['G']):
                letter = 'G'
            elif r >= (Probabilities['A'] + Probabilities['C'] + Probabilities['G']) and r <= (Probabilities['A'] + Probabilities['C'] + Probabilities['G'] + Probabilities['T']):
                letter = 'T'
        return letter

I need a function that randomly selects an event according to given probabilities. I set up the intervals by looking up the values under hard-coded keys. However, the keys may be named differently, not necessarily A, C, G, T, and then my function will not work. That is the problem.

  • It is not clear what you want. Clarify the question. - user194374
  • Use the PEP 8 naming convention unless you have a particular reason not to. - jfs

2 answers

Comments on your code:

  1. Your random numbers lie in the range from 0 to 1, so you do not need to check that r >= 0 (this condition always holds).
  2. If the condition r <= x was not satisfied, then the following elif no longer needs to check the lower bound (you test r >= x there, although r > x is already guaranteed).
  3. It is enough to loop over the key-value pairs while accumulating a running sum of the probabilities and comparing the random number with that sum at each step. As soon as the random number falls within the current range, return the corresponding key.

My version of the implementation:

    import random

    probabilities = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}

    def WeightedDie(prob):
        r = random.uniform(0, 1)
        x = 0
        for letter, p in prob.items():
            x += p
            if r <= x:
                return letter

    print(WeightedDie(probabilities))

In practice, it is better to use a ready-made implementation, for example the one proposed by @jfs.

  • It is better to use random.random() instead of random.uniform(0, 1), and r < x (a strict inequality that excludes the right boundary). By the way, in general the weights can be summed without loss of precision using an algorithm similar to math.fsum(). - jfs
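The suggestions in this comment could be applied roughly as follows. This is only a sketch: the name weighted_die, the scaling by the total (so the weights need not sum to exactly 1), and the final fallback return are assumptions added for illustration. Calling math.fsum() on every prefix makes the loop quadratic, which is acceptable only for a handful of keys.

```python
import math
import random


def weighted_die(prob):
    """Return a key of `prob` with probability proportional to its value."""
    items = list(prob.items())
    # fsum() adds floats without accumulating rounding error.
    total = math.fsum(p for _, p in items)
    # random.random() is in [0.0, 1.0), so r is in [0, total).
    r = random.random() * total
    seen = []
    for key, p in items:
        seen.append(p)
        # Strict inequality: the right boundary belongs to the next key.
        if r < math.fsum(seen):
            return key
    # Defensive fallback; not expected to be reached for positive weights.
    return items[-1][0]


print(weighted_die({'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}))
```

Because the keys are taken from the dictionary itself, the function does not depend on them being named A, C, G, T.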

To select a random value from a non-uniform distribution defined by the given weights, you can use the weighted_choice(weights) function:

    def WeightedDie(Probabilities):  #XXX non pep-8 names!
        letters = list(Probabilities)
        weights = Probabilities.values()
        return letters[weighted_choice(weights)]
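The body of weighted_choice() is not shown in this answer. A minimal sketch of what such a function might look like, assuming it takes an iterable of non-negative weights and returns an index, is given below; it is a hypothetical stand-in, not necessarily the referenced implementation.

```python
import random


def weighted_choice(weights):
    """Return index i with probability weights[i] / sum(weights).

    Hypothetical stand-in for the function referenced in the answer.
    """
    weights = list(weights)  # accept dict views and other iterables
    r = random.random() * sum(weights)  # r is in [0, sum(weights))
    for i, w in enumerate(weights):
        r -= w
        if r < 0:
            return i
    # Defensive fallback against floating-point rounding.
    return len(weights) - 1


def weighted_die(probabilities):
    letters = list(probabilities)
    return letters[weighted_choice(probabilities.values())]


print(weighted_die({'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}))
```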

This approach is O(n) in memory and time (which for n = 4 is most likely efficient enough). In the general case, with a large n, you can avoid summing all the weights on every call by computing the partial sums once, as shown in the documentation for the random module:

    >>> import itertools
    >>> weighted_choices = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}
    >>> choices, weights = zip(*weighted_choices.items())
    >>> cumdist = list(itertools.accumulate(weights))
    >>> cumdist
    [0.25, 0.5, 0.75, 1.0]

Then, each choice requires only O(log n) instead of O(n) steps, using the bisect module that implements a binary search on a sorted sequence:

    >>> import bisect
    >>> import random
    >>> x = random.random() * cumdist[-1]
    >>> choices[bisect.bisect(cumdist, x)]
    'G'
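The two interactive snippets above could be combined into a reusable helper along these lines. The name make_sampler and the closure-based design are assumptions made for illustration; the point is only that the partial sums are computed once and each draw is a single binary search.

```python
import bisect
import itertools
import random


def make_sampler(weighted_choices):
    """Precompute partial sums once; return a function that draws one key."""
    choices = list(weighted_choices)
    cumdist = list(itertools.accumulate(weighted_choices.values()))
    total = cumdist[-1]

    def sample():
        x = random.random() * total  # x is in [0, total)
        # bisect() finds the first partial sum strictly greater than x.
        i = bisect.bisect(cumdist, x)
        # Clamp the index as a guard against floating-point rounding.
        return choices[min(i, len(choices) - 1)]

    return sample


roll = make_sampler({'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25})
print(''.join(roll() for _ in range(10)))
```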

In Python 3.6 and later, this can be written as random.choices(choices, weights)[0] or, equivalently, using the names from the code in the question:

    import random

    def WeightedDie(Probabilities):
        return random.choices(*zip(*Probabilities.items()))[0]

You can request several values at once (pass the named parameter k) and explicitly pass cum_weights so that the cumulative weights are not recomputed on every call.
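For example, a short sketch of both parameters together (the variable names are chosen for illustration):

```python
import itertools
import random

probabilities = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}

letters = list(probabilities)
# Compute the cumulative weights once and reuse them for every call.
cum_weights = list(itertools.accumulate(probabilities.values()))

# k=10 draws ten values (with replacement) in a single call.
sample = random.choices(letters, cum_weights=cum_weights, k=10)
print(''.join(sample))
```

Note that random.choices() accepts either weights or cum_weights, not both at the same time.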

During the discussion of the random.choices() implementation, more sophisticated methods were also considered, such as the alias method, which produces each random value in O(1) after an O(n log n) or O(n) initialization.
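For reference, here is a rough sketch of the alias method in the variant usually attributed to Vose, which has O(n) initialization. The function names are made up for this example, and this is not the code used by random.choices(); it only illustrates the idea of a table in which each slot holds at most two outcomes.

```python
import random


def build_alias_table(weights):
    """O(n) setup: return (prob, alias) tables for the given weights."""
    n = len(weights)
    total = sum(weights)
    # Scale so that the average weight is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        g = large.pop()
        prob[s] = scaled[s]   # slot s keeps its own index with this chance
        alias[s] = g          # otherwise slot s yields index g
        # Index g donated (1 - scaled[s]) of its mass to fill slot s.
        scaled[g] = (scaled[g] + scaled[s]) - 1.0
        (small if scaled[g] < 1.0 else large).append(g)
    # Leftovers are (up to rounding) exactly 1: they always keep their index.
    for i in large + small:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_sample(prob, alias):
    """O(1) draw: pick a slot uniformly, then keep it or take its alias."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]


probabilities = {'A': 0.1, 'C': 0.2, 'G': 0.3, 'T': 0.4}
letters = list(probabilities)
prob, alias = build_alias_table(list(probabilities.values()))
print(letters[alias_sample(prob, alias)])
```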