The challenge: is it possible to split a set of positive integers into two parts with equal sums β this is the NP problem ( partition problem / number partitioning ), which is NP-complete . NP class means that fast solution algorithms are not known, but if a solution is found, then you can quickly check whether it is true. Although in practice, a solution can be found quickly for small numbers β it is the easiest of the most difficult / difficult tasks .
This is a special case of the sum of subsets problem , which in turn can be considered as a special case of the backpack problem .
The answer contains three solutions:
- Greedy algorithm (does not always work)
- Brute Force Method β Brute Force (O (n 2 n ))
- Solution using dynamic programming with pseudopolynomial time (O (nk))
Greedy algorithm (does not always work)
The code in question implements the greedy algorithm: take the largest number of those not yet selected and add to the part that has the sum now less:
def partition_equal_sums_greedy(numbers): parts, sums = ([], []), [0, 0] for largest in sorted(numbers, reverse=True): smaller = sums[1] < sums[0] parts[smaller].append(largest) sums[smaller] += largest if sums[0] != sums[1]: raise ValueError("Greedy algorithm has failed to find the" " number partitioning for {numbers!r}".format(**vars())) return parts
To find a solution for the example in question is enough
#XXX DO NOT DO IT return lst1 return lst2
replaced by:
return lst1, lst2
Example:
>>> partition_equal_sums([2, 4, 5, 6, 7, 12, 13, 23, 1, 1, 34]) ([34, 12, 5, 2, 1], [23, 13, 7, 6, 4, 1])
This is a simple fast algorithm, but as @pavel pointed out , the greedy algorithm does not always work for this task: for [5, 4, 3, 3, 3] it returns: ([5, 3], [4, 3, 3]) parts that have unequal amounts.
Here is another example from the link above , for which the greedy algorithm does not work:
771 121 281 854 885 734 486 1003 83 62
In the general case, to find a solution for n numbers you may need about 2 n steps (when the magnitude of the numbers is about 2 n ). For comparison: if the largest number in the input does not depend on n , then the greedy algorithm is linear β O(n) (when using O (n) sorting).
To tell the difference: for n = 1000 , a linear algorithm requires order 1000 of steps, while still 10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376 February 1000.
This is the reason why tasks from the NP class are considered difficult / complex (if NP! = P , then the exponential solution is the best that can be hoped for in the general case, if we do not consider non-deterministic solutions).
Brute Force Method β Brute Force
Here is a solution to a complete enumeration of all the options for splitting the input list into two parts and checking whether these parts have equal sums:
from itertools import compress, product def partition_equal_sums_brute_force(numbers): if sum(numbers) % 2 == 0: # even sum for selectors in product([0, 1], repeat=len(numbers)): if sum(compress(numbers, selectors)) == sum(numbers) // 2: return (list(compress(numbers, selectors)), list(compress(numbers, (not s for s in selectors)))) raise ValueError("Can't partition into two parts with equal sums" " for {numbers!r}".format(**vars()))
product([0, 1], repeat=len(numbers)) simply generates all binary numbers of len(numbers) length, for example, for len(numbers) == 3 : 000 001 010 011 100 101 110 111 (as list 0 , 1 digits). compress(numbers, selectors) returns those numbers from numbers , in those places where there are one ( 1 ) in selectors .
This is an O(n*2**n) algorithm that can be improved to O(2**n) , if you do not recalculate each time the amount for parts.
Example:
>>> partition_equal_sums([2, 4, 5, 6, 7, 12, 13, 23, 1, 1, 34]) ([7, 13, 34], [2, 4, 5, 6, 12, 23, 1, 1]) >>> partition_equal_sums([5, 4, 3, 3, 3]) ([3, 3, 3], [5, 4]) >>> partition_equal_sums([771, 121, 281, 854, 885, 734, 486, 1003, 83, 62]) ([121, 885, 486, 1003, 83, 62], [771, 281, 854, 734])
Although, in the worst case, random input may require exponential time (the deterministic case) to find a solution for all known algorithms for this problem, but in practice, for many inputs, a solution can be found much faster.
Solution using dynamic pseudo polynomial time programming
It is easy to know whether a given set of numbers can be split into two parts with equal amounts if the amount is small:
import functools def can_partition(numbers): @functools.lru_cache(maxsize=None) def p(s, j): return p(s, j - 1) or p(s - numbers[j - 1], j - 1) if j else (s == 0) n = len(numbers) s, odd = divmod(sum(numbers), 2) return not odd and p(s, n)
where p(s, j) true if the sum of some subset of numbers[:j] is equal to s . p(s, n) tells whether there exists a subset of numbers with a sum equal to s . For an even total, this means that you can split numbers into two parts with equal amounts.
This is the implementation of the O(kn) Python algorithm for the partitioning problem , which is a special case of the algorithm for the sum of the subsets (in Russian) .
Example:
>>> can_partition([1, 2]) False >>> can_partition([1, 2, 3]) True
To get the desired two lists from p(s, j) :
def partition_equal_sums(numbers): @functools.lru_cache(maxsize=None) def p(s, j): return p(s, j - 1) or p(s - numbers[j - 1], j - 1) if j else (s == 0) j = len(numbers) s, odd = divmod(sum(numbers), 2) if odd or not p(s, j): raise ValueError("Can't partition into two parts with equal sums" " for {numbers!r}".format(**vars())) left, right = [], [] while s: while p(s, j): # sum(some_subset(numbers[:j])) == s j -= 1 right.append(numbers[j]) s -= right[-1] # sum(some_subset(numbers[:j])) == s - numbers[j] left.append(right.pop()) return left, numbers[:j] + right
functools.lru_cache(maxsize=None) caches the returned values ββof p(s,j) , which allows us to obtain an O(NK) (pseudopolynomial) algorithm. Using p(s,j) , he looks for an index j such that:
sum(some_subset(numbers[:j+1])) == s sum(some_subset(numbers[:j])) != s
that is, without the numbers[j] element, the previous numbers (index <j ) the sum of s cannot dial. Therefore, numbers[j] is chosen as a term and the cycle is repeated for s - numbers[j] sums, while the remaining sum is not zero. By construction, sum(left) == sum(numbers) // 2 and since this code is executed only for an even total sum ( sum(numbers) % 2 == 0 ), then sum(right) == sum(numbers) // 2 and the solution is found ( right contains all elements from numbers that are not in the left list and sum(left) == sum(right) ).