Comparison of two lists for finding elements that comply with the rules

Question

The question is that for example I have two lists:

lst1 = ['1', '2' , '3' , '4'] lst2 = ['123', '234' , '345' , '334']

How can I find such elements in the second list, which include only those elements that are in the first, but if the first list has one unit, then for example the element "112" from the second list does not fit.

To be the result of the program should be

 ls3 = ['123', '234']

'345' - did not fit because there is an element "5" which is not in the first list

'334' - did not fit because there are two elements “3”, and in the first list element “3” is only one

Related question: Check if two unordered lists are equal - jfs
association: stackoverflow.com/questions/9623114/… - Nicolas Chabanovsky ♦

Community spirit ♦ one · Accepted Answer · 2016-11-15T00:30:38

If you know how to determine if you can make a word string from the given chars characters, using each character in chars no more than its number of repetitions (the can_build(word, chars) is known as the can_build(word, chars) predicate), then the problem comes down to:

 result = list(filter(can_build, lst2))

or more read:

 result = [word for word in lst2 if can_build(word)]

where can_build uses chars = lst1 inside.

If it were required to use all the characters from chars , then it would be a check whether the word an anagram of chars , for example: “enlightener” can be obtained by rearranging the letters “patience” . Similar solutions can be used when exact equality is replaced with "no more".

can_build() can be implemented by finding whether a word multiset a subset of the chars multiset. If all the characters in chars and word were unique, then

 can_build = set("1234").issuperset

collections.Counter implements the idea of a set in which elements can be repeated, that is, multisets . As shown in the elegant solution in the @Timofey Bondarev response , you can use this collection to implement can_build :

 can_build = lambda word, chars=Counter(lst1): not (Counter(word) - chars)

You can implement the same algorithm manually without using collections.Counter .

 from collections import defaultdict def Counter(letters): counts = defaultdict(int) for letter in letters: counts[letter] += 1 return counts chars_count = Counter(chars) def can_build(word): return all(chars_count[char] >= count for char, count in Counter(word).items())

You can use a simple list, if all the characters belong to an alphabet, then since the chars the same all the time, you can chars.count values. For example, if chars can contain only the numbers 0-9 :

 from string import digits chars_count = [(digit, chars.count(digit)) for digit in digits] def can_build(word): return all(word.count(digit) <= count for digit, count in chars_count)

this is an O(N * M) solution ( M=len(digits) —the size of the alphabet), in contrast to the O(N) solution that uses Counter() . If the alphabet is not fixed: alphabet = set(word) , then this is an O(N**2) (quadratic) algorithm. If the alphabet fixed as in the example, then this is an O(N) (linear) solution. For a small alphabet, for example, for boolean numbers ( alphabet=(0,1) ) or DNA strings ( alphabet="GTAC" ), this solution could be even faster than solutions with Counter() .

Another application example: if word , chars are numbers represented by their simple factors (for example: (2,2,3) represents 12 , (5,7) represents 35 ), then can_build() answers the question whether the word divider chars , that is, is it true that: chars % word == 0 .

Q: how to implement the code if you get rid of the condition that the element should be repeated as many times as it is in the first list ???

Already the answer in this assumption works. Otherwise it would be an anagram case: the number of repetitions is the same. Solutions with Counter() and with <= , >= NOT require the element to be repeated "as many times as it is in the first list" (less is possible).

If you mean that the number of repetitions is not important at all, but only whether there is an element or not, then the situation is similar to the case when all elements are unique, that is:

 can_build = set(lst1).issuperset

as already mentioned above.

I read your answer today, and he really very well explains the whole point of the task.
But I wondered how to implement the code if I got rid of the condition that the element should be repeated as many times as it is in the first list ???
@Bernard, in particular, set(chars).issuperset(word) says “ can one make a word string of the given chars characters ” —this is the method of the standard set class.
The @Bernard issuperset can be used to determine the can_build predicate (as explicitly shown in the response).
Then you can use the can_build function to find the result already — look at the first two code examples in the response.

Timofei Bondarev Timofei Bondarev 6,930 3 32 51 · Answer 2 · 2016-11-12T18:21:01

This problem can be solved using the standard class for multisets Counter :

 from collections import Counter lst1 = ['1', '2' , '3' , '4'] lst2 = ['123', '234' , '345' , '334'] base = Counter(lst1) result = [s for s in lst2 if not (Counter(s) - base)]

The condition not (Counter(s) - base) checks that there are no more elements in the multiset s than in base

I am very sorry, but I have not had to work with 'Counter', so if it’s not difficult for you to tell in more detail what the last two lines are doing.
Counter counts the number of times each value from the list that you pass to its constructor.
@Bernard: not container is true for empty containers in Python.
When subtracting from Counter() , only the items with a positive score remain as a result.
As a result, if the difference is not empty, then the element from the second list contains numbers that are not in the first list.

Answer 3 · 2016-11-13T12:01:43

 [l2 for l2 in lst2 if all(l2.count(l) <= lst1.count(l) for l in set(l2))]

dio4 dio4 101 7 · Answer 4 · 2016-11-16T09:03:09

 #!/usr/bin/env python3.4 # -*- coding: utf-8 -*- lst1 = ['1', '2' , '3' , '4'] lst2 = ['123', '234' , '345' , '334'] S1 = set(lst1) S2 = set(lst2) S3 = set() for x in S2: if set(x) <= S1: S3.add(x) lst3 = list(S3) for x in lst3: for y in x: if x.count(y) > 1: count = lst3.index(x) continue lst3.pop(count) print(lst3)

If just according to the logic of things, without any special wisdom. A little aware of the sets (from school), cycles and lists. Answer ['123', '234']

A standard set in python can only contain unique elements, so for another set lst1 (with repeated elements), using set will result in the wrong decision.
Want to say something - write the code .. I gave a workable code.
And as you know (and it has already become a classic of the genre - “Chatter is worthless. Show me the code. - Linus Torvalds”.
From whether I show the code or not, the behavior of the standard set in python will not change, alas.
I'm not talking about the behavior of the set, but about your conversations.
You try to set lst1 = [1,2,3,3,4] or lst1=[1,2,3,3] and tell later what your “workable” code is.

Answer 5 · 2016-11-13T02:30:09

The solution is higher, of course, in short, but humanly it can be done like this:

 список = ['1', '2', '3', '4'] список2 = ['123', '234', '345', '34'] результат = [] for сочетание in список2: временный_список = список[:] # копия списка for цифра in сочетание: if цифра in временный_список: временный_список.remove(цифра) # чтобы не более одной проверки на число else: break else: результат.append(сочетание) print(результат) ===================== RESTART: C:\Python 3.5\задачка.py ===================== ['123', '234', '34']

Comparison of two lists for finding elements that comply with the rules

5 answers 5

More articles: