I'm seeing strange behavior from groupby. It looks like it ignores items when the list is unsorted.

    >>> from itertools import groupby
    >>> arr = [1,2,3,1,2,3,1,2,3]
    >>> groups = groupby(arr)
    >>> {x: list(y) for x, y in groups}
    {1: [1], 2: [2], 3: [3]}
    >>> arr.sort()
    >>> groups = groupby(arr)
    >>> {x: list(y) for x, y in groups}
    {1: [1, 1, 1], 2: [2, 2, 2], 3: [3, 3, 3]}

Why does this happen?

P.S. Python 3.6.0.


The documentation says: "The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes."

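That is, every time the key changes (for example, when 1 appears again right after 3), a new group starts. Your dictionary comprehension then overwrites the earlier group for the same key, so for an unsorted list you end up with a single one-element group per value.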

    from itertools import groupby
    arr = [1,2,3,1,2,3,1,1,2,3]
    groups = groupby(arr)
    print({x: list(y) for x, y in groups})

You will get: {1: [1, 1], 2: [2], 3: [3]}
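To make the "new group every time the key changes" rule concrete, here is a minimal uniq-style sketch. `consecutive_runs` is a hypothetical helper written only for illustration, not part of itertools, and it returns plain lists instead of the lazy group iterators that groupby actually yields.

```python
from itertools import groupby


def consecutive_runs(iterable, key=None):
    """Hypothetical helper: split an iterable into runs of equal consecutive keys.

    Like itertools.groupby (and Unix uniq), a new run starts every time the
    key value changes, so equal keys that are not adjacent end up in
    separate runs.
    """
    key_func = key if key is not None else (lambda item: item)
    runs = []
    current_key = None
    current_run = []
    for item in iterable:
        k = key_func(item)
        if current_run and k != current_key:
            # Key changed: close the current run and start a new one.
            runs.append((current_key, current_run))
            current_run = []
        current_key = k
        current_run.append(item)
    if current_run:
        # Close the final run.
        runs.append((current_key, current_run))
    return runs


arr = [1, 2, 3, 1, 2, 3, 1, 1, 2, 3]
print(consecutive_runs(arr))
# [(1, [1]), (2, [2]), (3, [3]), (1, [1]), (2, [2]), (3, [3]), (1, [1, 1]), (2, [2]), (3, [3])]

# Same runs as itertools.groupby once its lazy groups are materialised.
assert consecutive_runs(arr) == [(k, list(g)) for k, g in groupby(arr)]
```

Feeding these runs into a dict keeps only the last run for each key, which is exactly the `{1: [1, 1], 2: [2], 3: [3]}` result above.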

  • Thanks. So there's no way around sorting? - Torkvemada
  • Yes, you can't do without sorting here. - Avernial

The itertools.groupby documentation says:

Generally, the iterable needs to already be sorted on the same key function.

That is, the sequence must be sorted beforehand using the same key as the grouping, because groupby only groups consecutive elements that share the same key.

This can be seen explicitly in the following code:

    >>> from itertools import groupby
    >>> arr = [1,2,3,1,1,2,2,3,3,1,1,1,2,2,2,3,3,3]
    >>> groups = groupby(arr)
    >>> [(x, list(y)) for x, y in groups]
    [(1, [1]), (2, [2]), (3, [3]), (1, [1, 1]), (2, [2, 2]), (3, [3, 3]), (1, [1, 1, 1]), (2, [2, 2, 2]), (3, [3, 3, 3])]
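The "same key function" part of the documentation matters once you pass `key=`. In this sketch the word list is made-up sample data, grouped by length: sorting alphabetically leaves equal lengths non-adjacent, so groupby splits them, while sorting with the same `key=len` gives one group per length.

```python
from itertools import groupby

words = ["pear", "fig", "apple", "kiwi", "plum", "date"]

# Sorted alphabetically but grouped by length: the 4-letter words are split.
by_alpha = sorted(words)
print([(k, list(g)) for k, g in groupby(by_alpha, key=len)])
# [(5, ['apple']), (4, ['date']), (3, ['fig']), (4, ['kiwi', 'pear', 'plum'])]

# Sorted and grouped by the same key (len): one group per length.
# sorted() is stable, so words of equal length keep their original order.
by_len = sorted(words, key=len)
print([(k, list(g)) for k, g in groupby(by_len, key=len)])
# [(3, ['fig']), (4, ['pear', 'kiwi', 'plum', 'date']), (5, ['apple'])]
```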

    groupby()* groups consecutive identical values:

        >>> from itertools import groupby
        >>> for _, group in groupby("aabbaaabbb"):
        ...     print(*group)
        ...
        a a
        b b
        a a a
        b b b

    You only see the last group for each key because you store the results in a dictionary, where a later value for the same key overwrites the earlier one ({1: [1], 1: [2]} == {1: [2]}).

    If you want all equal values to end up in the same group, either sort the input:

        >>> for _, group in groupby(sorted("aabbaaabbb")):
        ...     print(*group)
        ...
        a a a a a
        b b b b b

    or use another approach, for example one based on dictionaries, which doesn't depend on the order of the elements (no sorting needed):

        >>> from collections import Counter
        >>> Counter("aabbaaabbb")  # element and number of occurrences
        Counter({'a': 5, 'b': 5})
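If you need the grouped items themselves rather than counts, the same dictionary idea works with `collections.defaultdict`. This is a minimal sketch; `group_unsorted` is a hypothetical name. It makes a single pass, needs no sorting, and keeps keys in first-seen order because dicts preserve insertion order (guaranteed from Python 3.7, an implementation detail of CPython 3.6).

```python
from collections import defaultdict


def group_unsorted(iterable, key=None):
    """Hypothetical helper: collect all items with equal keys, no sorting needed."""
    key_func = key if key is not None else (lambda item: item)
    groups = defaultdict(list)
    for item in iterable:
        # Equal keys land in the same list no matter where they occur.
        groups[key_func(item)].append(item)
    return dict(groups)


arr = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(group_unsorted(arr))
# {1: [1, 1, 1], 2: [2, 2, 2], 3: [3, 3, 3]}

print(group_unsorted("aabbaaabbb"))
# {'a': ['a', 'a', 'a', 'a', 'a'], 'b': ['b', 'b', 'b', 'b', 'b']}
```

Unlike sorted() plus groupby(), this runs in linear time and also works for keys that cannot be ordered against each other (for example, a mix of None and int), as long as the keys are hashable.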

    * The first sentence in the groupby() description:

    Make an iterator that returns consecutive keys and groups from the iterable. (Emphasis on "consecutive" is mine.)