Group DataFrame by id

Question

While working with pandas, I ran into the following question. I have two fields: id and info (a tuple). How can I group by id so that each id gets a list of the tuples that belong to it? Thanks!

Please provide a small (3-5 lines) reproducible example of the original DataFrame and what you want to get at the output.
I also recommend reading: How to most effectively ask a question related to data processing and/or analysis (for example, with Pandas / NumPy / SciPy / scikit-learn / SQL)?
There is an edit button under the question; use it to update the question.

MaxU · Answer 1 · 2019-02-26T15:51:12

Source DataFrame:

    In [50]: df
    Out[50]:
       id    info
    0   1  (a, a)
    1   1  (a, b)
    2   2  (b, a)
    3   2  (b, b)
    4   2  (b, c)
    5   3  (c, a)

Solution:

    In [51]: res = df.groupby('id')['info'].apply(list)

Result:

    In [52]: res
    Out[52]:
    id
    1            [(a, a), (a, b)]
    2    [(b, a), (b, b), (b, c)]
    3                    [(c, a)]
    Name: info, dtype: object

Or, to get a DataFrame instead of a Series:

    In [57]: res = df.groupby('id')['info'].apply(list).reset_index(name='info')

    In [58]: res
    Out[58]:
       id                      info
    0   1          [(a, a), (a, b)]
    1   2  [(b, a), (b, b), (b, c)]
    2   3                  [(c, a)]
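For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the two variants above. The sample frame is rebuilt by hand to match the one shown in the answer; the variable names `as_series`, `as_frame` and `as_dict` are mine, not from the answer, and the final `to_dict()` step is an optional extra.

```python
import pandas as pd

# Sample data rebuilt to mirror the frame shown in the answer
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "info": [("a", "a"), ("a", "b"), ("b", "a"),
             ("b", "b"), ("b", "c"), ("c", "a")],
})

# Series indexed by id; each value is the list of that id's tuples
as_series = df.groupby("id")["info"].apply(list)

# The same result as a two-column DataFrame (id, info)
as_frame = as_series.reset_index(name="info")

# Optional: a plain dict, convenient for lookups by id
as_dict = as_series.to_dict()
print(as_dict[2])
```

The Series form is convenient when the id should stay as the index; `reset_index(name='info')` turns it back into an ordinary column.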

When I try this, I get an error: Cannot access callable attribute 'groupby' of 'DataFrameGroupBy' objects, try using the 'apply' method
@Evgenia Kutuzov, it looks like you are trying to apply .groupby() to the result of another df.groupby() - this will not work.
To help you, I need a reproducible data sample that demonstrates the problem ...
First I do this: df['info'] = df[['status', 'date']].apply(tuple, axis=1)
Then I try what you wrote, but for some reason it does not group into a list. What am I doing wrong?
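The exchange above can be reproduced roughly as follows. This is only a sketch: the commenter's real data was never posted, so the frame below is an assumption borrowed from the second answer, with dates stored as plain strings. It builds the `info` column with `apply(tuple, axis=1)`, groups once, and then shows that calling `.groupby()` on an already grouped object fails.

```python
import pandas as pd

# Assumed sample data (the commenter's real frame was not shown)
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "status": ["status_1", "status_2", "status_3",
               "status_4", "status_5", "status_6"],
    "date": ["2019-01-01", "2019-01-02", "2019-01-03",
             "2019-01-04", "2019-01-05", "2019-01-06"],
})

# One tuple per row; the keyword is axis=1 (not axes)
df["info"] = df[["status", "date"]].apply(tuple, axis=1)

# Group once, directly on the DataFrame
res = df.groupby("id")["info"].apply(list).reset_index(name="info")

# Grouping an already grouped object raises AttributeError
grouped = df.groupby("id")
chained_groupby_failed = False
try:
    grouped.groupby("id")
except AttributeError:
    chained_groupby_failed = True
```

If `res` still is not grouped as expected, it is worth checking whether `df` was overwritten earlier with the result of `df.groupby(...)`; that would explain the error quoted in the comment.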

MaxU · Answer 2 · 2019-02-26T20:07:53

Another solution:

Source DF:

    In [12]: df
    Out[12]:
       id    status        date
    0   1  status_1  2019-01-01
    1   1  status_2  2019-01-02
    2   2  status_3  2019-01-03
    3   2  status_4  2019-01-04
    4   2  status_5  2019-01-05
    5   3  status_6  2019-01-06

Solution:

    res = (df.groupby('id')[['status', 'date']]
             .apply(lambda x: tuple(x.values))
             .reset_index(name='info'))

Result:

    In [18]: res
    Out[18]:
       id                                                                      info
    0   1                          ([status_1, 2019-01-01], [status_2, 2019-01-02])
    1   2  ([status_3, 2019-01-03], [status_4, 2019-01-04], [status_5, 2019-01-05])
    2   3                                                 ([status_6, 2019-01-06],)
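Note that in the result above each id holds a tuple of NumPy arrays (one array per row), not a list of tuples as the question asked for. If real tuples are needed, one possible variant is sketched below. It is not part of the original answer; it relies on `DataFrame.itertuples(index=False, name=None)`, which yields plain tuples, and it assumes the dates are stored as strings.

```python
import pandas as pd

# Sample data mirroring the frame in the answer (dates assumed to be strings)
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "status": ["status_1", "status_2", "status_3",
               "status_4", "status_5", "status_6"],
    "date": ["2019-01-01", "2019-01-02", "2019-01-03",
             "2019-01-04", "2019-01-05", "2019-01-06"],
})

# For each id, collect its rows as a list of plain (status, date) tuples
res = (df.groupby("id")[["status", "date"]]
         .apply(lambda g: list(g.itertuples(index=False, name=None)))
         .reset_index(name="info"))

print(res.loc[res["id"] == 3, "info"].iloc[0])
```

This avoids creating an intermediate `info` column, at the cost of a Python-level lambda per group.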
