While working with pandas, a question came up. I have two fields: id and info (a tuple). How can I group by id so that each id gets a list of the tuples that belong to it? Thanks!

2 answers

Source DataFrame:

    In [50]: df
    Out[50]:
       id    info
    0   1  (a, a)
    1   1  (a, b)
    2   2  (b, a)
    3   2  (b, b)
    4   2  (b, c)
    5   3  (c, a)

Solution:

 In [51]: res = df.groupby('id')['info'].apply(list) 

Result:

    In [52]: res
    Out[52]:
    id
    1            [(a, a), (a, b)]
    2    [(b, a), (b, b), (b, c)]
    3                    [(c, a)]
    Name: info, dtype: object

Or, to get the result as a DataFrame:

    In [57]: res = df.groupby('id')['info'].apply(list).reset_index(name='info')

    In [58]: res
    Out[58]:
       id                      info
    0   1          [(a, a), (a, b)]
    1   2  [(b, a), (b, b), (b, c)]
    2   3                  [(c, a)]
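For anyone who wants to run this answer end to end, here is a minimal sketch. The sample frame is rebuilt by hand to match the output shown in the answer, and the variable name `res_df` is only illustrative.

```python
import pandas as pd

# Sample data mirroring the source DataFrame: an id column and a tuple column.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "info": [("a", "a"), ("a", "b"), ("b", "a"),
             ("b", "b"), ("b", "c"), ("c", "a")],
})

# Collect the tuples of each id into one list; the result is a Series indexed by id.
res = df.groupby("id")["info"].apply(list)

# The same result as a two-column DataFrame (id, info).
res_df = res.reset_index(name="info")

print(res_df)
```

The Series form is convenient for lookups such as `res.loc[2]`, while the `reset_index` form keeps id as an ordinary column.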
  • Yes, you understood the question correctly. But when I try it, I get an error: Cannot access callable attribute 'groupby' of 'DataFrameGroupBy' objects, try using the 'apply' method - Eugene
  • @Evgenia Kutuzov, it looks like you are calling .groupby() on the result of another df.groupby(), which will not work. To help you, I need a reproducible data sample that triggers the problem ... - MaxU
  • Unfortunately, I don't have access to my computer right now :( - Evgenia
  • I will try to explain this - Evgenia
  • Initially I have the fields id, status and date. Then I do this: df['info'] = df[['status', 'date']].apply(tuple, axis=1); del df['status']; del df['date']. Then I try what you wrote, but for some reason it does not group into a list. What am I doing wrong? - Eugene
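The workflow described in the comments can be sketched roughly as follows. The sample values are hypothetical (the commenter did not post any data), and `chained_ok` is just an illustrative flag. The sketch assumes the keyword is `axis=1`, and it shows that calling `.groupby()` on an object that is already grouped raises an `AttributeError`, which is the kind of error quoted above.

```python
import pandas as pd

# Hypothetical data shaped like the frame described in the comment.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "status": ["status_1", "status_2", "status_3"],
    "date": ["2019-01-01", "2019-01-02", "2019-01-03"],
})

# Build the tuple column row by row; the keyword is `axis`, not `axes`.
df["info"] = df[["status", "date"]].apply(tuple, axis=1)
df = df.drop(columns=["status", "date"])

# Group the DataFrame itself, exactly once.
res = df.groupby("id")["info"].apply(list)

# Chaining a second .groupby() onto a GroupBy object fails.
try:
    df.groupby("id").groupby("id")
    chained_ok = True
except AttributeError:
    chained_ok = False

print(res)
print("chained groupby worked:", chained_ok)
```

If `df` was reassigned to the result of an earlier `groupby` call, re-creating it from the original data before grouping should avoid the error.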

Another solution:

Source DF:

    In [12]: df
    Out[12]:
       id    status        date
    0   1  status_1  2019-01-01
    1   1  status_2  2019-01-02
    2   2  status_3  2019-01-03
    3   2  status_4  2019-01-04
    4   2  status_5  2019-01-05
    5   3  status_6  2019-01-06

Solution:

    res = (df.groupby('id')[['status', 'date']]
             .apply(lambda x: tuple(x.values))
             .reset_index(name='info'))

Result:

    In [18]: res
    Out[18]:
       id                                                                      info
    0   1                          ([status_1, 2019-01-01], [status_2, 2019-01-02])
    1   2  ([status_3, 2019-01-03], [status_4, 2019-01-04], [status_5, 2019-01-05])
    2   3                                                 ([status_6, 2019-01-06],)
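Note that the result above stores each group as a tuple of arrays, whereas the question asked for a list of tuples. A possible variant, given here only as a sketch, swaps the lambda for `itertuples(index=False, name=None)`, which yields plain tuples. The dates are assumed to be plain strings for simplicity.

```python
import pandas as pd

# Sample data matching the second answer; dates kept as strings (assumption).
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "status": [f"status_{i}" for i in range(1, 7)],
    "date": [f"2019-01-0{i}" for i in range(1, 7)],
})

# For each id, turn the (status, date) rows into a list of plain tuples.
res = (
    df.groupby("id")[["status", "date"]]
      .apply(lambda g: list(g.itertuples(index=False, name=None)))
      .reset_index(name="info")
)

print(res)
```

This skips the intermediate info column entirely, so the original status and date columns stay untouched in `df`.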