While working with pandas, I ran into this question: I have two columns, id and info (a tuple). How can I group by id so that each id gets a list of all its tuples? Thanks!
- Please provide a small (3-5 lines) reproducible example of the original DataFrame and what you want to get as output. I also advise you to read: How to most effectively ask a question related to data processing and / or analysis (for example: by Pandas / Numpy / SciPy / SciKit Learn / SQL)? - MaxU
- There is an edit button under the question - use it to change the question. Comments cannot hold formatted data or code - MaxU
2 answers
Source DataFrame:
In [50]: df
Out[50]:
   id    info
0   1  (a, a)
1   1  (a, b)
2   2  (b, a)
3   2  (b, b)
4   2  (b, c)
5   3  (c, a)
Solution:
In [51]: res = df.groupby('id')['info'].apply(list)
Result:
In [52]: res
Out[52]:
id
1            [(a, a), (a, b)]
2    [(b, a), (b, b), (b, c)]
3                    [(c, a)]
Name: info, dtype: object
Or, to get the result as a DataFrame:
In [57]: res = df.groupby('id')['info'].apply(list).reset_index(name='info')
In [58]: res
Out[58]:
   id                      info
0   1          [(a, a), (a, b)]
1   2  [(b, a), (b, b), (b, c)]
2   3                  [(c, a)]
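For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same approach. The sample data is reconstructed from the output above, assuming the tuple elements are plain strings.

```python
import pandas as pd

# Sample data reconstructed from the answer (string values are an assumption)
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "info": [("a", "a"), ("a", "b"), ("b", "a"),
             ("b", "b"), ("b", "c"), ("c", "a")],
})

# Collect every tuple that shares an id into one list per id
res = df.groupby("id")["info"].apply(list)

# Same result as a two-column DataFrame instead of a Series
res_df = res.reset_index(name="info")
print(res_df)
```

The first form returns a Series indexed by id. The second is usually more convenient if the result is going to be merged or saved.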
- Yes, you understood the question correctly. When I try this, I get an error: Cannot access callable attribute 'groupby' of 'DataFrameGroupBy' objects, try using the 'apply' method - Eugene
- @Evgenia Kutuzov, it looks like you are trying to apply .groupby() to the result of another df.groupby() - this will not work. To help you, I will need a reproducible data sample that will help reproduce the problem ... - MaxU
- Unfortunately, I have no access to my computer right now :( - Evgenia
- I will try to explain this - Evgenia
- Initially, I have the fields id, status and date; then I do this: df['info'] = df[['status', 'date']].apply(tuple, axis=1); del df['status']; del df['date']; then I try what you wrote, but for some reason it does not group into a list. What am I doing wrong? - Eugene
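The workflow described in these comments can be sketched roughly as follows. The sample rows are made up for illustration, and the dates are kept as plain strings. The sketch also shows that calling .groupby() on an object that is already grouped raises an AttributeError (the exact message depends on the pandas version), which matches the error quoted above.

```python
import pandas as pd

# Hypothetical sample rows; dates are plain strings here
df = pd.DataFrame({
    "id": [1, 1, 2],
    "status": ["status_1", "status_2", "status_3"],
    "date": ["2019-01-01", "2019-01-02", "2019-01-03"],
})

# axis=1 applies tuple() to each row, giving one (status, date) tuple per row
df["info"] = df[["status", "date"]].apply(tuple, axis=1)
df = df.drop(columns=["status", "date"])

# A GroupBy object is not a DataFrame, so it cannot be grouped again
grouped = df.groupby("id")
try:
    grouped.groupby("id")
    chained_ok = True
except AttributeError:
    chained_ok = False

# Group the DataFrame itself once, then collect the tuples into lists
res = df.groupby("id")["info"].apply(list).reset_index(name="info")
print(res)
```

If the grouped result has been assigned back to df somewhere earlier, the next df.groupby(...) call fails in this way, so it is worth checking that df is still a DataFrame.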
Another solution:
Source DF:
In [12]: df
Out[12]:
   id    status        date
0   1  status_1  2019-01-01
1   1  status_2  2019-01-02
2   2  status_3  2019-01-03
3   2  status_4  2019-01-04
4   2  status_5  2019-01-05
5   3  status_6  2019-01-06
Solution:
res = (df.groupby('id')[['status', 'date']]
         .apply(lambda x: tuple(x.values))
         .reset_index(name='info'))
Result:
In [18]: res
Out[18]:
   id                                                                      info
0   1                          ([status_1, 2019-01-01], [status_2, 2019-01-02])
1   2  ([status_3, 2019-01-03], [status_4, 2019-01-04], [status_5, 2019-01-05])
2   3                                                 ([status_6, 2019-01-06],)
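Note that the result above holds a tuple of arrays for each id, not a list of tuples. If plain (status, date) tuples are wanted, as in the original question, one possible variant is to build the tuples with zip first and then aggregate them into lists. This is only a sketch, not part of the answer above, and it assumes the dates are stored as strings.

```python
import pandas as pd

# Same sample data as in the answer; dates kept as strings (an assumption)
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "status": [f"status_{i}" for i in range(1, 7)],
    "date": [f"2019-01-0{i}" for i in range(1, 7)],
})

# zip pairs the two columns row by row, producing real Python tuples
df["info"] = list(zip(df["status"], df["date"]))

# Aggregate the tuples into one list per id
res = df.groupby("id")["info"].agg(list).reset_index()
print(res)
```

Each cell of res['info'] is then a list such as [('status_6', '2019-01-06')], which can be iterated or unpacked directly.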