I have a CSV file with two columns: user_id and started_at. The started_at values are purchase dates; each id has one purchase per month (two at most). I want to compare ids across months. For that, I thought of building a table with the id and one column per month. What is the best way to reshape the data? However I try to do it with basic tools, the result is not good. Here is the version I have so far, but it loses part of the data.

    dat.index = pd.to_datetime(dat['started_at'])
    dat5 = dat[:'2015-05-31']
    dat6 = dat['2015-06-01':'2015-06-30']
    dat7 = dat['2015-07-01':'2015-07-31']
    dat8 = dat['2015-08-01':]
    dat5.index = dat5['user_id']
    dat6.index = dat6['user_id']
    dat7.index = dat7['user_id']
    dat8.index = dat8['user_id']
    data = dat6.merge(dat5, 'right', on='user_id')
    data1 = dat7.merge(data, 'right', on='user_id')
    data2 = dat8.merge(data1, 'right', on='user_id')
    data2
  • What do you mean by "compare id by month"? Get a report of which id made purchases in which months and how many times, pick the top ids by number of purchases, or something else? The task is not entirely clear. Also, I would import this CSV file into a database and run the selections there with SQL. - lospejos
  • I originally wanted to do that too (import into a database), but I hit a snag there as well. As for the report: I need, for each id, the dates of its purchases, so that I can later see who dropped off in which month, who joined, and who pays every month. - Katia Nahornaya
  • 1. You can import a CSV file into a database using Excel or any tool for working with the DBMS (DBeaver, DbVisualizer, etc., depending on which database you import into; I recommend something lightweight and free: Firebird, PostgreSQL, SQLite). 2. I still do not understand what you need in the end; I think you should describe which reports you want as output. - lospejos
  • I tried to load it into PostgreSQL. I created a table, but I still can't import the data. - Katia Nahornaya
  • And what is the difficulty with importing into the database? Ask a separate question describing the import problem. You can build a large list of dictionaries with the structure [{id: date}, ...] and then process them in a loop, but it is better to use SQLAlchemy with SQLite (create_engine("sqlite:///:memory:")): import the data into a temporary database and process it there. - Tihon

1 answer

Suppose we have the following DataFrame:

    In [4]: df
    Out[4]:
        id started_at
    0    1 2015-01-22
    1    1 2016-01-01
    2    1 2016-01-09
    3    2 2016-01-11
    4    3 2016-01-30
    5    1 2016-02-02
    6    2 2016-02-03
    7    3 2016-03-03
    8    1 2016-03-01
    9    1 2016-03-03
    10   3 2016-04-04

We use the pivot_table() method to count the purchases of each customer (id) per month and turn the month values into column names (a pivot) for readability:

    In [6]: (df.assign(mon=df.started_at.dt.to_period('M'))
       ...:    .pivot_table(index='id', columns='mon', aggfunc='size', fill_value=0)
       ...:    .reset_index()
       ...:    .rename_axis(None, axis=1)
       ...: )
       ...:
    Out[6]:
       id  2015-01  2016-01  2016-02  2016-03  2016-04
    0   1        1        2        1        2        0
    1   2        0        1        1        0        0
    2   3        0        1        0        1        1

The DataFrame.assign() method returns a copy of the DataFrame with the new mon column added; the original df is not modified.
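For anyone who wants to reproduce this outside IPython, here is a minimal self-contained sketch. It rebuilds the sample data by hand and, as a swapped-in equivalent of pivot_table(), counts with groupby().size().unstack(). The names with_month, counts, active_months and last_month are only illustrative, and the last two are my guess at what "who dropped off in which month" could look like, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the DataFrame shown above.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 1, 2, 3, 1, 1, 3],
    "started_at": pd.to_datetime([
        "2015-01-22", "2016-01-01", "2016-01-09", "2016-01-11",
        "2016-01-30", "2016-02-02", "2016-02-03", "2016-03-03",
        "2016-03-01", "2016-03-03", "2016-04-04",
    ]),
})

# assign() returns a new DataFrame; df itself keeps its two columns.
with_month = df.assign(mon=df["started_at"].dt.to_period("M"))

# Purchases per customer and month; missing combinations become 0.
counts = with_month.groupby(["id", "mon"]).size().unstack(fill_value=0)
counts.columns = counts.columns.astype(str)  # e.g. "2016-01"

# Illustrative follow-ups (assumed reporting needs):
active_months = (counts > 0).sum(axis=1)           # months with a purchase
last_month = with_month.groupby("id")["mon"].max()  # last month seen per id

print(counts)
print(active_months)
print(last_month)
```

Note that only months in which at least one customer bought something become columns, so a month with no purchases at all will not show up as a column of zeros.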