Counting transactions in a specific time period

Question

Colleagues, please help me build a DataFrame that satisfies the following condition.

Source DataFrame available:

    ID   №Policy  Request  Request date      Decision
    123  23ff     10000    2018-01-28 11:36  0
    123  23ff     10000    2018-01-29 10:00  5000
    123  42rd     25000    2018-06-18 15:10  25000
    123  42rd     30000    2018-08-18 18:00  30000
    345  23ff     15000    2018-01-28 12:00  10000
    345  27fg     50000    2018-09-30 17:35  0
    345  81er     30000    2018-09-30 10:15  10000
    345  81er     30000    2018-10-20 11:30  10000
    678  12rt     55000    2018-12-01 09:25  0
    678  12rt     55000    2018-12-15 12:00  45000

I need to count the number of decisions (Decision) made for each ID within each №Policy, with one restriction: if several decisions were made for the same ID under the same №Policy within one month, they count as a single decision. In other words, one ID under one №Policy may have 2, 3 or more decisions in a month, but if they all fall within one month, they must be treated as 1 decision regardless of the number of requests.

The result should look approximately like this:

    ID   №Policy  Request  Request date      Decision  count
    123  23ff     10000    2018-01-28 11:36  0         0
    123  23ff     10000    2018-01-29 10:00  5000      1
    123  42rd     25000    2018-06-18 15:10  25000     1
    123  42rd     30000    2018-08-18 18:00  30000     1
    345  23ff     15000    2018-01-28 12:00  10000     1
    345  27fg     50000    2018-09-30 17:35  0         1
    345  81er     30000    2018-09-30 10:15  10000     0
    345  81er     30000    2018-10-20 11:30  10000     1
    678  12rt     55000    2018-12-01 09:25  0         0
    678  12rt     55000    2018-12-15 12:00  45000     1

I just can't work out what algorithm to use here :(

Can you explain why, in the resulting DataFrame, count is 0 in the first row but 1 in the sixth?
The first row gets 0 because a repeat decision was made one day later, with the same №Policy and ID. In the sixth row the loan was not approved (0), but a decision was still made, so it counts as 1 decision.
The essence is this: if a credit decision for the same client under the same contract (№Policy) was made several times within 30 days, it should count as 1 decision.
It would be much easier to aggregate the rows so that you end up with one row for each ID, NPolicy, Request_month.
As currently formulated, the problem is hard to implement because the logic for calculating count is not consistent.
If count always started at 1, and every subsequent row for the same ID and NPolicy within the same month got 0, the logic would be uniform and easier to implement.
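The aggregation suggested in the comments could look roughly like the sketch below. It assumes the columns are named NPolicy and Request_date and that Request_date has a datetime dtype; Request_month, requests, last_decision, monthly and decisions_per_policy are names made up for the illustration.

```python
import pandas as pd

# Sample data from the question (column names NPolicy / Request_date are assumed).
df = pd.DataFrame({
    'ID': [123, 123, 123, 123, 345, 345, 345, 345, 678, 678],
    'NPolicy': ['23ff', '23ff', '42rd', '42rd', '23ff',
                '27fg', '81er', '81er', '12rt', '12rt'],
    'Request': [10000, 10000, 25000, 30000, 15000,
                50000, 30000, 30000, 55000, 55000],
    'Request_date': pd.to_datetime([
        '2018-01-28 11:36', '2018-01-29 10:00', '2018-06-18 15:10',
        '2018-08-18 18:00', '2018-01-28 12:00', '2018-09-30 17:35',
        '2018-09-30 10:15', '2018-10-20 11:30', '2018-12-01 09:25',
        '2018-12-15 12:00']),
    'Decision': [0, 5000, 25000, 30000, 10000, 0, 10000, 10000, 0, 45000],
})

# Calendar month of each request (hypothetical helper column).
df['Request_month'] = df['Request_date'].dt.to_period('M')

# One row per ID / NPolicy / calendar month.
monthly = (df.groupby(['ID', 'NPolicy', 'Request_month'], as_index=False)
             .agg(requests=('Decision', 'size'),        # rows in that month
                  last_decision=('Decision', 'last')))  # last decision by row order

# Number of "monthly" decisions per ID / NPolicy.
decisions_per_policy = monthly.groupby(['ID', 'NPolicy']).size()
```

Note that this groups by calendar month, so two decisions made on 30 September and 20 October land in different rows even though they are only 20 days apart.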

MaxU · Accepted Answer · 2019-02-21T11:05:49

If I understand the question correctly:

    In [209]: df['count'] = (df.groupby(['ID','NPolicy',pd.Grouper(key='Request_date', freq='MS')])
         ...:                  ['Decision']
         ...:                  .cumcount().eq(0).astype('int'))

    In [210]: df
    Out[210]:
        ID NPolicy  Request        Request_date  Decision  count
    0  123    23ff    10000 2018-01-28 11:36:00         0      1
    1  123    23ff    10000 2018-01-29 10:00:00      5000      0
    2  123    42rd    25000 2018-06-18 15:10:00     25000      1
    3  123    42rd    30000 2018-08-18 18:00:00     30000      1
    4  345    23ff    15000 2018-01-28 12:00:00     10000      1
    5  345    27fg    50000 2018-09-30 17:35:00         0      1
    6  345    81er    30000 2018-09-30 10:15:00     10000      1
    7  345    81er    30000 2018-10-20 11:30:00     10000      1
    8  678    12rt    55000 2018-12-01 09:25:00         0      1
    9  678    12rt    55000 2018-12-15 12:00:00     45000      0

A follow-up question: what if I want to change the period, say not one month but 1.5 months, or 20 days? How can I do that?
The pandas documentation has a complete table of "offset aliases" that can be used in the freq parameter.
You can also specify a multiple of a period, for example freq='10D' or freq='2W'.
If you can't figure it out, ask a new question here and we'll try to work it out together ;)
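If the period should be measured in days from a client's first decision rather than in calendar bins, a different technique may fit better than a freq value. The sketch below is not the Grouper approach from the answer: it walks through each ID / NPolicy group in date order, opens a new window whenever a decision falls more than `window` after the start of the current one, and flags the last decision in each window. The names flag_windows and window are hypothetical, and the column names NPolicy and Request_date are assumed.

```python
import pandas as pd

# Sample data from the question (column names NPolicy / Request_date are assumed).
df = pd.DataFrame({
    'ID': [123, 123, 123, 123, 345, 345, 345, 345, 678, 678],
    'NPolicy': ['23ff', '23ff', '42rd', '42rd', '23ff',
                '27fg', '81er', '81er', '12rt', '12rt'],
    'Request_date': pd.to_datetime([
        '2018-01-28 11:36', '2018-01-29 10:00', '2018-06-18 15:10',
        '2018-08-18 18:00', '2018-01-28 12:00', '2018-09-30 17:35',
        '2018-09-30 10:15', '2018-10-20 11:30', '2018-12-01 09:25',
        '2018-12-15 12:00']),
    'Decision': [0, 5000, 25000, 30000, 10000, 0, 10000, 10000, 0, 45000],
})


def flag_windows(dates, window):
    """Return 1 for the last row of each window and 0 otherwise.

    `dates` must be a datetime Series sorted in ascending order.
    """
    window_ids = []
    window_start = None
    current = -1
    for ts in dates:
        # Open a new window when this decision is too far from the window start.
        if window_start is None or ts - window_start > window:
            window_start = ts
            current += 1
        window_ids.append(current)
    ids = pd.Series(window_ids, index=dates.index)
    # A row closes its window when the next row belongs to another window
    # (or when there is no next row at all).
    return (ids != ids.shift(-1)).astype(int)


window = pd.Timedelta(days=30)   # e.g. days=20 or days=45 for other periods

flags = pd.Series(0, index=df.index)
sorted_dates = df.sort_values('Request_date')
for _, dates in sorted_dates.groupby(['ID', 'NPolicy'])['Request_date']:
    flags.loc[dates.index] = flag_windows(dates, window)

df['count'] = flags
```

On the sample data with a 30-day window this gives the count column shown in the question (0, 1, 1, 1, 1, 1, 0, 1, 0, 1), including the pair of decisions on 2018-09-30 and 2018-10-20 that a calendar-month grouping would count separately.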
