There are two source DataFrames, for 2015 and 2016 respectively:

FIRM  x1  x2  x3
A      5   6   8
A      6   6   4
A      5   6   4
B      6   6   5
B      6   6   5
C      5   6   4
D      6   4   5

and

FIRM  x1  x2  x3
B      5   6   8
C      6   6   4
F      5   6   4
B      6   6   5
B      6   6   5
A      5   6   4
A      6   4   5

This task is a continuation of: Merge frames, leave only those records that are found in all frames (in a specific column).

Using these two examples, please show how the various filtering approaches with filter() work. Specifically, what does lambda x: x['YEAR'].nunique()==len(items) mean, and how does it work?

    1 answer

    The easiest way to show how df.groupby(...).filter(...) works is on a single DataFrame.

    NOTE: in the previous answer, grouping and filtering are applied to the result of pd.concat(), which combines several frames into one.

    Source DF:

     In [62]: df
     Out[62]:
       FIRM  x1  x2  x3
     0    A   5   6   8
     1    A   6   6   4
     2    A   5   6   4
     3    B   6   6   5
     4    B   6   6   5
     5    C   5   6   4
     6    D   6   4   5

    Suppose we want to keep the data only for those firms that have two or more rows:

     In [63]: df.groupby('FIRM').filter(lambda z: len(z) > 1)
     Out[63]:
       FIRM  x1  x2  x3
     0    A   5   6   8
     1    A   6   6   4
     2    A   5   6   4
     3    B   6   6   5
     4    B   6   6   5

    The .filter() method applies a function (in our case, a lambda function) to each group; a group is a subset of the DataFrame with the same structure as df. The function passed to .filter(func) must return a boolean scalar (True or False): all rows of a group are kept if it returns True and dropped if it returns False.
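The same mechanism can be sketched for the lambda from the question. This is only an illustration under assumptions: the YEAR column is added here by hand, and `items` is assumed to be a dict mapping each year to its frame (the question does not show how it was built in the previous answer). For each firm, x['YEAR'].nunique() counts the distinct years in which the firm appears, so the condition holds only for firms present in every frame.

```python
import pandas as pd

# Data copied from the question
df2015 = pd.DataFrame({
    "FIRM": list("AAABBCD"),
    "x1": [5, 6, 5, 6, 6, 5, 6],
    "x2": [6, 6, 6, 6, 6, 6, 4],
    "x3": [8, 4, 4, 5, 5, 4, 5],
})
df2016 = pd.DataFrame({
    "FIRM": list("BCFBBAA"),
    "x1": [5, 6, 5, 6, 6, 5, 6],
    "x2": [6, 6, 6, 6, 6, 6, 4],
    "x3": [8, 4, 4, 5, 5, 4, 5],
})

# Assumption: `items` maps a year to its frame
items = {2015: df2015, 2016: df2016}

# Tag every frame with its year, then stack them into one frame
combined = pd.concat(
    [frame.assign(YEAR=year) for year, frame in items.items()],
    ignore_index=True,
)

# Keep a firm only if it occurs in as many distinct years as there are frames
common = combined.groupby("FIRM").filter(
    lambda x: x["YEAR"].nunique() == len(items)
)

print(sorted(common["FIRM"].unique().tolist()))  # ['A', 'B', 'C']
```

Firm D occurs only in 2015 and firm F only in 2016, so for both of them nunique() is 1, the lambda returns False, and their rows are dropped.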

    • When we write groupby('FIRM'), is z then just the rows of this FIRM column? - Max52
    • No, z is a subset of df with the same structure (with all columns); z holds the data for one firm. Did you try running df.groupby('FIRM').apply(lambda z: print(z)), as I advised you before? Everything should be clear from that ... - MaxU
    • Yes, I tried it; it splits the frame into a set of separate tables, one per unique value. - Max52
    • Correct, that way you print all the groups to the screen (one table per group). The first group may be printed twice; ignore that, it is a quirk of the internal implementation of the .apply() method. - MaxU
    • And then the command looks at each of the groups and checks this condition, lambda z: len(z) > 1? - Max52
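Roughly speaking, yes. As a hedged illustration of that last question, the loop below evaluates the condition by hand, one group at a time, and then keeps only the groups for which it returned True. The names `condition`, `verdicts` and `kept` are made up for this sketch; it mimics what .filter() does conceptually and is not pandas' internal implementation.

```python
import pandas as pd

# Source DF from the answer
df = pd.DataFrame({
    "FIRM": list("AAABBCD"),
    "x1": [5, 6, 5, 6, 6, 5, 6],
    "x2": [6, 6, 6, 6, 6, 6, 4],
    "x3": [8, 4, 4, 5, 5, 4, 5],
})

condition = lambda z: len(z) > 1

# Evaluate the condition once per group; z is the sub-frame for one firm
verdicts = {firm: condition(z) for firm, z in df.groupby("FIRM")}
print(verdicts)  # {'A': True, 'B': True, 'C': False, 'D': False}

# Keep only the groups whose verdict is True
kept = pd.concat([z for firm, z in df.groupby("FIRM") if verdicts[firm]])
print(kept)
```

Here `kept` contains the same rows as df.groupby('FIRM').filter(lambda z: len(z) > 1).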