I want to remove duplicate rows from an array, where rows count as duplicates if their first four values match, while keeping the last value by adding it to that of the remaining unique row. That is, there are many rows of this kind:

    1 3 1 2 5040
    6 3 2 1 5040
    1 3 1 2 320
    8 3 3 0 1680

I need to delete all rows that repeat the first four numbers (in this case 1 3 1 2) and leave only one, but with its fifth value replaced by the sum of its own value and the values from the same column of the deleted rows, i.e. in the end you should get this:

    1 3 1 2 5360
    6 3 2 1 5040
    8 3 3 0 1680

I managed to remove duplicates of the first four values in this way (pandas):

    import pandas as pd

    # smlst3 is the list of five-value rows shown above
    datalst = pd.DataFrame(smlst3, columns=["1", "2", "3", "4", "all"])
    datalst = datalst.drop_duplicates(["1", "2", "3", "4"], keep='first')

But I do not understand how to carry the values from the fifth column of the dropped rows over to the remaining unique row.

    1 answer

    Simply group the frame by all columns except the column "all" and sum:

     In [63]: res = df.groupby(df.columns.drop('all').tolist(), as_index=False).sum()

     In [64]: res
     Out[64]:
        1  2  3  4   all
     0  1  3  1  2  5360
     1  6  3  2  1  5040
     2  8  3  3  0  1680
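For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the data is the four sample rows from the question and that the frame uses the same column names; the names `rows` and `key_cols` are only illustrative. Passing `sort=False` is an optional addition, not part of the snippet above, that keeps the groups in the order they first appear instead of sorting by key.

```python
import pandas as pd

# Sample rows taken from the question: four key values plus the value to sum.
rows = [
    [1, 3, 1, 2, 5040],
    [6, 3, 2, 1, 5040],
    [1, 3, 1, 2, 320],
    [8, 3, 3, 0, 1680],
]
df = pd.DataFrame(rows, columns=["1", "2", "3", "4", "all"])

# Every column except "all" is part of the grouping key.
key_cols = df.columns.drop("all").tolist()

# Rows with identical keys collapse into one row; their "all" values are summed.
# sort=False (an assumption about the desired ordering) keeps first-appearance order.
res = df.groupby(key_cols, as_index=False, sort=False)["all"].sum()

print(res)
```

The two rows with key 1 3 1 2 are merged and their fifth values are added (5040 + 320 = 5360); the other two rows are left unchanged.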