I have a DataFrame that contains references to rows of another DataFrame, stored as lists.

I need to build a new DataFrame from the specified rows and add a new column to it that summarizes the values of the first DataFrame.

To make this clearer, I drew a diagram. I need to build a DataFrame like the one at the bottom.

[Image: diagram; the desired DataFrame is at the bottom]

    import pandas as pd

    # DataFrame with the data
    data = {
        'apples': [3, 2, 1, 4, 5, 0, 4, 2, 1],
        'oranges': [3, 0, 4, 2, 1, 2, 3, 7, 2],
        'tomat': [1, 1, 4, 2, 8, 6, 4, 7, 2]
    }
    df = pd.DataFrame(data)

    # DataFrame listing row numbers of the first one
    maps = {
        'cat': [3, 0, 4],
        'sklad': [1, 5, 2],
        'vec': [[3, 2, 1], [4, 0], [1, 5, 3, 2]]
    }
    dfv = pd.DataFrame(maps)

I started doing everything with loops and conditionals, and I realize that this is highly redundant and fragile. What is the correct way in Pandas to select rows based on lists and add further calculated values?

  • Can you provide the expected output DataFrame as text or as code (like df or dfv in the question)? - MaxU

1 answer

Use the explode() function:

    In [21]: t = explode(dfv, 'vec')

    In [22]: t
    Out[22]:
       cat  sklad  vec
    0    3      1    3
    1    3      1    2
    2    3      1    1
    3    0      5    4
    4    0      5    0
    5    4      2    1
    6    4      2    5
    7    4      2    3
    8    4      2    2

    In [23]: res = df.loc[t['vec']]

    In [24]: res
    Out[24]:
       apples  oranges  tomat
    3       4        2      2
    2       1        4      4
    1       2        0      1
    4       5        1      8
    0       3        3      1
    1       2        0      1
    5       0        2      6
    3       4        2      2
    2       1        4      4
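For reference, pandas 0.25 and later include a built-in DataFrame.explode method that does a similar job; the explode(dfv, 'vec') call above looks like a standalone helper. The following is a minimal sketch of the same two steps with the built-in method, assuming that pandas version. The built-in repeats the original row labels (0, 0, 0, 1, ...), so the index is reset here to get the 0..8 labels shown above, and the exploded column is cast back to integers because it comes out with object dtype.

```python
import pandas as pd

df = pd.DataFrame({
    'apples':  [3, 2, 1, 4, 5, 0, 4, 2, 1],
    'oranges': [3, 0, 4, 2, 1, 2, 3, 7, 2],
    'tomat':   [1, 1, 4, 2, 8, 6, 4, 7, 2],
})
dfv = pd.DataFrame({
    'cat':   [3, 0, 4],
    'sklad': [1, 5, 2],
    'vec':   [[3, 2, 1], [4, 0], [1, 5, 3, 2]],
})

# One row per list element; reset the index so the labels run 0..8.
t = dfv.explode('vec').reset_index(drop=True)
# The exploded column has object dtype; cast it to integers.
t['vec'] = t['vec'].astype(int)

# Select the referenced rows of df, in the order given by 'vec'.
res = df.loc[t['vec']]

print(t)
print(res)
```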

To add the "cat" and "sklad" columns to the "res" DataFrame:

    In [50]: res = df.loc[t['vec']].join(t.drop('vec', axis=1))

    In [51]: res
    Out[51]:
       apples  oranges  tomat  cat  sklad
    0       3        3      1    3      1
    1       2        0      1    3      1
    1       2        0      1    3      1
    2       1        4      4    3      1
    2       1        4      4    3      1
    3       4        2      2    0      5
    3       4        2      2    0      5
    4       5        1      8    0      5
    5       0        2      6    4      2
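Note that join without the on argument aligns on index labels, so in the call above the labels of res (row numbers of df) are matched against the positional labels of t. If the intent is to keep each cat/sklad pair next to the df row that its vec entry points to, one possible variant is to join df onto t using the vec column as the key. This is only a sketch, assuming pandas 0.25+ for the built-in DataFrame.explode; the total column is a hypothetical example of a calculated value, since the expected output is not available as text.

```python
import pandas as pd

df = pd.DataFrame({
    'apples':  [3, 2, 1, 4, 5, 0, 4, 2, 1],
    'oranges': [3, 0, 4, 2, 1, 2, 3, 7, 2],
    'tomat':   [1, 1, 4, 2, 8, 6, 4, 7, 2],
})
dfv = pd.DataFrame({
    'cat':   [3, 0, 4],
    'sklad': [1, 5, 2],
    'vec':   [[3, 2, 1], [4, 0], [1, 5, 3, 2]],
})

# One row per referenced df row, with integer references.
t = dfv.explode('vec').reset_index(drop=True)
t['vec'] = t['vec'].astype(int)

# Match the values in t['vec'] against the index of df,
# so cat/sklad stay with the row their list entry refers to.
res = t.join(df, on='vec')

# Hypothetical calculated column (an assumption): row-wise sum of the fruit counts.
res['total'] = res[['apples', 'oranges', 'tomat']].sum(axis=1)

print(res)
```

If one summary row per cat/sklad pair is wanted instead, a groupby on those two columns over this result would be a natural next step.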
  • There is a second part to the question: I need to add the columns "cat" and "sklad" to the "res" DataFrame. I tried res = df.loc[t['vec', 'cat', 'sklad']], but I get the error KeyError: ('vec', 'cat', 'sklad'). How do I add the columns correctly? - Mavar
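
On the KeyError mentioned in the comment: t['vec', 'cat', 'sklad'] passes a single tuple as the key, which pandas looks up as one column label. Selecting several columns takes a list of labels (double brackets). The small frame below is a made-up stand-in for t, used only to illustrate the difference.

```python
import pandas as pd

# Made-up stand-in for the exploded frame t.
t = pd.DataFrame({'cat': [3, 3], 'sklad': [1, 1], 'vec': [3, 2]})

# A tuple key is treated as ONE column label, which does not exist here.
try:
    t['vec', 'cat', 'sklad']
    raised = False
except KeyError:
    raised = True

# A list of labels selects several columns at once.
subset = t[['vec', 'cat', 'sklad']]

print(raised)
print(subset)
```

Even with the list form, df.loc expects row labels, so only the vec column is a suitable row indexer; the other columns have to be attached with a join or merge.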