Dataframe df available

enter image description here

print(type(df.index[0])) <class 'pandas._libs.tslibs.timestamps.Timestamp'> 

From the max_high and max_rsi_14 columns, you need to get two equal in length to the data series. In the first series there should be all non-empty values ​​from max_high, and in the second value of max_rsi_14, only those closest in index to a non-empty value of max_high .

Example of the desired total:

enter image description here


I tried to solve with

 df.index.get_loc(dt, method='nearest') 

But I could not figure it out. Please help and thanks for the help!


Dataset example

  high low volume open ... min_high max_high min_rsi_14 max_rsi_14 timestamp ... 2018-10-02 11:00:00 6553.0 6542.5 25845962 6552.0 ... NaN NaN NaN NaN 2018-10-02 12:00:00 6574.0 6534.0 99602546 6542.5 ... NaN 6574.0 NaN 44.320898 2018-10-02 13:00:00 6557.0 6542.0 42124982 6552.0 ... NaN NaN NaN NaN 2018-10-02 14:00:00 6552.5 6497.0 136742416 6549.5 ... NaN NaN 34.712926 NaN 2018-10-02 15:00:00 6529.5 6505.5 55167626 6519.0 ... NaN NaN NaN 39.25009 

    1 answer 1

    You can use pd.merge_asof () :

     res = pd.merge_asof(df.loc[df['max_high'].notna(), 'max_high'], df.loc[df['max_rsi_14'].notna(), 'max_rsi_14'], left_index=True, right_index=True, direction='nearest') 

    Example:

     In [36]: df Out[36]: max_high max_rsi_14 timestamp 2018-10-02 11:00:00 NaN NaN 2018-10-02 12:00:00 6574.0 44.320898 2018-10-02 13:00:00 NaN NaN 2018-10-02 14:00:00 6666.0 NaN 2018-10-02 15:00:00 NaN 39.250090 In [37]: %paste res = pd.merge_asof(df.loc[df['max_high'].notna(), 'max_high'], df.loc[df['max_rsi_14'].notna(), 'max_rsi_14'], left_index=True, right_index=True, direction='nearest') ## -- End pasted text -- In [38]: res Out[38]: max_high max_rsi_14 timestamp 2018-10-02 12:00:00 6574.0 44.320898 2018-10-02 14:00:00 6666.0 39.250090 
    • one
      Thank you so much! As always very helpful. Already reading about the function =) - Dmitry Sharko