I have a DataFrame (see the example below). Some of its columns have gaps.

How can I check for these gaps and write the results of the check to a new column d?

       a  b  c                     d
    0  1  -  -  нет числа, нет числа
    1  3  4  6
    2  -  7  4             нет числа
    3  6  5  -             нет числа

(Here нет числа means "no number".)

I started doing this:

  • looped over each column and collected all the errors found into a list

  • converted the list to a string with join, although that may not be necessary

  • applied this function to the DataFrame with the apply method.

As a result, I expected each row of column d to contain a list of the gaps found in that row of columns a, b and c, but I get a KeyError instead.

What should I fix in the function, or how else could this be written?

    1 answer

    If the gaps are marked with minus signs ('-') in the DataFrame:

     In [27]: df
     Out[27]:
        a  b  c
     0  1  -  -
     1  3  4  6
     2  -  7  4
     3  6  5  -

     In [28]: df['d'] = [', '.join(['нет числа'] * x) for x in df.eq('-').sum(axis=1)]

     In [29]: df
     Out[29]:
        a  b  c                     d
     0  1  -  -  нет числа, нет числа
     1  3  4  6
     2  -  7  4             нет числа
     3  6  5  -             нет числа
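    To run this outside IPython, a minimal self-contained sketch might look like the following. How `df` is built is an assumption (the question does not show it); the values are stored as strings here so that '-' can sit next to the digits, and `gaps_per_row` is just an illustrative name.

```python
import pandas as pd

# Assumed construction of the example frame; gaps are marked with '-'.
df = pd.DataFrame({
    'a': ['1', '3', '-', '6'],
    'b': ['-', '4', '7', '5'],
    'c': ['-', '6', '4', '-'],
})

# Count the '-' cells in each row (True counts as 1 when summed).
gaps_per_row = df.eq('-').sum(axis=1)

# Repeat the message once per gap and join the copies into one string.
df['d'] = [', '.join(['нет числа'] * n) for n in gaps_per_row]

print(df)
```

    Rows without gaps get an empty string in d, because joining an empty list yields ''.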

    Step by step. First, compare every cell with '-':

     In [178]: df.eq('-')
     Out[178]:
            a      b      c
     0  False   True   True
     1  False  False  False
     2   True  False  False
     3  False  False   True

    Sum the True values (each counts as 1) row by row:

     In [179]: df.eq('-').sum(axis=1)
     Out[179]:
     0    2
     1    0
     2    1
     3    1
     dtype: int64
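    The sum is over booleans, not over text: Python treats True as 1 and False as 0, and pandas does the same when summing a boolean frame. A small sketch, using a hand-built mask that mirrors Out[178] (the names `mask`, `row_counts` and `col_counts` are only for illustration):

```python
import pandas as pd

# Plain Python: booleans are integers, so summing them counts the True values.
assert sum([False, True, True]) == 2

# The same mask as in Out[178], built by hand for illustration.
mask = pd.DataFrame({
    'a': [False, False, True, False],
    'b': [True, False, False, False],
    'c': [True, False, False, True],
})

row_counts = mask.sum(axis=1)  # axis=1: add across the columns of each row
col_counts = mask.sum(axis=0)  # axis=0 (the default): add down each column

print(row_counts.tolist())
print(col_counts.tolist())
```

    With axis=1 the result has one count per row, which is what the new column needs; with the default axis=0 it would have one count per column instead.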

    An example of repeating a list and joining all its items into one string:

     In [180]: ', '.join(['нет числа'] * 3)
     Out[180]: 'нет числа, нет числа, нет числа'

    The values for the new column:

     In [181]: [', '.join(['нет числа'] * x) for x in df.eq('-').sum(axis=1)]
     Out[181]: ['нет числа, нет числа', '', 'нет числа', 'нет числа']

    If the gaps in the DataFrame are NaN values:

     In [34]: df
     Out[34]:
          a    b    c
     0  1.0  NaN  NaN
     1  3.0  4.0  6.0
     2  NaN  7.0  4.0
     3  6.0  5.0  NaN

     In [35]: df['d'] = [', '.join(['нет числа'] * x) for x in df.isnull().sum(axis=1)]

     In [36]: df
     Out[36]:
          a    b    c                     d
     0  1.0  NaN  NaN  нет числа, нет числа
     1  3.0  4.0  6.0
     2  NaN  7.0  4.0             нет числа
     3  6.0  5.0  NaN             нет числа
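    If column d should also say which columns have the gap in each row, as the question describes, a row-wise apply is one option. The sketch below is an assumption about what was wanted, not the asker's original function (which is not shown); the helper `describe_gaps` and the message wording are hypothetical. Note that `DataFrame.apply` passes whole columns to the function by default, so a lookup such as `row['a']` inside it raises KeyError unless `axis=1` is given. That is one common cause of the error described in the question, though without the original code it cannot be confirmed.

```python
import numpy as np
import pandas as pd

# Assumed example frame with NaN marking the gaps.
df = pd.DataFrame({
    'a': [1, 3, np.nan, 6],
    'b': [np.nan, 4, 7, 5],
    'c': [np.nan, 6, 4, np.nan],
})

def describe_gaps(row):
    """Hypothetical helper: name the columns that are missing in this row."""
    missing = [col for col in ['a', 'b', 'c'] if pd.isna(row[col])]
    return ', '.join(f'no number in {col}' for col in missing)

# axis=1 hands each row to the function, so row['a'], row['b'], row['c'] exist.
df['d'] = df.apply(describe_gaps, axis=1)

print(df)
```

    The list-comprehension version above is likely faster on large frames; the apply version trades speed for per-column messages.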
    • MaxU, thanks! And what if each column has its own set of error values, known in advance, instead of the "-" sign, and the comment in column d also has to differ for each column's error instead of a uniform "нет числа"? Also, may I ask a couple of questions about your solution? What does ', '.join(['нет числа'] * x) do? And does .sum(axis=1) add up text values? - Denis Novik
    • @DenisNovik, I will add an explanation when I get to a computer. Can you explain what you are trying to do? That is, not this particular step, but the bigger picture ... - MaxU
    • I have a DataFrame whose columns contain, in some rows, entries like "Not assigned" or numbers less than 10,000. I wanted to write a function that collects all the errors in each row across the columns and tells users in plain language, in column d, what needs to be fixed in that row in column a, what in column b, and what in column c. That is what I'm trying to implement - Denis Novik
    • Thank you very much for answering my questions. Clear and understandable, as always! - Denis Novik
    • @DenisNovik, you can do more detailed checks for the different cases, but that requires a reproducible example of the original DataFrame and an example of the output you want. I think it would be better to post this as a separate question ... - MaxU