Function for row-by-row validation of a DataFrame against conditions

Question

I have a DataFrame (see the example below). Some of its columns contain gaps.

How can I check for these gaps and write the results of the check to a new column d?

       a  b  c                     d
    0  1  -  -  нет числа, нет числа
    1  3  4  6
    2  -  7  4             нет числа
    3  6  5  -             нет числа

(Here "нет числа" means "no number".)

I started with the following approach:

looped over each column and collected every error found into a list;
converted the list to a string with join (although this may not be necessary);
applied the function to the DataFrame with the apply method.

I expected each row of column d to contain the list of gaps found in the same row of columns a, b and c, but I get a KeyError instead.

What should I fix in the function, or how else can this be written?

Accepted Answer · 2018-04-10T21:20:18

If there are minus signs in the DataFrame:

    In [27]: df
    Out[27]:
       a  b  c
    0  1  -  -
    1  3  4  6
    2  -  7  4
    3  6  5  -

    In [28]: df['d'] = [', '.join(['нет числа'] * x) for x in df.eq('-').sum(axis=1)]

    In [29]: df
    Out[29]:
       a  b  c                     d
    0  1  -  -  нет числа, нет числа
    1  3  4  6
    2  -  7  4             нет числа
    3  6  5  -             нет числа
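For anyone who wants to reproduce this outside IPython, here is a minimal self-contained sketch. It assumes the frame is built by hand with every value stored as a string (the original post does not show how the frame was created), and the name `missing_per_row` is only illustrative.

```python
import pandas as pd

# Sample frame from the question, rebuilt by hand (an assumption:
# the original construction is not shown). "-" marks a missing number,
# so every column holds strings.
df = pd.DataFrame({
    "a": ["1", "3", "-", "6"],
    "b": ["-", "4", "7", "5"],
    "c": ["-", "6", "4", "-"],
})

# Count the "-" cells in each row ...
missing_per_row = df.eq("-").sum(axis=1)

# ... then repeat the message that many times and join the copies.
df["d"] = [", ".join(["нет числа"] * n) for n in missing_per_row]

print(df)
```

Rows without any "-" get an empty string in column d, because a list repeated zero times is empty.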

step by step:

    In [178]: df.eq('-')
    Out[178]:
           a      b      c
    0  False   True   True
    1  False  False  False
    2   True  False  False
    3  False  False   True

sum the True values (each counts as 1) row by row:

    In [179]: df.eq('-').sum(axis=1)
    Out[179]:
    0    2
    1    0
    2    1
    3    1
    dtype: int64
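To make it explicit that the sum never touches the text values, here is a small sketch that starts directly from a boolean mask like the one above. The mask is typed in by hand for illustration, and the names `mask` and `counts` are not from the original answer.

```python
import pandas as pd

# The boolean mask that df.eq('-') produces for the sample frame,
# written out by hand for illustration.
mask = pd.DataFrame({
    "a": [False, False, True, False],
    "b": [True, False, False, False],
    "c": [True, False, False, True],
})

# sum() works on the booleans, not on the original strings:
# True counts as 1 and False as 0, so each row sum is the number
# of "-" cells in that row.
counts = mask.sum(axis=1)
print(counts.tolist())  # [2, 0, 1, 1]
```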

An example of replicating a list and merging all items into a string:

    In [180]: ', '.join(['нет числа'] * 3)
    Out[180]: 'нет числа, нет числа, нет числа'

values for the new column:

    In [181]: [', '.join(['нет числа'] * x) for x in df.eq('-').sum(axis=1)]
    Out[181]: ['нет числа, нет числа', '', 'нет числа', 'нет числа']

if the DataFrame contains missing values (NaN):

    In [34]: df
    Out[34]:
         a    b    c
    0  1.0  NaN  NaN
    1  3.0  4.0  6.0
    2  NaN  7.0  4.0
    3  6.0  5.0  NaN

    In [35]: df['d'] = [', '.join(['нет числа'] * x) for x in df.isnull().sum(axis=1)]

    In [36]: df
    Out[36]:
         a    b    c                     d
    0  1.0  NaN  NaN  нет числа, нет числа
    1  3.0  4.0  6.0
    2  NaN  7.0  4.0             нет числа
    3  6.0  5.0  NaN             нет числа
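The two cases can also be bridged. The sketch below is not part of the original answer: it uses `pd.to_numeric` with `errors="coerce"` to turn the "-" placeholders into NaN first, and then applies the same `isnull()` count. The frame and the names `raw`, `numeric` and `counts` are assumptions made for illustration.

```python
import pandas as pd

# String frame with "-" placeholders, rebuilt by hand for illustration.
raw = pd.DataFrame({
    "a": ["1", "3", "-", "6"],
    "b": ["-", "4", "7", "5"],
    "c": ["-", "6", "4", "-"],
})

# Convert each column to numbers; anything that cannot be parsed
# (here the "-" cells) becomes NaN.
numeric = raw.apply(pd.to_numeric, errors="coerce")

# Same counting idea as above, now on the NaN mask.
counts = numeric.isnull().sum(axis=1)
numeric["d"] = [", ".join(["нет числа"] * n) for n in counts]

print(numeric)
```

A side effect of this variant is that any unparseable value is counted as a gap, not only "-", which may or may not be what you want.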

And what if each column has its own errors, known in advance, instead of the "-" signs, and the comments in column d should also differ per column instead of the uniform "no number"?
What does ', '.join(['no number'] * x) do? And does .sum(axis=1) add up text values?
@DenisNovik, I will add an explanation when I get to the computer.
I have a DataFrame whose columns contain entries such as "Not assigned" or numbers below 10,000. I wanted to write a function that collects all the errors in each row across the columns and tells users in plain language, in column d, that in this row column a needs one fix, column b another, and column c another.
@DenisNovik, you can do more detailed checks for different cases, but this will require a reproducible example of the original DF and an example of what you want to get at the output.
I think it would be better to post this as a separate question ...
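As a rough illustration of the per-column check discussed in the comments above, here is a minimal sketch. Everything in it is an assumption made for the example: the sample data, the `MESSAGES` texts, the `MIN_VALUE` threshold (taken from the "less than 10,000" remark) and the helper names `is_bad` and `describe_errors`. Without a reproducible example of the real frame it should not be read as the definitive solution.

```python
import pandas as pd

# Hypothetical data: "Not assigned" and numbers below 10,000 are errors.
df = pd.DataFrame({
    "a": [15000, "Not assigned", 20000],
    "b": [500, 12000, 30000],
    "c": ["Not assigned", 11000, 40000],
})

MIN_VALUE = 10_000  # threshold mentioned in the comments; adjust as needed

# Hypothetical per-column messages for column d.
MESSAGES = {
    "a": "fix column a",
    "b": "fix column b",
    "c": "fix column c",
}


def is_bad(value):
    """True if the value is not a number or is below MIN_VALUE."""
    number = pd.to_numeric(value, errors="coerce")
    return pd.isna(number) or number < MIN_VALUE


def describe_errors(row):
    """Join the messages of every column whose value fails the check."""
    return ", ".join(msg for col, msg in MESSAGES.items() if is_bad(row[col]))


# axis=1 passes each row to the function. With the default axis=0 the
# function receives whole columns, so a lookup such as row["a"] raises
# KeyError, which may be what happened in the question.
df["d"] = df.apply(describe_errors, axis=1)

print(df)
```

The row-wise apply is slower than the vectorised `eq`/`isnull` approach from the answer, but it is easier to extend when every column has its own rule and its own message.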
