Check all values and add missing ones

Question

How can I check that an Excel file contains all the labels (the label column)?

For example, if label 14 is missing, the program should add it to the column with the value 9999.

There is code that extracts these labels; now its result file needs to be checked:

    import pandas as pd

    file_name = r'C:\Users\fazliakhmetovRV\Documents\Python Scripts\primer.xlsx'
    cols = ['label','x','y','z','value']

    df = pd.read_excel(file_name, sheetname='er', skiprows=4, header=None, parse_cols='C:XFD')

    dfs = []
    for i in range(df.columns.size//5):
        lbl_col = 5*i
        x = df.ix[(df[lbl_col] != 0) & (df[lbl_col] != 9999), lbl_col:lbl_col+4]
        #x.columns = pd.MultiIndex.from_tuples(list(product([i+1], cols)))
        x.columns = cols
        dfs.append(x.reset_index(drop=True,level=1))

    result = pd.concat(dfs, axis=1)
    result.to_excel('result.xlsx', index=False)

Every block in df must be checked for the presence of all labels (1, 2, 3 ... n; in this case there are 15), since some labels may be missing from the file. If the program does not find a particular label, it should insert a row with that label and the value 9999 at the position where the label belongs.

How do we determine which set of labels is the reference? The one with the largest number of labels? - MaxU

Accepted Answer · 2016-10-24T20:38:17

Here is the corrected version:

    import pandas as pd

    file_name = r'C:\Users\fazliakhmetovRV\Documents\Python Scripts\primer.xlsx'
    out = r'C:\Users\fazliakhmetovRV\Documents\Python Scripts\out.xlsx'
    cols = ['label','x','y','z','value']

    df = pd.read_excel(file_name, sheetname='er', skiprows=4, header=None, parse_cols='C:XFD')

    dfs = []
    for i in range(df.columns.size//5):
        lbl_col = 5*i
        # filter each 5-column block
        x = df.ix[(df[lbl_col] != 0) & (df[lbl_col] != 9999), lbl_col:lbl_col+4]
        # set custom column names
        x.columns = cols
        # set 'label' column as index (will be used for alignment by `pd.concat()`)
        x.index = x.label
        dfs.append(x)

    # merge filtered DFs horizontally (aligned by indexes)
    result = pd.concat(dfs, axis=1)
    # replace all `label` columns with the "most complete" list of labels
    result.ix[:, ::5] = pd.concat([result.index.to_series()] * (len(df.columns)//5), axis=1)
    # replace all NaN's with `9999`
    result.fillna(9999, inplace=True)
    # save resulting DF to Excel
    result.to_excel(out, index=False)
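The code above targets the pandas version current in 2016. In later releases `.ix` was removed (pandas 1.0), and the `read_excel` arguments `sheetname` and `parse_cols` were renamed to `sheet_name` and `usecols`. The following is only a sketch of the same alignment idea for newer pandas, not a drop-in replacement: it works on a small in-memory frame instead of the Excel file, and the helper name `align_blocks`, its `n_labels` parameter, and the sample data are assumptions made for illustration. It uses `reindex` to insert the missing labels.

```python
import pandas as pd

COLS = ['label', 'x', 'y', 'z', 'value']
FILL = 9999

def align_blocks(df, n_labels=None, block_width=5, fill=FILL):
    """Hypothetical helper: align every 5-column block on one list of labels."""
    blocks = []
    for i in range(df.shape[1] // block_width):
        start = i * block_width
        block = df.iloc[:, start:start + block_width].copy()
        block.columns = COLS
        # drop rows whose label is 0 or the 9999 placeholder
        block = block[(block['label'] != 0) & (block['label'] != fill)]
        blocks.append(block.set_index('label'))

    if n_labels is None:
        # reference set = every label seen in any block (the "most complete" list)
        labels = sorted(set().union(*(b.index for b in blocks)))
    else:
        # reference set = 1..n, as described in the question
        labels = list(range(1, n_labels + 1))

    aligned = []
    for b in blocks:
        # rows for missing labels are created and filled with 9999
        b = b.reindex(labels, fill_value=fill)
        b.index.name = 'label'
        aligned.append(b.reset_index())
    return pd.concat(aligned, axis=1)

# made-up sample: two blocks; the second one lacks label 2
raw = pd.DataFrame([
    [1, 0.1, 0.2, 0.3, 10, 1, 1.1, 1.2, 1.3, 11],
    [2, 0.4, 0.5, 0.6, 20, 3, 1.4, 1.5, 1.6, 31],
    [3, 0.7, 0.8, 0.9, 30, 0, 0.0, 0.0, 0.0, 0],
])
result = align_blocks(raw, n_labels=3)
print(result)
```

Passing `n_labels=15` would match the 15 labels mentioned in the question, while leaving it as `None` uses the union of all labels found, which mirrors the "most complete" list used in the answer. The sketch assumes labels are unique within each block, since `reindex` does not accept a duplicated index.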

out.xlsx:

P.S. How DataFrame.ix works [...]:

    In [27]: df
    Out[27]:
       a  b  c
    0  4  9  8
    1  8  4  5
    2  7  9  4
    3  4  7  1
    4  0  7  3
    5  8  1  4
    6  1  9  8
    7  4  9  3
    8  1  3  8
    9  7  9  0

To filter (select) the rows in which a > 1 and b < 8:

    In [28]: df.ix[(df.a > 1) & (df.b < 8)]
    Out[28]:
       a  b  c
    1  8  4  5
    3  4  7  1
    5  8  1  4

Now the same, but selecting only the columns ['a','b']:

    In [29]: df.ix[(df.a > 1) & (df.b < 8), ['a','b']]
    Out[29]:
       a  b
    1  8  4
    3  4  7
    5  8  1
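Since `.ix` was deprecated in pandas 0.20 and removed in 1.0, the same selections can be written with `.loc`, which takes a boolean row mask and an optional list of column labels. The sketch below rebuilds the sample frame from the output shown above; the variable names `mask`, `rows` and `subset` are chosen here for illustration.

```python
import pandas as pd

# the same sample frame as in the session above
df = pd.DataFrame({
    'a': [4, 8, 7, 4, 0, 8, 1, 4, 1, 7],
    'b': [9, 4, 9, 7, 7, 1, 9, 9, 3, 9],
    'c': [8, 5, 4, 1, 3, 4, 8, 3, 8, 0],
})

# boolean mask: True for rows where a > 1 and b < 8
mask = (df['a'] > 1) & (df['b'] < 8)

rows = df.loc[mask]                 # all columns of the matching rows
subset = df.loc[mask, ['a', 'b']]   # only columns 'a' and 'b'

print(rows)
print(subset)
```

The first argument selects rows and the second selects columns, so the mask filters rows by a condition on column values rather than filtering the columns themselves.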

Thanks, that helped! Could you clarify one thing: what does .ix do? Does it filter values by column? How does it work in general? - Ramil Fazliahmetov
