How can I check an Excel file for the presence of all labels (the `label` column)?

[Screenshot: the source Excel file]

For example, if label 14 is missing, the program should add it to the column with the value 9999: [Screenshot: expected result]

I have code that extracts these labels into a result file; now that result needs to be checked:

```python
import pandas as pd

file_name = r'C:\Users\fazliakhmetovRV\Documents\Python Scripts\primer.xlsx'
cols = ['label','x','y','z','value']
df = pd.read_excel(file_name, sheetname='er', skiprows=4, header=None, parse_cols='C:XFD')

dfs = []
for i in range(df.columns.size//5):
    lbl_col = 5*i
    x = df.ix[(df[lbl_col] != 0) & (df[lbl_col] != 9999), lbl_col:lbl_col+4]
    #x.columns = pd.MultiIndex.from_tuples(list(product([i+1], cols)))
    x.columns = cols
    dfs.append(x.reset_index(drop=True,level=1))

result = pd.concat(dfs, axis=1)
result.to_excel('result.xlsx', index=False)
```
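`.ix` no longer exists in current pandas. Below is a minimal sketch of the same block-splitting step using position-based `.iloc`, so it does not depend on how the columns are numbered after reading `C:XFD`. The frame `raw` is hypothetical stand-in data for the sheet, and `split_blocks` is a hypothetical helper name:

```python
import pandas as pd

COLS = ['label', 'x', 'y', 'z', 'value']

def split_blocks(frame, names=COLS, skip_labels=(0, 9999)):
    """Cut a wide frame into consecutive blocks of len(names) columns.

    Rows whose label is in `skip_labels` (0 and 9999 in the question)
    are dropped, and each block gets a fresh 0..k-1 row index."""
    width = len(names)
    blocks = []
    for start in range(0, frame.shape[1] - width + 1, width):
        block = frame.iloc[:, start:start + width].copy()
        block.columns = names
        keep = ~block['label'].isin(skip_labels)
        blocks.append(block[keep].reset_index(drop=True))
    return blocks

# Hypothetical data: two 5-column blocks side by side, as read with header=None
raw = pd.DataFrame([
    [1, 0.1, 0.2, 0.3, 10, 1, 1.1, 1.2, 1.3, 20],
    [2, 0.4, 0.5, 0.6, 11, 3, 1.4, 1.5, 1.6, 21],
    [0, 0.0, 0.0, 0.0, 0, 9999, 0.0, 0.0, 0.0, 0],
])

blocks = split_blocks(raw)
for b in blocks:
    print(b)
```

Here `reset_index(drop=True)` gives each block a plain 0..k-1 index. The question's `reset_index(drop=True, level=1)` asks for a second index level, which a single-level index does not have.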

Every block in `df` has to be checked for the presence of all labels (1, 2, 3, ..., n; in this case there are 15), because some labels may be missing from the file. If the program does not find a particular label, it should insert a row with that label and the value 9999 at the position where the label belongs.
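The requirement (every label 1..n present, missing ones filled with 9999) maps naturally onto `DataFrame.reindex` with `fill_value`. Here is a minimal sketch for a single block. It assumes the labels are the integers 1..n, `complete_labels` is a hypothetical helper, and n = 5 is used instead of 15 to keep the example short:

```python
import pandas as pd

def complete_labels(block, n_labels, fill=9999):
    """Return `block` with exactly one row per label 1..n_labels.

    Labels absent from `block` get a new row in their sorted position:
    the label itself is set, every other column is `fill`."""
    full = pd.Index(range(1, n_labels + 1), name='label')
    out = block.set_index('label').reindex(full, fill_value=fill)
    return out.reset_index()

# Hypothetical block in which label 4 is missing
block = pd.DataFrame({
    'label': [1, 2, 3, 5],
    'x': [0.1, 0.2, 0.3, 0.5],
    'y': [1.1, 1.2, 1.3, 1.5],
    'z': [2.1, 2.2, 2.3, 2.5],
    'value': [10, 20, 30, 50],
})

filled = complete_labels(block, n_labels=5)
print(filled)
```

`reindex` drops any row whose label falls outside 1..n and raises on duplicate labels. The sketch therefore assumes that each block holds unique labels within that range.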

  • And how do we tell which set of labels is the reference? The one with the most labels? - MaxU
  • @MaxU yes, the one with the most labels - Ramil Fazliahmetov
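Following the comments, the reference set is the block with the most labels. Below is a minimal sketch of picking it. The helper `reference_labels` and the sample blocks are hypothetical:

```python
import pandas as pd

def reference_labels(blocks):
    """Sorted labels of the block with the most distinct labels
    (the 'reference' set agreed on in the comments)."""
    best = max(blocks, key=lambda b: b['label'].nunique())
    return sorted(best['label'].unique().tolist())

# Hypothetical filtered blocks (only 'label' and 'value' shown)
blocks = [
    pd.DataFrame({'label': [1, 2, 4], 'value': [1.0, 2.0, 4.0]}),
    pd.DataFrame({'label': [1, 2, 3, 4], 'value': [1.5, 2.5, 3.5, 4.5]}),
    pd.DataFrame({'label': [2, 3], 'value': [2.2, 3.3]}),
]

ref = reference_labels(blocks)
print(ref)  # [1, 2, 3, 4]
```

`pd.concat(dfs, axis=1)` with label indexes performs an outer join. The labels it ends up with are therefore the union of all blocks' labels. That union coincides with the largest block's labels whenever that block contains every label that appears anywhere.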


Here is the corrected version:

```python
import pandas as pd

file_name = r'C:\Users\fazliakhmetovRV\Documents\Python Scripts\primer.xlsx'
out = r'C:\Users\fazliakhmetovRV\Documents\Python Scripts\out.xlsx'

cols = ['label','x','y','z','value']

# NOTE: written for old pandas; current pandas uses sheet_name=, usecols=
# and .loc/.iloc instead of sheetname=, parse_cols= and .ix
df = pd.read_excel(file_name, sheetname='er', skiprows=4, header=None, parse_cols='C:XFD')

dfs = []
for i in range(df.columns.size//5):
    lbl_col = 5*i
    # filter each 5-column block
    x = df.ix[(df[lbl_col] != 0) & (df[lbl_col] != 9999), lbl_col:lbl_col+4]
    # set custom column names
    x.columns = cols
    # set 'label' column as index (will be used for alignment by `pd.concat()`)
    x.index = x.label
    dfs.append(x)

# merge filtered DFs horizontally (aligned by indexes)
result = pd.concat(dfs, axis=1)

# replace all `label` columns with the "most complete" list of labels
result.ix[:, ::5] = pd.concat([result.index.to_series()] * (len(df.columns)//5), axis=1)

# replace all NaN's with `9999`
result.fillna(9999, inplace=True)

# save resulting DF to Excel
result.to_excel(out, index=False)
```
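`sheetname=`, `parse_cols=` and `.ix` are no longer available in current pandas. Below is a sketch of the same alignment idea written with `.iloc`. It starts from already-filtered 5-column blocks (such as those built by the loop above), so reading the workbook is left out. With current pandas, that read would use `sheet_name='er'` and `usecols='C:XFD'`. `align_blocks` and the sample blocks are hypothetical:

```python
import pandas as pd

def align_blocks(blocks, width=5, fill=9999):
    """Put blocks side by side, aligned on their labels.

    Same idea as the answer: index each block by its labels, outer-join
    the blocks with pd.concat(axis=1), overwrite every block's label column
    with the combined labels, then fill the remaining gaps with `fill`.
    Assumes labels are unique within each block."""
    indexed = []
    for b in blocks:
        b = b.copy()
        b.index = b['label'].to_numpy()  # align on label values
        indexed.append(b)
    result = pd.concat(indexed, axis=1).sort_index()
    labels = result.index.to_numpy()
    for pos in range(0, result.shape[1], width):
        result.iloc[:, pos] = labels  # label column of each block
    return result.fillna(fill)

# Hypothetical blocks: label 2 is missing from the second one
b1 = pd.DataFrame({'label': [1, 2, 3], 'x': [0.1, 0.2, 0.3],
                   'y': [1.1, 1.2, 1.3], 'z': [2.1, 2.2, 2.3],
                   'value': [10.0, 20.0, 30.0]})
b2 = pd.DataFrame({'label': [1, 3], 'x': [0.5, 0.7],
                   'y': [1.5, 1.7], 'z': [2.5, 2.7],
                   'value': [50.0, 70.0]})

result = align_blocks([b1, b2])
print(result)
# result.to_excel('out.xlsx', index=False)  # needs an Excel engine, e.g. openpyxl
```

Stepping through the columns with `.iloc` replaces `.ix[:, ::5]`. Assigning a plain NumPy array sidesteps any index or column alignment on the right-hand side.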

out.xlsx:

[Screenshot: out.xlsx]

PS: a demo of how `DataFrame.ix` works [...]:

```
In [27]: df
Out[27]:
   a  b  c
0  4  9  8
1  8  4  5
2  7  9  4
3  4  7  1
4  0  7  3
5  8  1  4
6  1  9  8
7  4  9  3
8  1  3  8
9  7  9  0
```

To filter (select) the rows where `a > 1` and `b < 8`:

```
In [28]: df.ix[(df.a > 1) & (df.b < 8)]
Out[28]:
   a  b  c
1  8  4  5
3  4  7  1
5  8  1  4
```

The same, but selecting only the columns `['a', 'b']`:

```
In [29]: df.ix[(df.a > 1) & (df.b < 8), ['a','b']]
Out[29]:
   a  b
1  8  4
3  4  7
5  8  1
```
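`.ix` mixed label-based and position-based lookup, and it was removed in pandas 1.0. Current pandas splits it into `.loc` (labels and boolean masks) and `.iloc` (integer positions). The demo above can be rewritten with those accessors on the same data, as in this sketch:

```python
import pandas as pd

# Same data as In [27]
df = pd.DataFrame({
    'a': [4, 8, 7, 4, 0, 8, 1, 4, 1, 7],
    'b': [9, 4, 9, 7, 7, 1, 9, 9, 3, 9],
    'c': [8, 5, 4, 1, 3, 4, 8, 3, 8, 0],
})

# Boolean mask: True for rows where a > 1 and b < 8
mask = (df.a > 1) & (df.b < 8)

rows = df.loc[mask]                 # like In [28]: rows 1, 3, 5
rows_ab = df.loc[mask, ['a', 'b']]  # like In [29]: same rows, columns a and b
every_other = df.iloc[:, ::2]       # by position: columns a and c

print(rows_ab)
```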
  • Thanks, that helped! Could you clarify one more thing: what does `.ix` do? Does it filter values by column, or how does it work in general? - Ramil Fazliahmetov
  • @RamilFazliahmetov, I've added a demo to the answer. - MaxU