I have a DataFrame with thousands of object-dtype columns that need to be cleaned before they can be converted to float. The problem is that some columns contain not only non-standard values that can be trimmed with str.replace, but also rare garbage values that mean something completely different from what the column is supposed to hold.
For example:
    r_data_executions_blocks_0_items_118_sum
    NaN                  10165
    10 000 руб.              4
    781 760 руб.             1
    40 922 руб.              1
    200 руб.                 1
    201 844 руб.             1
    177 579 руб.             1
    34198/15/50006-ИП        1
    415 руб.                 1
    21148/18/86014-ИП        1
    1,3 млн руб.             1
    176 427 руб.             1
    Name: r_data_executions_blocks_0_items_118_sum, dtype: int64

As you can see, amounts like 10 000 руб. can be cleaned by stripping the currency abbreviation and the order-of-magnitude word and removing the spaces between digits. Values like 34198/15/50006-ИП should be replaced entirely with NaN, matched by the ИП pattern.
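The rule described above can be written down for a single value first. This is only a minimal sketch: it assumes that млн and млрд are the only magnitude words in the data, that a comma is the decimal separator, and that anything not shaped like an amount in руб. should become NaN. `parse_amount` and `MULTIPLIERS` are hypothetical names used for illustration.

```python
import math
import re

# Assumed multipliers for the magnitude words seen in the data.
MULTIPLIERS = {"млн": 1e6, "млрд": 1e9}

# digits (optionally space-separated, optional decimal comma),
# optional magnitude word, then the currency abbreviation.
AMOUNT_RE = re.compile(r"(\d[\d ]*(?:,\d+)?)\s*(млн|млрд)?\s*руб\.?")

def parse_amount(value):
    """Return the amount as a float, or NaN if the value is not an amount."""
    if not isinstance(value, str):
        return math.nan  # real NaN / None stay missing
    # Treat non-breaking spaces like ordinary spaces before matching.
    text = value.replace("\u00a0", " ").strip()
    match = AMOUNT_RE.fullmatch(text)
    if match is None:
        return math.nan  # garbage such as '34198/15/50006-ИП'
    number = float(match.group(1).replace(" ", "").replace(",", "."))
    return number * MULTIPLIERS.get(match.group(2), 1.0)
```

Because the whole string has to match the amount pattern, garbage values do not need their own list of replacements; they simply fail the match and come back as NaN.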
My code:

    for n in df_common_fin.columns:
        if 'sum' in n:
            df_common_fin[n] = np.where(df_common_fin[n].str.contains('ИП', regex=False), 'NaN', df_common_fin[n])
            df_common_fin[n] = df_common_fin[n].str.contains('ИП', regex=False)
            df_common_fin[n] = df_common_fin[n].str.replace('руб', '')
            df_common_fin[n] = df_common_fin[n].str.replace('млн', '00000')
            df_common_fin[n] = df_common_fin[n].str.replace('млрд', '00000000')
            df_common_fin[n] = df_common_fin[n].str.replace('Исполнительный лист', "NaN")
            df_common_fin[n] = df_common_fin[n].str.replace('Сумма неизвестна', "NaN")
            df_common_fin[n].fillna('NaN', inplace=True)
            df_common_fin[n] = df_common_fin[n].str.replace('Нет', 'NaN')
            df_common_fin[n] = df_common_fin[n].str.replace('.', '')
            df_common_fin[n] = df_common_fin[n].str.replace(',', '')
            df_common_fin[n] = df_common_fin[n].str.replace(' ', '')
            df_common_fin[n] = df_common_fin[n].astype(np.float64)

It does not work.
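One likely culprit in the loop is the second assignment, which overwrites the column with the boolean result of `str.contains`, so the later `.str` calls no longer operate on the original strings. A possible alternative, sketched here under assumptions, is to strip the known suffixes and let `pd.to_numeric(..., errors="coerce")` turn everything that is still not a number into NaN. The sketch assumes the affected columns are object dtype holding strings and NaN, that млн and млрд are the only magnitude words, and that a comma is the decimal separator; `clean_sum_column` and `clean_sum_columns` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

def clean_sum_column(col: pd.Series) -> pd.Series:
    """Convert one object column of amounts to float; garbage becomes NaN."""
    s = col.str.strip()
    # Per-row multiplier taken from the magnitude word (assumed: млн, млрд).
    mult = np.select(
        [s.str.contains("млрд", regex=False, na=False),
         s.str.contains("млн", regex=False, na=False)],
        [1e9, 1e6],
        default=1.0,
    )
    digits = (
        s.str.replace(r"\s*(?:млн|млрд)?\s*руб\.?$", "", regex=True)  # drop suffix
         .str.replace(r"\s+", "", regex=True)                         # '10 000' -> '10000'
         .str.replace(",", ".", regex=False)                          # decimal comma
    )
    # Anything that is still not a number (e.g. '34198/15/50006-ИП') becomes NaN.
    return pd.to_numeric(digits, errors="coerce") * mult

def clean_sum_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning to every object column whose name contains 'sum'."""
    out = df.copy()
    for name in out.columns:
        if "sum" in name and out[name].dtype == object:
            out[name] = clean_sum_column(out[name])
    return out

# Small illustrative frame built from values shown in the question.
df_common_fin = pd.DataFrame({
    "r_data_executions_blocks_0_items_118_sum": [
        np.nan, "10 000 руб.", "34198/15/50006-ИП", "1,3 млн руб.",
    ],
})
cleaned = clean_sum_columns(df_common_fin)
print(cleaned)
```

With `errors="coerce"` there is no need to enumerate every garbage phrase such as Исполнительный лист or Сумма неизвестна; whether silently coercing them to NaN is acceptable depends on the data, so it may be worth counting the coerced values per column before trusting the result.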