How can I get the maximum in pandas while excluding some columns from the calculation?

Question

I have a correlation matrix and need the row-wise maximum while ignoring certain columns. To do this, I appended a column of labels to the matrix to use as a filter for those columns.

The task is to compute each row's maximum while excluding the columns whose "CLASS" value is 1. Since the matrix is square, the "CLASS" column can also be read as a row of labels for the columns.

Below is my implementation. It works correctly, but it is very slow on large tables. Please help me find a fast, vectorized way to do this in pandas. With this approach I also run out of memory on large DataFrames.

    import pandas as pd

    df = pd.read_csv('https://st.storeland.ru/9/2418/212/demo10.csv', sep=';', index_col=0)

    def noise_porog(Series_cor):
        noise_list = list()
        priznaki = list(df['CLASS'])
        CLASS = 1
        # Series_cor[:-1] skips the trailing "CLASS" value of the row
        for idx, crl in enumerate(Series_cor[:-1]):
            if priznaki[idx] != CLASS:
                noise_list.append(crl)  # list of noise values
        return pd.Series(max(noise_list))

    df['max_01'] = df.apply(noise_porog, axis=1)

Here is a screenshot for clarity. It highlights the first iteration, where I select the required columns, take their maximum, and write it to the new column "max_01":

Here is a larger matrix:

    df = pd.read_csv('https://st.storeland.ru/6/2418/067/demo.csv', sep=',', index_col=-1)
    df = df.loc[:, ~df.columns.str.contains('unnamed', case=False)].T
    df

For convenience, I used the values from "CLASS" as the column names.

Answer 1 · 2019-02-09T22:08:00

Try this:

    import pandas as pd

    df = pd.read_csv('https://st.storeland.ru/9/2418/212/demo10.csv', sep=';', index_col=0)
    # True for the columns that take part in the maximum (CLASS != 1)
    mask = (df['CLASS'] != 1).to_numpy()
    df['max_01'] = df.loc[:, df.columns.drop('CLASS')[mask]].max(axis=1)

Result:

    In [63]: df
    Out[63]:
               2         3         4         5         6         7         8         9        10        11  CLASS    max_01
    2   0.000000  0.107562  0.202508  0.082104  0.099218  0.166363 -0.138255 -0.030342  0.040025  0.236721      0  0.236721
    3   0.107562  0.000000  0.069416  0.213758  0.167404  0.137428 -0.048976  0.056551  0.039009  0.270039      1  0.270039
    4   0.202508  0.069416  0.000000  0.056688  0.302428  0.090878  0.032381  0.120947  0.414783  0.117498     -1  0.414783
    5   0.082104  0.213758  0.056688  0.000000  0.247694  0.171819 -0.028765  0.157801  0.184200  0.465918      0  0.465918
    6   0.099218  0.167404  0.302428  0.247694  0.000000 -0.096407 -0.184963  0.198542  0.222838  0.190360     -1  0.302428
    7   0.166363  0.137428  0.090878  0.171819 -0.096407  0.000000  0.056020  0.144441  0.105880  0.119886     -1  0.171819
    8  -0.138255 -0.048976  0.032381 -0.028765 -0.184963  0.056020  0.000000 -0.051127  0.027271 -0.050593      0  0.056020
    9  -0.030342  0.056551  0.120947  0.157801  0.198542  0.144441 -0.051127  0.000000  0.212784 -0.019487      1  0.212784
    10  0.040025  0.039009  0.414783  0.184200  0.222838  0.105880  0.027271  0.212784  0.000000  0.146514     -1  0.414783
    11  0.236721  0.270039  0.117498  0.465918  0.190360  0.119886 -0.050593 -0.019487  0.146514  0.000000     -1  0.465918

PS It is better not to add CLASS as a DataFrame column, but simply to store it separately as a numpy.ndarray or a pandas.Series.
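The suggestion to keep the labels separate can be sketched as follows. This is only an illustration under stated assumptions: the 4x4 matrix, the `labels` array, and the names `corr`, `keep`, and `max_01` are made up for the example and are not taken from the demo files. It relies on the `where=` argument of `np.max` (available in NumPy 1.17 and later), which skips the excluded columns without building a filtered copy of the matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x4 symmetric matrix standing in for the real correlation data.
corr = pd.DataFrame(
    [[0.0, 0.5, 0.2, 0.9],
     [0.5, 0.0, 0.7, 0.1],
     [0.2, 0.7, 0.0, 0.3],
     [0.9, 0.1, 0.3, 0.0]],
    index=list("abcd"), columns=list("abcd"),
)

# Class labels stored outside the matrix, one per column (assumed values).
labels = np.array([0, 1, -1, 1])

# True for the columns that take part in the maximum.
keep = labels != 1

# where= broadcasts the column mask over every row;
# NumPy requires initial= whenever where= is given.
row_max = np.max(corr.to_numpy(), axis=1, where=keep, initial=-np.inf)
max_01 = pd.Series(row_max, index=corr.index, name="max_01")
```

The result should match `corr.loc[:, keep].max(axis=1)`. The `where=` form may help when memory is tight, since it does not materialize the column subset first.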

As I understand it, your solution just finds the maximum, but I need to learn how to exclude some columns from the calculation based on a condition.
For example, how would I calculate the maximum over the columns whose index is divisible by 3?
@Mavar, by index do you mean the column's ordinal position or its name?
I need to compute the maxima based on some condition.
I apply it to the matrix and, based on its values, choose which rows to include in the maximum and which to skip.
Try replacing the column names with the values from the CLASS column.
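Both readings of "index divisible by 3" from the comments above come down to building a boolean mask over the columns and passing it to `.loc`. A minimal sketch, assuming a small made-up frame whose column names are numeric strings as in the demo data (the names `by_position` and `by_name` are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names mimic the numeric headers of the demo CSV.
df = pd.DataFrame(
    [[1, 8, 3, 4, 5, 6],
     [9, 2, 7, 1, 0, 3]],
    columns=["2", "3", "4", "5", "6", "7"],
)

# Condition on the ordinal position of each column: positions 0 and 3 here.
by_position = np.arange(df.shape[1]) % 3 == 0
max_by_position = df.loc[:, by_position].max(axis=1)

# Condition on the column name, read as an integer: names 3 and 6 here.
by_name = df.columns.astype(int) % 3 == 0
max_by_name = df.loc[:, by_name].max(axis=1)
```

Any other condition works the same way, provided it yields one boolean per column. If the column names are replaced by the class labels, a mask such as `df.columns != 1` would play the same role.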
