I need to load data from a table and delete the columns that contain NaN values. Here is my code:

    import pandas
    import numpy as np

    data = pandas.read_csv('TrueOrFalse.csv')
    X = np.array([data['1'], data['2'], data['3'], data['4']])

    for i in X[2]:
        if np.isnan(X[2][i]) == 'true':
            X[0][i] = X[0][i+1]
            X[1][i] = X[1][i+1]
            X[2][i] = X[2][i+1]
            X[3][i] = X[3][i+1]
        else:
            i += 1

It fails with this error message:

    IndexError                                Traceback (most recent call last)
    <ipython-input-70-d2187077755d> in <module>()
         11
         12 for i in X[2]:
    ---> 13     if np.isnan(X[2][i]) == 'true':
         14         X[0][i] = X[0][i+1]
         15         X[1][i] = X[1][i+1]

    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Solution: data = data.dropna(). There is no need to loop over the array at all; it is easier to build the array from a DataFrame that has already been filtered.

  • Can you upload TrueOrFalse.csv somewhere and post a link? - Alexander Bragin
  • Why do you compare the result of isnan with a string? - andreymal
  • Unfortunately, I cannot upload the table - Andrey Stebenkov
  • Indeed! Good that you spotted the mistake. Thank you, fixed :) - Andrey Stebenkov
  • I also suspect that you expected i to be the index of an element in the array rather than its value, that is, you meant for i in range(len(X[2])) (but I don't know whether that works with numpy, I haven't tried it) - andreymal
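The two points raised in these comments can be illustrated with a minimal sketch. It uses a made-up array `row` as a stand-in for X[2] (an assumption, since the real CSV is not available). Iterating over a NumPy array yields its values, which are floats here and therefore not valid indices, and np.isnan already returns a boolean, so comparing it with the string 'true' never matches.

```python
import numpy as np

# Hypothetical stand-in for X[2]; the real data comes from TrueOrFalse.csv.
row = np.array([1.0, np.nan, 3.0, np.nan])

# `for i in row` yields float values; using one as an index raises IndexError.
index_error_raised = False
try:
    row[row[0]]
except IndexError:
    index_error_raised = True

# Iterating over positions works, and np.isnan needs no comparison.
nan_positions = [i for i in range(len(row)) if np.isnan(row[i])]

# The vectorized form avoids the explicit loop entirely.
cleaned = row[~np.isnan(row)]

print(index_error_raised)  # True
print(nan_positions)       # [1, 3]
print(cleaned)             # [1. 3.]
```

So range(len(...)) does work with NumPy arrays, although a boolean mask is usually the more idiomatic choice.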

1 answer

In my opinion, this can be done more elegantly:

    data = pandas.read_csv('TrueOrFalse.csv')
    data = data.loc[:, data.notnull().all()]

As a result, all columns containing at least one NaN will be deleted.
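A small sketch of what this selection does, using a made-up DataFrame in place of TrueOrFalse.csv (the values are assumptions for illustration only). data.notnull().all() yields one boolean per column, True only when that column has no NaN. data.dropna(axis=1) should give the same result, whereas plain data.dropna() drops rows rather than columns.

```python
import numpy as np
import pandas

# Hypothetical data standing in for TrueOrFalse.csv.
data = pandas.DataFrame({
    '1': [1.0, 2.0, 3.0],
    '2': [4.0, np.nan, 6.0],
    '3': [7.0, 8.0, 9.0],
    '4': [np.nan, 1.0, 0.0],
})

# One boolean per column: True if the column contains no NaN.
keep = data.notnull().all()

by_mask = data.loc[:, keep]        # the approach from this answer
by_dropna = data.dropna(axis=1)    # drops columns containing any NaN
by_rows = data.dropna()            # default axis=0: drops rows containing any NaN

print(list(by_mask.columns))       # ['1', '3']
print(by_mask.equals(by_dropna))   # True
print(len(by_rows))                # 1
```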

  • Beautiful and elegant, but I need to pass the array on to DecisionTreeClassifier() afterwards - Andrey Stebenkov
  • DecisionTreeClassifier, like most sklearn estimators, works fine with pandas DataFrames directly ;-) If you still need a NumPy array, data.values will return the corresponding NumPy array - MaxU
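A brief sketch of the conversion mentioned in the last comment, again on made-up data (the column names and values are assumptions). data.values returns a NumPy array; data.to_numpy() is an equivalent method in newer pandas versions.

```python
import numpy as np
import pandas

# Hypothetical data standing in for TrueOrFalse.csv.
data = pandas.DataFrame({
    '1': [0.0, 1.0, 1.0],
    '2': [np.nan, 0.0, 1.0],
    '3': [1.0, 0.0, 1.0],
})

# Keep only the columns without NaN, then convert to a NumPy array.
data = data.loc[:, data.notnull().all()]
X = data.values

print(type(X).__name__)  # ndarray
print(X.shape)           # (3, 2)
```

Either the filtered DataFrame or the array X can then be passed to an estimator's fit method together with the labels.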