Problem with for loop and searching for Nan values

Question

I need to load data from a table and delete the columns that contain NaN values. Here is my code:

    import pandas
    import numpy as np

    data = pandas.read_csv('TrueOrFalse.csv')
    X = np.array([data['1'], data['2'], data['3'], data['4']])

    for i in X[2]:
        if np.isnan(X[2][i]) == 'true':
            X[0][i] = X[0][i+1]
            X[1][i] = X[1][i+1]
            X[2][i] = X[2][i+1]
            X[3][i] = X[3][i+1]
        else:
            i += 1

It fails with this error:

    IndexError                                Traceback (most recent call last)
    <ipython-input-70-d2187077755d> in <module>()
         11 
         12 for i in X[2]:
    ---> 13     if np.isnan(X[2][i]) == 'true':
         14         X[0][i] = X[0][i+1]
         15         X[1][i] = X[1][i+1]

    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Solution: data = data.dropna(). You don't need to loop over the array at all; it is easier to take the data from a DataFrame that has already been cleaned of NaN values.
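A minimal sketch of the dropna() suggestion, using a small hand-made DataFrame in place of TrueOrFalse.csv (the column names '1' to '4' come from the question; the values are made up for illustration). Note that dropna() drops rows by default; dropping columns, as the question title asks, needs axis=1.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pandas.read_csv('TrueOrFalse.csv'):
# four columns named '1'..'4', with one NaN in column '3'.
data = pd.DataFrame({
    '1': [1.0, 2.0, 3.0],
    '2': [4.0, 5.0, 6.0],
    '3': [7.0, np.nan, 9.0],
    '4': [0.0, 1.0, 0.0],
})

# Default (axis=0): drop every ROW that contains at least one NaN.
rows_dropped = data.dropna()

# axis=1: drop every COLUMN that contains at least one NaN.
cols_dropped = data.dropna(axis=1)

print(rows_dropped.shape)              # (2, 4)
print(cols_dropped.columns.tolist())   # ['1', '2', '4']
```

Which axis is right depends on whether a NaN should discard the whole record (row) or the whole feature (column).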

I also suspect you expected i to be the index of an element rather than the element's value, i.e. you meant for i in range(len(X[2])) (though I haven't tried it with NumPy).
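A small sketch of why the loop fails, on a made-up 4 x 3 array shaped like X in the question. Iterating over a NumPy array yields its values (floats here), and a float is not a valid index, which is exactly the IndexError above. Separately, np.isnan returns a boolean, so comparing it to the string 'true' never matches. The boolean-mask line at the end is one vectorized alternative, not the only possible fix.

```python
import numpy as np

# Hypothetical 4 x 3 array shaped like X in the question
# (one row per CSV column); X[2] has a NaN at position 1.
X = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, np.nan, 9.0],
    [0.0, 1.0, 0.0],
])

# `for i in X[2]` yields the VALUES 7.0, nan, 9.0, not positions;
# using a float as an index raises IndexError.
try:
    X[2][X[2][0]]
    raised = False
except IndexError:
    raised = True

# range(len(...)) yields integer positions instead.
# np.isnan already returns a bool, so test it directly (not == 'true').
nan_positions = [i for i in range(len(X[2])) if np.isnan(X[2][i])]

# Vectorized: keep only the columns of X whose X[2] entry is not NaN.
X_clean = X[:, ~np.isnan(X[2])]

print(raised)          # True
print(nan_positions)   # [1]
print(X_clean.shape)   # (4, 2)
```

The mask removes the affected entries from all four rows at once, which is what the element-shifting loop was trying to do by hand.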

Answer 1 · 2017-09-06T13:27:08

In my opinion, this can be done more elegantly:

    data = pandas.read_csv('TrueOrFalse.csv')
    data = data.loc[:, data.notnull().all()]

As a result, all columns containing at least one NaN will be deleted.
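To show what the one-liner does step by step, here is a sketch on a made-up DataFrame standing in for the CSV: notnull().all() reduces each column to a single flag, and .loc keeps only the flagged columns. The last line checks that this matches data.dropna(axis=1).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: column '3' has one NaN.
data = pd.DataFrame({
    '1': [1.0, 2.0, 3.0],
    '2': [4.0, 5.0, 6.0],
    '3': [7.0, np.nan, 9.0],
    '4': [0.0, 1.0, 0.0],
})

# notnull() gives a boolean frame; .all() reduces each column to one bool,
# True only if that column contains no NaN at all.
keep = data.notnull().all()
print(keep.tolist())              # [True, True, False, True]

# Select all rows (:) and only the columns whose flag is True.
cleaned = data.loc[:, keep]
print(cleaned.columns.tolist())   # ['1', '2', '4']

# Same result as dropping NaN-containing columns with dropna(axis=1).
print(cleaned.equals(data.dropna(axis=1)))  # True
```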

Beautiful and elegant, but I need to pass the array on to DecisionTreeClassifier().
DecisionTreeClassifier, like most scikit-learn estimators, works fine with pandas DataFrames directly ;-) If you still need a NumPy array, data.values returns the corresponding NumPy array.
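A minimal sketch of both options, assuming scikit-learn is installed. The data is made up, and treating column '4' as the class label with '1' and '2' as features is an assumption for illustration only; the question does not say which column is the target.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical cleaned data: features '1' and '2', label in '4' (assumed).
data = pd.DataFrame({
    '1': [0.0, 1.0, 0.0, 1.0],
    '2': [0.0, 0.0, 1.0, 1.0],
    '4': [0, 1, 1, 1],
})
features = data[['1', '2']]
target = data['4']

# Option 1: pass the DataFrame and Series straight to the estimator.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(features, target)

# Option 2: convert to plain NumPy arrays first via .values.
arr = features.values
clf_np = DecisionTreeClassifier(random_state=0).fit(arr, target.values)

print(type(arr).__name__, arr.shape)   # ndarray (4, 2)
print(clf.predict(features).tolist())  # [0, 1, 1, 1]
```

In newer pandas versions, features.to_numpy() is the documented alternative to .values and returns the same array here.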
