This usually happens when a column contains at least one NaN (Not a Number) value or at least one value of type float.
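A minimal sketch of the effect: the same integers infer as int64, but adding a single NaN upcasts the whole column to float64 (the values here are illustrative):

```python
import numpy as np
import pandas as pd

# an all-integer column is inferred as int64
print(pd.Series([1, 2, 3]).dtype)

# a single NaN forces the whole column to float64,
# because NumPy integer dtypes cannot represent NaN
print(pd.Series([1, 2, np.nan]).dtype)
```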
Example:
CSV file:
```
text,a,b,c
aa,1,10,100
bb,,20,200
cc,3,30,300.0
```
read:

```python
import pandas as pd

df = pd.read_csv(filename)
```
result:
```
In [56]: df
Out[56]:
  text    a   b      c
0   aa  1.0  10  100.0
1   bb  NaN  20  200.0
2   cc  3.0  30  300.0

In [57]: df.dtypes
Out[57]:
text     object
a       float64
b         int64
c       float64
dtype: object
```
How to check: show the number of NaN values in each column:
```
In [58]: df.isnull().sum()
Out[58]:
text    0
a       1
b       0
c       0
dtype: int64
```
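To inspect the offending rows rather than just count them, you can filter on the null mask (a small sketch with illustrative data mirroring the CSV above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"text": ["aa", "bb", "cc"],
                   "a": [1.0, np.nan, 3.0]})

# select only the rows where column 'a' is missing
print(df[df["a"].isnull()])
```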
Solution:
Columns of type np.int* in NumPy / pandas cannot contain NaN: they are automatically converted to np.float_ if at least one NaN value is present in the column. To fix this, replace all NaN values with an integer sentinel and then convert:
```
In [61]: df['a'] = df['a'].fillna(-1).astype(int)

In [62]: df
Out[62]:
  text  a   b      c
0   aa  1  10  100.0
1   bb -1  20  200.0
2   cc  3  30  300.0
```
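If a sentinel value like -1 is undesirable, a possible alternative (available since pandas 0.24) is the nullable Int64 extension dtype, which keeps true integers alongside missing values; a sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# nullable integer dtype: integers are preserved, NaN becomes <NA>
print(s.astype("Int64"))
```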
If the problem is caused by a float value, you can round the values in the column and then convert to int:
```
In [85]: df['c'] = [100, 200, 300.55]

In [86]: df
Out[86]:
  text    a   b       c
0   aa  1.0  10  100.00
1   bb  NaN  20  200.00
2   cc  3.0  30  300.55

In [87]: df['c'] = df['c'].round().astype(int)

In [88]: df
Out[88]:
  text    a   b    c
0   aa  1.0  10  100
1   bb  NaN  20  200
2   cc  3.0  30  301
```
or drop the fractional part by converting directly to int:
```
In [89]: df['c'] = [100, 200, 300.55]

In [90]: df['c'] = df['c'].astype(int)

In [91]: df
Out[91]:
  text    a   b    c
0   aa  1.0  10  100
1   bb  NaN  20  200
2   cc  3.0  30  300
```
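The two conversions differ only in how the fractional part is handled; side by side on the same values:

```python
import pandas as pd

s = pd.Series([100.0, 200.0, 300.55])

print(s.round().astype(int).tolist())  # rounds to the nearest integer
print(s.astype(int).tolist())          # truncates toward zero
```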
P.S. It is better to glue the files together like this:
```python
files = ["1.csv", "2.csv"]

(pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
   .to_csv(output_filename, index=False))
```
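A self-contained run of this approach (the file names and contents are made up for the demo):

```python
import pandas as pd

# create two small CSV files to glue together (illustrative data)
pd.DataFrame({"a": [1, 2]}).to_csv("1.csv", index=False)
pd.DataFrame({"a": [3, 4]}).to_csv("2.csv", index=False)

files = ["1.csv", "2.csv"]
(pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
   .to_csv("glued.csv", index=False))

merged = pd.read_csv("glued.csv")
print(merged["a"].tolist())   # all rows from both files, in order
print(merged["a"].dtype)      # stays integer - no stray header rows in the data
```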
What do `pd.read_csv('1.csv').dtypes` and `pd.read_csv('2.csv').dtypes` show? - MaxU