This usually happens when a column contains at least one NaN (Not a Number) value or at least one value of type float.
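A minimal sketch of the effect: the same integers infer as int64, but adding a single NaN upcasts the whole column to float64 (the values here are illustrative):

```python
import numpy as np
import pandas as pd

# an all-integer column is inferred as int64
print(pd.Series([1, 2, 3]).dtype)

# a single NaN forces the whole column to float64,
# because NumPy integer dtypes cannot represent NaN
print(pd.Series([1, 2, np.nan]).dtype)
```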
Example:
CSV file:
```
text,a,b,c
aa,1,10,100
bb,,20,200
cc,3,30,300.0
```
read:

```python
import pandas as pd

df = pd.read_csv(filename)
```
result:
```
In [56]: df
Out[56]:
  text    a   b      c
0   aa  1.0  10  100.0
1   bb  NaN  20  200.0
2   cc  3.0  30  300.0

In [57]: df.dtypes
Out[57]:
text     object
a       float64
b         int64
c       float64
dtype: object
```
How to check: show the number of NaN values in each column:
```
In [58]: df.isnull().sum()
Out[58]:
text    0
a       1
b       0
c       0
dtype: int64
```
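To inspect the offending rows rather than just count them, you can filter on the null mask (a small sketch with illustrative data mirroring the CSV above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"text": ["aa", "bb", "cc"],
                   "a": [1.0, np.nan, 3.0]})

# select only the rows where column 'a' is missing
print(df[df["a"].isnull()])
```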
Solution:
Columns of type np.int* in NumPy / pandas cannot contain NaN: they are automatically converted to np.float_ if at least one NaN value is present in the column. To fix this, replace all NaN values with an integer sentinel and then convert:
```
In [61]: df['a'] = df['a'].fillna(-1).astype(int)

In [62]: df
Out[62]:
  text  a   b      c
0   aa  1  10  100.0
1   bb -1  20  200.0
2   cc  3  30  300.0
```
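If a sentinel value like -1 is undesirable, a possible alternative (available since pandas 0.24) is the nullable Int64 extension dtype, which keeps true integers alongside missing values; a sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# nullable integer dtype: integers are preserved, NaN becomes <NA>
print(s.astype("Int64"))
```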
If the problem is caused by a float value, you can round the values in the column and then convert to int:
```
In [85]: df['c'] = [100, 200, 300.55]

In [86]: df
Out[86]:
  text    a   b       c
0   aa  1.0  10  100.00
1   bb  NaN  20  200.00
2   cc  3.0  30  300.55

In [87]: df['c'] = df['c'].round().astype(int)

In [88]: df
Out[88]:
  text    a   b    c
0   aa  1.0  10  100
1   bb  NaN  20  200
2   cc  3.0  30  301
```
or drop the fractional part by converting directly to int:
```
In [89]: df['c'] = [100, 200, 300.55]

In [90]: df['c'] = df['c'].astype(int)

In [91]: df
Out[91]:
  text    a   b    c
0   aa  1.0  10  100
1   bb  NaN  20  200
2   cc  3.0  30  300
```
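The two conversions differ only in how the fractional part is handled; side by side on the same values:

```python
import pandas as pd

s = pd.Series([100.0, 200.0, 300.55])

print(s.round().astype(int).tolist())  # rounds to the nearest integer
print(s.astype(int).tolist())          # truncates toward zero
```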
P.S. It is better to glue the files together like this:
```python
files = ["1.csv", "2.csv"]

(pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
   .to_csv(output_filename, index=False))
```
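A self-contained run of this approach (the file names and contents are made up for the demo):

```python
import pandas as pd

# create two small CSV files to glue together (illustrative data)
pd.DataFrame({"a": [1, 2]}).to_csv("1.csv", index=False)
pd.DataFrame({"a": [3, 4]}).to_csv("2.csv", index=False)

files = ["1.csv", "2.csv"]
(pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
   .to_csv("glued.csv", index=False))

merged = pd.read_csv("glued.csv")
print(merged["a"].tolist())   # all rows from both files, in order
print(merged["a"].dtype)      # stays integer - no stray header rows in the data
```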
What do `pd.read_csv('1.csv').dtypes` and `pd.read_csv('2.csv').dtypes` show? - MaxU