There are two CSV files with text and numeric columns. In each file the numeric columns are displayed as integers. After concatenation, all numbers are displayed with a trailing .0, i.e. as decimals. How can I merge the files so that the numbers stay integers? An example of the concatenation is below:

    import pandas as pd

    print('Concatenating with 2-file...')
    df1 = '1.csv'
    df2 = '2.csv'
    files = [pd.read_csv(df1, sep=','), pd.read_csv(df2, sep=',')]
    result = pd.concat(files, ignore_index=True)
    result.to_csv(df1, index=False)  # overwrites 1.csv with the merged data
    print('Done!')
  • Can you show the output of the following commands: pd.read_csv('1.csv').dtypes and pd.read_csv('2.csv').dtypes? - MaxU
  • Item Title float64, Description object, SKU object, MPN object, UPC float64, Main Image URL object, Quantity float64, float64, dtype: object - Dmitry Vladimirovich
  • @MaxU, so as I understand it, I need to change the float64 type to int? - Dmitry Vladimirovich
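A minimal sketch reproducing the symptom, assuming the files look roughly like the hypothetical in-memory stand-ins below (the SKU/Quantity data is invented for illustration). It prints the per-file dtypes that MaxU asked for and shows that an int64 column concatenated with a float64 column (here float because of one empty cell) becomes float64, which to_csv then writes with a trailing .0:

```python
import io

import pandas as pd

# Hypothetical stand-ins for 1.csv and 2.csv; the second has an empty Quantity cell
csv1 = "SKU,Quantity\nA1,5\nB2,7\n"
csv2 = "SKU,Quantity\nC3,\nD4,2\n"

df1 = pd.read_csv(io.StringIO(csv1))
df2 = pd.read_csv(io.StringIO(csv2))

# Per-file dtypes: Quantity is int64 in the first file, float64 in the second
print(df1.dtypes)
print(df2.dtypes)

# Concatenating upcasts the whole Quantity column to float64
result = pd.concat([df1, df2], ignore_index=True)
print(result.dtypes)

# Writing back to CSV shows the ".0" suffix on the former integers
print(result.to_csv(index=False))
```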


This usually happens when a column contains at least one NaN (Not a Number) value, for example an empty cell, or at least one value written as a float (such as 300.0 or 300.55).

Example:

CSV file:

    text,a,b,c
    aa,1,10,100
    bb,,20,200
    cc,3,30,300.0

Read it:

 df = pd.read_csv(filename) 

Result:

    In [56]: df
    Out[56]:
      text    a   b      c
    0   aa  1.0  10  100.0
    1   bb  NaN  20  200.0
    2   cc  3.0  30  300.0

    In [57]: df.dtypes
    Out[57]:
    text     object
    a       float64
    b         int64
    c       float64
    dtype: object
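The same example as a self-contained script, a sketch that reads the sample CSV from an in-memory string (io.StringIO) instead of the unspecified `filename`, so it can be run as-is:

```python
import io

import pandas as pd

# The sample CSV from the answer, held in memory instead of a file on disk
csv_text = "text,a,b,c\naa,1,10,100\nbb,,20,200\ncc,3,30,300.0\n"

df = pd.read_csv(io.StringIO(csv_text))

# Column a is float64 because of the empty cell (NaN),
# column c is float64 because of the literal 300.0, column b stays int64
print(df)
print(df.dtypes)
```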

How to check: count the NaN values in each column:

    In [58]: df.isnull().sum()
    Out[58]:
    text    0
    a       1
    b       0
    c       0
    dtype: int64
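Since the fix depends on the cause, here is a sketch of a hypothetical helper, `why_float`, that reports for every float column both the NaN count and whether any non-NaN value actually has a fractional part. It is an illustration built on standard pandas calls, not part of the original answer:

```python
import io

import pandas as pd


def why_float(df):
    """Hypothetical helper: for each float column, count NaNs and check for fractions."""
    report = {}
    for col in df.columns:
        s = df[col]
        if not pd.api.types.is_float_dtype(s):
            continue  # only float columns are of interest here
        non_null = s.dropna()
        report[col] = {
            'nan_count': int(s.isnull().sum()),
            # True if at least one value is not a whole number
            'has_fraction': bool((non_null % 1 != 0).any()),
        }
    return report


csv_text = "text,a,b,c\naa,1,10,100\nbb,,20,200\ncc,3,30,300.0\n"
df = pd.read_csv(io.StringIO(csv_text))
report = why_float(df)
print(report)
```

For the sample data, column `c` has no NaN and no fractional values: it is float only because one cell was written as `300.0`, so converting it to int loses nothing.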

Solution:

Columns of NumPy/pandas integer types (np.int*) cannot hold NaN: if a column contains at least one NaN value, it is automatically converted to np.float_. To fix this, replace all NaN values with an integer and then convert the column to int:

    In [61]: df['a'] = df['a'].fillna(-1).astype(int)

    In [62]: df
    Out[62]:
      text  a   b      c
    0   aa  1  10  100.0
    1   bb -1  20  200.0
    2   cc  3  30  300.0
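A runnable sketch of the NaN fix on the sample data. It also shows why the `fillna` step is needed: calling `astype(int)` directly on a column that still contains NaN raises a ValueError. The -1 fill value is the answer's choice and is arbitrary; pick a sentinel that cannot occur in your real data:

```python
import io

import pandas as pd

csv_text = "text,a,b,c\naa,1,10,100\nbb,,20,200\ncc,3,30,300.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Casting directly fails because integer dtypes cannot represent NaN
cast_error = None
try:
    df['a'].astype(int)
except ValueError as exc:
    cast_error = exc
print('Direct cast failed:', cast_error)

# Replace NaN with a sentinel first (-1 here, an arbitrary choice), then cast
df['a'] = df['a'].fillna(-1).astype(int)
print(df)
```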

If the problem is caused by a float value, you can round the values in the column and convert them to int:

    In [85]: df['c'] = [100, 200, 300.55]

    In [86]: df
    Out[86]:
      text    a   b       c
    0   aa  1.0  10  100.00
    1   bb  NaN  20  200.00
    2   cc  3.0  30  300.55

    In [87]: df['c'] = df['c'].round().astype(int)

    In [88]: df
    Out[88]:
      text    a   b    c
    0   aa  1.0  10  100
    1   bb  NaN  20  200
    2   cc  3.0  30  301

or drop the fractional part by converting to int directly:

    In [89]: df['c'] = [100, 200, 300.55]

    In [90]: df['c'] = df['c'].astype(int)

    In [91]: df
    Out[91]:
      text    a   b    c
    0   aa  1.0  10  100
    1   bb  NaN  20  200
    2   cc  3.0  30  300
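A small sketch contrasting the two options on the answer's values plus two extra illustrative ones (-2.7 and 2.5, not from the original). `astype(int)` truncates toward zero, while `round()` rounds to the nearest integer and, like NumPy, rounds exact halves to the nearest even number:

```python
import pandas as pd

# Values from the answer plus two extra illustrative ones (-2.7 and 2.5)
s = pd.Series([100, 200, 300.55, -2.7, 2.5])

rounded = s.round().astype(int)  # round to nearest (halves go to even), then cast
truncated = s.astype(int)        # cast only: drops the fractional part, toward zero

print(rounded.tolist())
print(truncated.tolist())
```

Choose rounding when the float values are measurement noise around integers, and truncation only when you explicitly want the integer part.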

P.S. A more concise way to concatenate the files:

    files = ["1.csv", "2.csv"]
    (pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
       .to_csv(output_filename, index=False))
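Putting the pieces together, a sketch of a hypothetical helper, `concat_csvs_keep_ints`, that concatenates the files as in the P.S. and then applies the answer's NaN fix only to float columns whose values are all whole numbers, so no fractional data is silently truncated. The sample file contents, the column names `sku`/`qty`, and the default fill value of -1 are all illustrative assumptions; note that the fill value replaces empty cells in the output:

```python
import os
import tempfile

import pandas as pd


def concat_csvs_keep_ints(paths, output_path, nan_fill=-1):
    """Hypothetical helper: concatenate CSVs and turn whole-number float columns into int."""
    result = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)
    for col in result.columns:
        s = result[col]
        if not pd.api.types.is_float_dtype(s):
            continue
        filled = s.fillna(nan_fill)  # NaN -> sentinel, as in the answer
        # Convert only if every value is a whole number, so nothing is truncated
        if (filled % 1 == 0).all():
            result[col] = filled.astype(int)
    result.to_csv(output_path, index=False)
    return result


# Usage with two small illustrative files written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, '1.csv')
    p2 = os.path.join(tmp, '2.csv')
    out = os.path.join(tmp, 'merged.csv')
    with open(p1, 'w') as f:
        f.write('sku,qty\nA,1\nB,2\n')
    with open(p2, 'w') as f:
        f.write('sku,qty\nC,\nD,4\n')  # empty qty cell -> NaN
    merged = concat_csvs_keep_ints([p1, p2], out)
    with open(out) as f:
        merged_text = f.read()

print(merged_text)
```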
  • Thanks! It's clear why this happens. But how do I actually solve the problem? - Dmitry Vladimirovich
  • It depends on what caused the problem: NaN, or at least one float value. Which is it in your case? - MaxU
  • In my case, the problem is a float value. - Dmitry Vladimirovich
  • Sorry to keep asking, but what should I do in the case of NaN? - Dmitry Vladimirovich
  • @Dmitry Vladimirovich, the first part of the solution covers exactly that case. - MaxU