Instead of summing, pandas concatenates the values into one string

Question

Colleagues, please help:
1. I read a .csv file (semicolon separator). I have to set the encoding explicitly, because the file contains Russian text.
2. I group and sum.
3. Instead of the sum, I get concatenation.

Question: how do I get a proper numeric sum instead of concatenation? (.csv: https://transfiles.ru/kq9g5, .xlsx: https://transfiles.ru/5t38n)

Listing:

```python
import xlwt
import xlrd
import csv
import codecs
import openpyxl
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv(r'C:\py3\Test1\Test2.csv', sep=';', encoding="866")
df.head(10)

# Create the Pivot.xlsx file
df1 = df.groupby('Global Dimension 2 Code')['Amount'].sum()
df1.to_csv(r'C:\py3\Test1\Pivot.csv')
df11 = pd.read_csv(r'C:\py3\Test1\Pivot.csv')
w11 = pd.ExcelWriter(r'C:\py3\Test1\Pivot.xlsx')
df11.to_excel(w11, sheet_name='Pivot', index=False, engine='xlsxwriter')
w11.save()
```
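A minimal, self-contained reproduction of the symptom. The sample rows are made up (the real data is in the linked files); only the column names and the semicolon separator come from the question. The amounts use a comma as the decimal separator and no decimal=',' is passed, so the column stays as strings and sum() joins them:

```python
import io

import pandas as pd

# Hypothetical sample: semicolon-separated, comma as the decimal separator
csv_text = (
    "Global Dimension 2 Code;Amount\n"
    "L01;-100,50\n"
    "L01;-20,25\n"
    "L02;-3,10\n"
)

# Same call pattern as in the question: no decimal=',' argument
df = pd.read_csv(io.StringIO(csv_text), sep=';')
print(df['Amount'].dtype)  # not a float dtype: the values are strings like '-100,50'

# Summing strings concatenates them instead of adding numbers
res = df.groupby('Global Dimension 2 Code')['Amount'].sum()
print(res)
```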

Could you upload the CSV to a file-sharing service and show an example of the expected output DataFrame in the question?
I tried, but what I get is not whole rows, only the values from the "Amount" column.

Accepted Answer · 2019-02-21T12:40:33

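The Amount column is read as strings because, by default, pandas expects a period ('.') as the separator between the integer and fractional parts, whereas your data uses a comma (','). Calling sum() on a column of strings concatenates them, which is why you get one long string instead of a number.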
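If a frame has already been loaded with the default settings, the column can also be converted after the fact with Series.str.replace and pd.to_numeric. This is an alternative to re-reading the file, not the approach taken in this answer; the sample data below is made up:

```python
import io

import pandas as pd

# Hypothetical comma-decimal data, read with the default decimal='.'
csv_text = "Global Dimension 2 Code;Amount\nL01;-100,50\nL01;-20,25\nL02;-3,10\n"
df = pd.read_csv(io.StringIO(csv_text), sep=';')

# The column holds strings, so it is not numeric
print(pd.api.types.is_numeric_dtype(df['Amount']))  # False

# Swap the decimal comma for a period, then parse the strings as floats
df['Amount'] = pd.to_numeric(df['Amount'].str.replace(',', '.', regex=False))
print(df['Amount'].dtype)  # float64

totals = df.groupby('Global Dimension 2 Code')['Amount'].sum()
print(totals)
```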

Specify decimal=',' explicitly:

```
In [18]: df = pd.read_csv(r'C:\download\Test2.csv', sep=';', decimal=',')
# NOTE: ---------------------------------------------------> ^^^^^^^^^^^

In [19]: df.groupby('Global Dimension 2 Code', as_index=False)['Amount'].sum()
Out[19]:
  Global Dimension 2 Code    Amount
0                     L01 -32338.00
1                     L02  -3619.59
2                     L03  -1268.08
```
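The same fix on a small made-up sample, with io.StringIO standing in for the real file: with decimal=',' the Amount column is parsed as float64 and the grouped sum is numeric.

```python
import io

import pandas as pd

# Hypothetical sample standing in for Test2.csv
csv_text = (
    "Global Dimension 2 Code;Amount\n"
    "L01;-100,50\n"
    "L01;-20,25\n"
    "L02;-3,10\n"
)

# decimal=',' tells the parser that ',' separates the integer and fractional parts
df = pd.read_csv(io.StringIO(csv_text), sep=';', decimal=',')
print(df['Amount'].dtype)  # float64

res = df.groupby('Global Dimension 2 Code', as_index=False)['Amount'].sum()
print(res)
```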

UPDATE:

```
In [28]: (df.assign(Global_Dimension_2_Code=df['Global Dimension 2 Code'].fillna('EmptyAmount'))
    ...:    .groupby('Global_Dimension_2_Code', as_index=False)
    ...:    ['Amount'].sum())
Out[28]:
  Global_Dimension_2_Code    Amount
0             EmptyAmount -24646.81
1                     L01 -32338.00
2                     L02  -3619.59
3                     L03  -1268.08
```
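Why the fillna step matters: by default groupby drops rows whose key is NaN, so their amounts silently disappear from the totals. The sketch below uses made-up data; the dropna=False variant is an alternative not used in the answer and requires pandas 1.1 or later.

```python
import io

import pandas as pd

# Hypothetical sample: the third row has an empty Global Dimension 2 Code
csv_text = (
    "Global Dimension 2 Code;Amount\n"
    "L01;-100,50\n"
    "L01;-20,25\n"
    ";-5,00\n"
)
df = pd.read_csv(io.StringIO(csv_text), sep=';', decimal=',')

# Default: the row with a NaN key is excluded from the result
default = df.groupby('Global Dimension 2 Code', as_index=False)['Amount'].sum()

# The answer's approach: give missing codes an explicit label first
labelled = (df.assign(Global_Dimension_2_Code=df['Global Dimension 2 Code'].fillna('EmptyAmount'))
              .groupby('Global_Dimension_2_Code', as_index=False)['Amount'].sum())

# Alternative (pandas >= 1.1): keep NaN as its own group
kept = df.groupby('Global Dimension 2 Code', as_index=False, dropna=False)['Amount'].sum()

print(default, labelled, kept, sep='\n\n')
```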

UPDATE2:

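```python
import pandas as pd

# Read with ';' as the field separator and ',' as the decimal separator
df = pd.read_csv(r'C:\download\Test2.csv', sep=';', decimal=',')

# Label empty codes so their amounts are not dropped, then sum per code
res = (df.assign(Global_Dimension_2_Code=df['Global Dimension 2 Code'].fillna('EmptyAmount'))
         .groupby('Global_Dimension_2_Code', as_index=False)
         ['Amount'].sum())

res.to_excel(r'c:\temp\result.xlsx', index=False)
```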
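An end-to-end sketch of the UPDATE2 pipeline that also keeps the question's encoding="866" ('866' and 'cp866' name the same Python codec). The file contents, including the Russian column "Описание", are made up and built in memory, and to_csv() returning a string stands in for to_excel so that no Excel engine is needed:

```python
import io

import pandas as pd

# Hypothetical file contents with Russian text, encoded as code page 866
raw = (
    "Global Dimension 2 Code;Amount;Описание\n"
    "L01;-100,50;Аренда\n"
    "L01;-20,25;Связь\n"
    ";-5,00;Прочее\n"
).encode('cp866')

# Encoding, field separator and decimal separator all specified explicitly
df = pd.read_csv(io.BytesIO(raw), sep=';', decimal=',', encoding='cp866')

res = (df.assign(Global_Dimension_2_Code=df['Global Dimension 2 Code'].fillna('EmptyAmount'))
         .groupby('Global_Dimension_2_Code', as_index=False)['Amount'].sum())

# The answer writes res.to_excel(..., index=False); a CSV string is used here instead
out = res.to_csv(index=False)
print(out)
```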

Why is the first row different from the rows below it? I need the first row to be the same as all the others.
MaxU has already answered that by adding as_index=False to the code.
Please reject my edit there: my mouse slipped and I clicked the wrong button.
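A sketch of what as_index=False changes, on made-up data: without it the result is a Series indexed by the group code; with it the result is a DataFrame in which the code is an ordinary column, so a CSV round trip like the one in the question (to_csv, then read_csv) gives back a header row plus one row per group.

```python
import io

import pandas as pd

# Hypothetical sample data
csv_text = "Global Dimension 2 Code;Amount\nL01;-100,50\nL01;-20,25\nL02;-3,10\n"
df = pd.read_csv(io.StringIO(csv_text), sep=';', decimal=',')

# Default: a Series whose index holds the group codes
as_series = df.groupby('Global Dimension 2 Code')['Amount'].sum()
print(type(as_series).__name__)  # Series

# as_index=False: a DataFrame with the codes as a regular column
as_frame = df.groupby('Global Dimension 2 Code', as_index=False)['Amount'].sum()
print(type(as_frame).__name__)  # DataFrame

# Round trip through CSV; index=False avoids writing an extra index column
buf = io.StringIO()
as_frame.to_csv(buf, index=False)
buf.seek(0)
back = pd.read_csv(buf)
print(back)
```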
