Colleagues, please advise:
1. I read a .csv file (semicolon-separated). I have to specify the encoding because it contains Russian text.
2. I group and sum.
3. Instead of a sum, I get concatenation.

Question: how do I get a proper sum instead of concatenation? (.csv = https://transfiles.ru/kq9g5, .xlsx = https://transfiles.ru/5t38n)

Listing:

    import xlwt
    import xlrd
    import csv
    import codecs
    import openpyxl
    from openpyxl import Workbook
    from openpyxl.utils.dataframe import dataframe_to_rows
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    df = pd.read_csv('C:\py3\Test1\Test2.csv', sep=';', encoding = "866")
    df.head(10)

    # Create the file Pivot.xlsx
    df1 = df.groupby('Global Dimension 2 Code')['Amount'].sum()
    df1.to_csv('C:\py3\Test1\Pivot.csv')
    df11 = pd.read_csv('C:\py3\Test1\Pivot.csv')
    w11 = pd.ExcelWriter('C:\py3\Test1\Pivot.xlsx')
    df11.to_excel(w11, sheet_name='Pivot', index=False, engine='xlsxwriter')
    w11.save()

  • Can you upload the CSV to any file-sharing service and give an example of the output DataFrame you expect in the answer? - MaxU
  • Added link to .csv - 2b4fITin
  • Since the values are concatenated, they are being treated as strings. They need to be converted to numbers. - Enikeyschik February
  • I agree, I just don't know how. It would be great if you could tell me. I tried, but what needs converting is not the whole row, only the values in the "Amount" column - 2b4fITin
  • Thank you so much, MaxU! Please help with the following. I have added a link to the .xlsx. Questions: a. Why does the first row differ from the rows below it (the first row needs to look the same as all the others)? b. Why weren't those values summed (they need to be summed)? - 2b4fITin
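
For a DataFrame that is already loaded, the conversion suggested in the comments can be sketched as follows. This is only an illustration under stated assumptions: the sample rows are made up (they are not from Test2.csv), and `dtype=object` is used just to keep the amounts as plain Python strings.

```python
import pandas as pd

# Hypothetical sample data (not the real Test2.csv): amounts stored as text
# with a decimal comma. dtype=object keeps them as plain Python strings.
df = pd.DataFrame(
    {
        "Global Dimension 2 Code": ["L01", "L01", "L02"],
        "Amount": ["-1,50", "-2,25", "-3,00"],
    },
    dtype=object,
)

# Summing strings joins them together instead of adding them up.
concatenated = df.groupby("Global Dimension 2 Code")["Amount"].sum()

# Convert only the Amount column: swap the decimal comma for a period,
# then parse the text as numbers.
df["Amount"] = pd.to_numeric(df["Amount"].str.replace(",", ".", regex=False))
totals = df.groupby("Global Dimension 2 Code")["Amount"].sum()

print(concatenated)
print(totals)
```

Only the "Amount" column is touched; the grouping column stays as text.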

1 answer

The Amount column is read as strings because, by default, a period ( '.' ) is expected as the separator between the integer and fractional parts, whereas your data uses a comma ( ',' ).

Specify decimal=',' explicitly:

    In [18]: df = pd.read_csv(r'C:\download\Test2.csv', sep=';', decimal=',')
    # NOTE: ---------------------------------------------------> ^^^^^^^^^^^

    In [19]: df.groupby('Global Dimension 2 Code', as_index=False)['Amount'].sum()
    Out[19]:
      Global Dimension 2 Code    Amount
    0                     L01 -32338.00
    1                     L02  -3619.59
    2                     L03  -1268.08
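
For anyone who wants to check the effect of `decimal=','` without the original file, here is a minimal self-contained sketch. The CSV text is a made-up stand-in for Test2.csv; only the column names are taken from the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for Test2.csv: semicolon-separated, decimal comma.
csv_text = (
    "Global Dimension 2 Code;Amount\n"
    "L01;-100,50\n"
    "L01;-200,25\n"
    "L02;-50,00\n"
)

# Without decimal=',' the amounts cannot be parsed as numbers and stay text.
raw = pd.read_csv(io.StringIO(csv_text), sep=";")
# With decimal=',' the same column is parsed as floats.
parsed = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

raw_is_numeric = pd.api.types.is_numeric_dtype(raw["Amount"])
parsed_is_numeric = pd.api.types.is_numeric_dtype(parsed["Amount"])
print(raw_is_numeric, parsed_is_numeric)

# Once the column is numeric, grouping and summing adds the values up.
totals = parsed.groupby("Global Dimension 2 Code", as_index=False)["Amount"].sum()
print(totals)
```

Checking the column dtype right after `read_csv` is a quick way to catch this kind of problem before grouping.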

UPDATE:

    In [28]: (df.assign(Global_Dimension_2_Code=df['Global Dimension 2 Code'].fillna('EmptyAmount'))
                .groupby('Global_Dimension_2_Code', as_index=False)
                ['Amount'].sum())
    Out[28]:
      Global_Dimension_2_Code    Amount
    0             EmptyAmount -24646.81
    1                     L01 -32338.00
    2                     L02  -3619.59
    3                     L03  -1268.08
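
Why the `fillna` step matters: by default `groupby` drops rows whose key is missing, so their amounts never reach any total. A small sketch with made-up rows (not the real data) shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: two of them have no 'Global Dimension 2 Code'.
df = pd.DataFrame(
    {
        "Global Dimension 2 Code": ["L01", np.nan, "L02", np.nan],
        "Amount": [-10.0, -1.5, -20.0, -2.5],
    }
)

# Default behaviour: rows with a missing key are left out of the result.
default = df.groupby("Global Dimension 2 Code", as_index=False)["Amount"].sum()

# Label the missing keys first, then group; they now get their own row.
labelled = (
    df.assign(Global_Dimension_2_Code=df["Global Dimension 2 Code"].fillna("EmptyAmount"))
      .groupby("Global_Dimension_2_Code", as_index=False)["Amount"]
      .sum()
)

print(default)
print(labelled)
```

On pandas 1.1 or later, `groupby(..., dropna=False)` is an alternative that keeps the missing key as NaN instead of giving it a label.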

UPDATE2:

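    import pandas as pd

    # Read the semicolon-separated file, treating ',' as the decimal separator
    df = pd.read_csv(r'C:\download\Test2.csv', sep=';', decimal=',')
    # Label rows with an empty code as 'EmptyAmount', then group and sum
    res = (df.assign(Global_Dimension_2_Code=df['Global Dimension 2 Code'].fillna('EmptyAmount'))
             .groupby('Global_Dimension_2_Code', as_index=False)
             ['Amount'].sum())
    # Write the result to Excel without the index column
    res.to_excel(r'c:\temp\result.xlsx', index=False)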
  • I need more help: a. Why does the first row differ from the rows below it (the first row needs to look the same as all the others)? b. Why weren't those values summed (they need to be summed)? - 2b4fITin
  • @2b4fITin, what should the empty category be called? - MaxU
  • Let it be called EmptyAmount - 2b4fITin 1:02 pm
  • MaxU, you have already answered this by adding as_index=False to the code. I have just seen it -)) - 2b4fITin
  • Please reject the edit there; my mouse slipped and I clicked the wrong button. - 0xdb 1:43 pm
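
On the `as_index=False` point from the comments, a minimal sketch of what it changes, using made-up rows: without it the group keys end up in the index of a Series, with it they stay an ordinary column of a DataFrame, which is the shape you usually want when writing with `index=False`.

```python
import pandas as pd

# Hypothetical sample rows, not the real data.
df = pd.DataFrame(
    {
        "Global Dimension 2 Code": ["L01", "L01", "L02"],
        "Amount": [-1.0, -2.0, -3.0],
    }
)

# Default: a Series whose index holds the group keys.
as_series = df.groupby("Global Dimension 2 Code")["Amount"].sum()

# as_index=False: a DataFrame with the keys as a regular column.
as_frame = df.groupby("Global Dimension 2 Code", as_index=False)["Amount"].sum()

# reset_index() on the Series gives the same two-column layout.
same_frame = as_series.reset_index()

print(type(as_series).__name__, list(as_frame.columns), list(same_frame.columns))
```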