Deleting columns in a CSV file using Python

Question

I have a CSV data file (7 columns and 6063 rows). The column names are something like ['id', 'seller', 'buyer', 'timestamp'], with the corresponding data in the rows. I need to remove from this file the rows where seller equals buyer.

    import pandas as pd

    data = pd.read_csv('file.csv', sep=';', decimal=',')
    dat = pd.DataFrame(data.T)
    for i in dat:
        if dat[dat.columns[i]][1] == dat[dat.columns[i]][2]:
            a = dat.columns[i]

This is what I have so far, but I am stuck on removing the columns (after transposing, the rows have become columns): they are not consecutive, and I would rather not list the names of 1450 columns. Is there a better way to do this?

Accepted Answer · 2016-09-27T18:21:16

Use the .query() method:

 data = pd.read_csv('file.csv', sep=';', decimal=',', quotechar="'").query('seller != buyer')

If you need to save back to CSV:

 data.to_csv('output.csv', index=False)

PS: you do not need to transpose the DataFrame in order to filter it.

PPS: when using pandas, try to avoid explicit for loops. They are inefficient compared with vectorized operations.
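To illustrate the point about avoiding loops, here is a minimal sketch of the same filter written as a vectorized boolean mask, next to the .query() form. The sample data is made up to mirror the layout described in the question, and the names csv_text, mask, filtered and queried are only illustrative.

```python
import io

import pandas as pd

# Hypothetical sample data in the layout described in the question.
csv_text = """id;seller;buyer;timestamp
1;seller-1;buyer-1;2016-01-01
2;seller-2;buyer-2;2016-01-02
3;same-1;same-1;2016-01-11
"""

df = pd.read_csv(io.StringIO(csv_text), sep=';')

# Compare the two columns element-wise; the result is a boolean Series.
mask = df['seller'] != df['buyer']

# Keep only the rows where the mask is True (seller differs from buyer).
filtered = df[mask]

# The same filter expressed with .query(), as in the answer.
queried = df.query('seller != buyer')
```

Both forms compare whole columns at once, so no transposing and no row-by-row loop is needed.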

Here is a working example that takes into account that your CSV uses ' as the quote character:

CSV file - D:\temp\buyer_seller.csv:

    'id';'seller';'buyer';'timestamp'
    1;seller-1;buyer-1;2016-01-01
    2;seller-2;buyer-2;2016-01-02
    3;same-1;same-1;2016-01-11
    4;same-2;same-2;2016-01-22

Code:

    In [21]: pd.read_csv(r'D:\temp\buyer_seller.csv', sep=';')
    Out[21]:
       'id'  'seller'  'buyer' 'timestamp'
    0     1  seller-1  buyer-1  2016-01-01
    1     2  seller-2  buyer-2  2016-01-02
    2     3    same-1   same-1  2016-01-11
    3     4    same-2   same-2  2016-01-22

    In [22]: pd.read_csv(r'D:\temp\buyer_seller.csv', sep=';', quotechar="'")
    Out[22]:
       id    seller    buyer   timestamp
    0   1  seller-1  buyer-1  2016-01-01
    1   2  seller-2  buyer-2  2016-01-02
    2   3    same-1   same-1  2016-01-11
    3   4    same-2   same-2  2016-01-22

    In [23]: pd.read_csv(r'D:\temp\buyer_seller.csv', sep=';', quotechar="'").query('seller != buyer')
    Out[23]:
       id    seller    buyer   timestamp
    0   1  seller-1  buyer-1  2016-01-01
    1   2  seller-2  buyer-2  2016-01-02

Alternatively, you can simply strip the quotes from the column names:

    In [27]: df = pd.read_csv(r'D:\temp\buyer_seller.csv', sep=';')

    In [28]: df.columns.tolist()
    Out[28]: ["'id'", "'seller'", "'buyer'", "'timestamp'"]

    In [30]: df.columns = df.columns.str.replace("'", '')

    In [31]: df.columns.tolist()
    Out[31]: ['id', 'seller', 'buyer', 'timestamp']

A great idea, and everything would be fine, except that the column names in the file are actually written as 'seller' and 'buyer' (with the quotes), which this approach does not recognize.
I get a long list of errors, ending with UndefinedVariableError: name 'seller' is not defined
My solution assumes that the columns are actually named buyer and seller.
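The exchange above can be reproduced with a small sketch. It assumes a header quoted with ' as in the working example; the sample rows and the names raw, clean, kept_raw and kept_clean are made up for illustration. Without quotechar the quotes stay in the column names, so .query('seller != buyer') cannot resolve them. Either index with the literal quoted names or let read_csv strip the quotes.

```python
import io

import pandas as pd

# Hypothetical sample whose header fields are wrapped in single quotes.
csv_text = """'id';'seller';'buyer';'timestamp'
1;seller-1;buyer-1;2016-01-01
3;same-1;same-1;2016-01-11
"""

# Without quotechar, the quotes remain part of the column names.
raw = pd.read_csv(io.StringIO(csv_text), sep=';')

# .query() then fails: pandas raises UndefinedVariableError,
# which is a subclass of NameError.
try:
    raw.query('seller != buyer')
    query_failed = False
except NameError:
    query_failed = True

# Option 1: boolean mask using the literal, quoted column names.
kept_raw = raw[raw["'seller'"] != raw["'buyer'"]]

# Option 2: tell read_csv that ' is the quote character.
clean = pd.read_csv(io.StringIO(csv_text), sep=';', quotechar="'")
kept_clean = clean.query('seller != buyer')
```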
The values in this column are numbers like 1469502678. How can they be converted to a real date/time?
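One possible approach, sketched under the assumption that the comment refers to the timestamp column and that the numbers are Unix timestamps in seconds (the thread confirms neither), is pd.to_datetime with unit='s'. The one-row DataFrame below is made up for illustration.

```python
import pandas as pd

# Hypothetical frame holding the value quoted in the comment.
df = pd.DataFrame({'timestamp': [1469502678]})

# Interpret the integers as seconds since 1970-01-01 00:00:00 UTC.
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')

print(df['timestamp'].iloc[0])  # 2016-07-26 03:11:18
```

If the values turn out to be milliseconds instead, unit='ms' would be the corresponding setting.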
