There is a CSV file containing 8 columns and a huge number of lines, only numeric data. What is the easiest way to double the value of each cell?
- The easiest way: read, multiply, write. At which stage is the problem? - Vladimir Martyanov
- At the multiplication stage. I know how to read and edit a file. - irvis
- print 5 * 7 - what's the problem with multiplication? - Vladimir Martyanov
- I've read how to multiply rows, columns and so on, but I did not find how to multiply all the elements of a file by a number. I do not program in Python; I only need part of its tooling for my tasks. I want to change the values to the multiplied ones and overwrite the file. - irvis
- I have the code, I just need to add two or three lines that do the multiplication. I can write them myself; the question is different: how do I do this optimally - multiply one line at a time, or some other way? A pointer to any resource where such functionality is shown would be enough. - irvis
2 answers
Using Pandas this is extremely simple and much faster than loop-based solutions:
import pandas as pd

pd.read_csv('/path/to/file.csv').mul(2).to_csv('/path/to/result.csv', index=False)
Or, if your CSV does not have column names:
(pd.read_csv('/path/to/file.csv', header=None)
   .mul(2)
   .to_csv('/path/to/result.csv', header=None, index=False)
)
If all the data does not fit in memory, Pandas allows you to process it in chunks:
for chunk in pd.read_csv('/path/to/file.csv', header=None, chunksize=10**6):
    chunk.mul(2).to_csv('/path/to/result.csv', header=None, index=False, mode='a')
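One caveat with the chunked variant: mode='a' always appends, so a result file left over from a previous run would be appended to rather than replaced. A minimal runnable sketch of the same idea (with assumed file names, a tiny generated sample, and the stale output removed first):

```python
import os
import pandas as pd

src, dst = 'file.csv', 'result.csv'  # assumed file names for this sketch

# create a tiny sample input so the sketch runs end-to-end
pd.DataFrame([[1, 2], [3, 4], [5, 6]]).to_csv(src, header=False, index=False)

# mode='a' appends, so remove any output left over from a previous run
if os.path.exists(dst):
    os.remove(dst)

# process the file in small chunks, doubling each cell
for chunk in pd.read_csv(src, header=None, chunksize=2):
    chunk.mul(2).to_csv(dst, header=False, index=False, mode='a')
```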
Test
Generate test data:
In [80]: x = pd.DataFrame(np.random.rand(10**6, 8))

In [81]: x.shape
Out[81]: (1000000, 8)

In [82]: x.head()
Out[82]:
          0         1         2         3         4         5         6         7
0  0.890351  0.814333  0.436384  0.477630  0.394325  0.339082  0.205078  0.033898
1  0.749606  0.045757  0.850344  0.496043  0.727086  0.717089  0.086069  0.826920
2  0.703955  0.492983  0.105174  0.298543  0.008731  0.885632  0.233774  0.046960
3  0.053236  0.024155  0.126744  0.139082  0.475896  0.747354  0.772720  0.816470
4  0.439865  0.344010  0.301929  0.528090  0.462750  0.113415  0.621818  0.796230

In [83]: x.to_csv('c:/temp/file.csv', index=False, header=None)
Size of C:\Temp\file.csv: 148 MiB
Measuring the speed on my regular (non-SSD) drive:
In [84]: %%timeit
   ....: (pd.read_csv('c:/temp/file.csv', header=None)
   ....:    .mul(2)
   ....:    .to_csv('c:/temp/result.csv', header=None, index=False)
   ....: )
   ....:
1 loop, best of 3: 30.4 s per loop
Result: 30 seconds to read the original CSV, multiply all the elements and write the result back to CSV, for 1 million rows and 8 columns.
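For comparison, the same transformation can be sketched with plain NumPy (assumed file names; loadtxt/savetxt are typically slower than pandas' C parser on large files, but avoid the pandas dependency):

```python
import numpy as np

# write a small sample input so the sketch runs end-to-end (assumed file names)
np.savetxt('file.csv', np.arange(8.0).reshape(2, 4), delimiter=',')

# load the whole numeric table, double it, and write it back out
data = np.loadtxt('file.csv', delimiter=',')
np.savetxt('result.csv', data * 2, delimiter=',', fmt='%g')
```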
- Thank you, that's exactly what was required. - irvis
- @irvis if the number of lines is "huge" (such that all the numbers do not fit in memory), then the file has to be processed in parts (for example, line by line). If the task is limited by disk performance (IO bound), then it does not matter much how you multiply those lines (you could even read one number at a time). On the whole, pandas is convenient for working with numeric tables. Measure the performance if it matters in your case. - jfs
- @jfs, thanks for the comment! I added code to handle this case in Pandas ... - MaxU
If you need to perform the task using only the standard library (otherwise pandas, as @MaxU demonstrated, is ideally suited for this task):
#!/usr/bin/env python3
import fileinput
import sys

with fileinput.FileInput(inplace=True, backup='.bak') as file:
    for line in file:
        try:
            # multiply whitespace-separated float numbers
            print(*[2 * float(f) for f in line.split()])
        except ValueError as e:
            print("Can't parse %r line, error: %s" % (line, e), file=sys.stderr)
This code modifies the files specified on the command line, or stdin if none are given. Each source file is first copied to a *.bak file (a backup in case of an error), and stdout is redirected to the source file so that the changes are made in place.
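Note that the snippet above splits on whitespace, while the file in the question is comma-separated. A stdlib variant that keeps the comma-separated format could use the csv module (a minimal sketch with assumed file names and a tiny generated sample, writing to a separate output file rather than in place):

```python
import csv

# assumed file names; write a tiny sample input so the sketch runs end-to-end
with open('file.csv', 'w', newline='') as f:
    f.write('1,2\n3,4\n')

with open('file.csv', newline='') as src, \
     open('result.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        # double every cell; values are written back as floats, e.g. '2.0'
        writer.writerow([2 * float(cell) for cell in row])
```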
- So I tried, but, frankly, did not have enough knowledge to complete. - irvis