Finding max() and min() in a CSV file column

Question

There is a .txt file in which each record looks like this:

Rosneft,07/19/06,00:00,220.32,220.32,203.03,203.95,51774,0

There are about 2,000 such records, each on its own line.
I need to find the maximum and minimum price in the file, together with the dates on which they occur.

    file = open('file.txt')  # open the file
    for line in file:  # loop over the lines
        new_line1 = line.split(',')  # split each line on commas into new_line1
        new_file1 = new_line1[4:6]  # slice out the fields I need

And this is where I get stuck. I would like to understand the logic through examples, both using the min() and max() functions and without them.

Then take a slice like this: new_line1[3:]; it will return ['220.32', '220.32', '203.03', '203.95', '51774', '0']
@gil9red Thanks for the idea. It's probably better to do this: new_line1[3:-1], since 0 will always be the minimum of the sequence.
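Since the question also asks how to do this without min() and max(), here is a minimal sketch that just tracks the running extremes in a loop. It assumes the layout of the sample line: the date is field 1 and the four prices are fields 3 to 6, i.e. the slice [3:7] (the same slice Answer 3 uses). Note that [3:-1] would also include the 51774 field, which looks like a volume rather than a price. The function name find_extremes is hypothetical; it accepts any iterable of lines, so it can be called with an open file or with a list of strings.

```python
def find_extremes(lines):
    """Return ((max_price, max_date), (min_price, min_date)) without using min()/max()."""
    max_price = min_price = None
    max_date = min_date = None
    for line in lines:
        fields = line.strip().split(',')
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        date = fields[1]
        for text in fields[3:7]:  # the four price fields (assumed layout)
            price = float(text)   # compare numbers, not strings
            if max_price is None or price > max_price:
                max_price, max_date = price, date
            if min_price is None or price < min_price:
                min_price, min_date = price, date
    return (max_price, max_date), (min_price, min_date)


# Sample lines in the question's format; with the real file use:
#     with open('file.txt') as f:
#         print(find_extremes(f))
sample = [
    "Rosneft,07/19/06,00:00,220.32,220.32,203.03,203.95,51774,0",
    "Rosneft,07/20/06,00:00,230.33,230.34,230.35,230.36,56544,0",
    "Rosneft,07/21/06,00:00,210.33,210.34,210.35,210.36,50344,0",
]
print(find_extremes(sample))
```

The `None` checks let the first price seen initialise both extremes, so no artificial starting value such as 0 (which would always win as the minimum) is needed.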

MaxU · Accepted Answer · 2018-11-06T13:02:56

A solution without any additional modules:

Data file (C:\Temp\data.csv):

    Rosneft,07/19/06,00:00,220.32,220.32,203.03,203.95,51774,0
    Rosneft,07/20/06,00:00,230.33,230.34,230.35,230.36,56544,0
    Rosneft,07/21/06,00:00,210.33,210.34,210.35,210.36,50344,0

Solution:

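    filename = r'C:\Temp\data.csv'
    data = []
    with open(filename) as f:
        for line in f:
            if not line.strip():
                continue  # skip empty lines (e.g. a trailing newline at the end of the file)
            tmp = line.split(',')
            tmp[3:] = list(map(float, tmp[3:]))  # convert the numeric columns to float
            data.append(tmp)

    def transpose(matrix):
        # rows -> columns
        return list(zip(*matrix))

    def get_min_idx(data, col_idx=0):
        # index of the row with the smallest value in column col_idx
        return min(range(len(data)), key=transpose(data)[col_idx].__getitem__)

    def get_max_idx(data, col_idx=0):
        # index of the row with the largest value in column col_idx
        return max(range(len(data)), key=transpose(data)[col_idx].__getitem__)

    print('Min:\t', data[get_min_idx(data, col_idx=3)])
    print('Max:\t', data[get_max_idx(data, col_idx=3)])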
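The index-based helpers above rebuild the transposed matrix on every call. A more direct variant, sketched here as an alternative rather than as the answer's own method, passes a key function to min()/max() over the rows themselves. The helper name rows_with_extremes is hypothetical, and the rows are copied in already parsed form (as the loop above would produce them) so the block runs on its own.

```python
from operator import itemgetter

# Rows as produced by the parsing loop above (numeric columns already converted to float).
data = [
    ['Rosneft', '07/19/06', '00:00', 220.32, 220.32, 203.03, 203.95, 51774.0, 0.0],
    ['Rosneft', '07/20/06', '00:00', 230.33, 230.34, 230.35, 230.36, 56544.0, 0.0],
    ['Rosneft', '07/21/06', '00:00', 210.33, 210.34, 210.35, 210.36, 50344.0, 0.0],
]


def rows_with_extremes(rows, col_idx=3):
    """Return (min_row, max_row), comparing rows by the value in column col_idx."""
    key = itemgetter(col_idx)  # key(row) == row[col_idx]
    return min(rows, key=key), max(rows, key=key)


lo, hi = rows_with_extremes(data, col_idx=3)
print('Min:\t', lo)
print('Max:\t', hi)
```

Because min() and max() return the row itself rather than its index, the date is available directly as `lo[1]` / `hi[1]`.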

Result:

    Min:     ['Rosneft', '07/21/06', '00:00', 210.33, 210.34, 210.35, 210.36, 50344.0, 0.0]
    Max:     ['Rosneft', '07/20/06', '00:00', 230.33, 230.34, 230.35, 230.36, 56544.0, 0.0]

Great, thanks a lot! Now I'll work through all of this :) - SinCap

Answer 2 · 2018-11-06T12:25:30

The Pandas module is ideal for processing tabular (2D) data.

Example:

Let's create a test file with a similar structure, using Apple stock quotes since 2001 (4,490 lines) as the data:

    import pandas as pd  # pip install pandas
    from pandas_datareader.data import DataReader  # pip install pandas-datareader

    df = DataReader('AAPL', 'yahoo', '2001-01-01', '2018-11-06').reset_index()
    df.to_csv('c:/temp/data.csv', index=False)

Several lines from the file:

    Date,High,Low,Open,Close,Volume,Adj Close
    2001-01-02,1.0892857313156128,1.0401785373687744,1.0625,1.0625,113078000.0,0.713999330997467
    2001-01-03,1.1919642686843872,1.03125,1.0357142686843872,1.1696428060531616,204268400.0,0.7859991192817688
    2001-01-04,1.3214285373687744,1.2008928060531616,1.2957571744918823,1.21875,184849000.0,0.818999171257019
    2001-01-05,1.2410714626312256,1.1473214626312256,1.2098214626312256,1.1696428060531616,103089000.0,0.7859991192817688
    2001-01-08,1.2131643295288086,1.1383928060531616,1.2098214626312256,1.1830357313156128,93424800.0,0.7949992418289185

Solution:

    import pandas as pd

    df = pd.read_csv(r'c:/temp/data.csv')

Find the rows with the lowest and highest values in the Adj Close column:

    In [27]: print(df.nsmallest(1, ['Adj Close']))
               Date      High       Low      Open     Close       Volume  Adj Close
    573  2003-04-17  0.946429  0.908571  0.942857  0.937143  154064400.0   0.629759

    In [28]: print(df.nlargest(1, ['Adj Close']))
                Date        High         Low        Open       Close      Volume   Adj Close
    4466  2018-10-03  233.470001  229.779999  230.050003  232.070007  28654800.0  232.070007
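When only the single extreme row is needed, another option is Series.idxmin() / Series.idxmax(), which return the index label of the (first) minimum or maximum; DataFrame.loc then fetches that row. A minimal sketch, assuming pandas is installed; to stay self-contained it reads the sample rows shown above from an in-memory string via io.StringIO instead of c:/temp/data.csv.

```python
import io

import pandas as pd

# The sample rows shown above, held in memory instead of c:/temp/data.csv.
csv_text = """Date,High,Low,Open,Close,Volume,Adj Close
2001-01-02,1.0892857313156128,1.0401785373687744,1.0625,1.0625,113078000.0,0.713999330997467
2001-01-03,1.1919642686843872,1.03125,1.0357142686843872,1.1696428060531616,204268400.0,0.7859991192817688
2001-01-04,1.3214285373687744,1.2008928060531616,1.2957571744918823,1.21875,184849000.0,0.818999171257019
2001-01-05,1.2410714626312256,1.1473214626312256,1.2098214626312256,1.1696428060531616,103089000.0,0.7859991192817688
2001-01-08,1.2131643295288086,1.1383928060531616,1.2098214626312256,1.1830357313156128,93424800.0,0.7949992418289185
"""

df = pd.read_csv(io.StringIO(csv_text))

# idxmin()/idxmax() give the index label of the extreme value; .loc returns that whole row.
min_row = df.loc[df['Adj Close'].idxmin()]
max_row = df.loc[df['Adj Close'].idxmax()]
print('Min:', min_row['Date'], min_row['Adj Close'])
print('Max:', max_row['Date'], max_row['Adj Close'])
```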

Top 5 values:

    In [29]: print(df.nlargest(5, ['Adj Close']))
                Date        High         Low        Open       Close      Volume   Adj Close
    4466  2018-10-03  233.470001  229.779999  230.050003  232.070007  28654800.0  232.070007
    4465  2018-10-02  230.000000  226.630005  227.250000  229.279999  24788200.0  229.279999
    4445  2018-09-04  229.179993  226.630005  228.410004  228.360001  27390100.0  228.360001
    4467  2018-10-04  232.350006  226.729996  230.779999  227.990005  32042000.0  227.990005
    4444  2018-08-31  228.869995  226.000000  226.509995  227.630005  43340100.0  227.630005

@MaxU This is certainly a cool thing, but I don't understand any of it yet, because I'm just starting to learn the language. But thanks, of course, for your work and feedback!
At the beginning of my answer I gave a link to 10 Minutes to pandas; it is not as difficult as it may seem at first glance ;) Once you have mastered Pandas, a new world of simple and very effective data processing and analysis will open up for you.
@MaxU I can't argue with that; libraries and modules are very cool! But first I want to understand the "basic" concepts of the language: how it works and how to write algorithms in it.

Answer 3 · 2018-11-06T11:01:09

The logic is:

    line = "Rosneft,07/19/06,00:00,220.32,220.32,203.03,203.95,51774,0"
    elems = line.split(',')
    date = elems[1]
    prices = list(map(float, elems[3:7]))  # the four price fields as numbers
    print("On {} maximum: {}, minimum: {}".format(date, max(prices), min(prices)))

Output:

    On 07/19/06 maximum: 220.32, minimum: 203.03

Apply min/max to numbers (float), not to strings; otherwise you may get surprises.
@strawdog Thank you! But I need to find these values across the WHOLE sequence (about 2 thousand items) and print the dates on which they occur.
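The surprise comes from the fact that strings are compared character by character (lexicographically), not by numeric value. A small illustration with made-up price strings:

```python
prices_as_text = ['9.50', '10.25', '203.03']

# As strings, '9' > '2' > '1', so '9.50' is the "largest".
print(max(prices_as_text))

# Converted to float, the numeric maximum is found.
print(max(map(float, prices_as_text)))
```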
@SinCap Do I understand correctly that I need to find the maximum and minimum for the entire file, and not for each line separately?
Yes: I need to go through the entire file, find the highest and the lowest price among all values, and the dates on which these two values occurred.
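Extending Answer 3's per-line logic to the whole file can be sketched like this, under the same assumptions as that answer (date in field 1, prices in fields 3 to 6): collect a (price, date) pair for every price on every line, then call max() and min() once with a key that compares only the price. The helper name extremes_with_dates is hypothetical; file.txt is the name from the question.

```python
from operator import itemgetter


def extremes_with_dates(lines):
    """Return (max_pair, min_pair), each a (price, date) tuple, over all prices in all lines.

    Returns (None, None) if no line contains prices.
    """
    pairs = []
    for line in lines:
        elems = line.strip().split(',')
        if len(elems) < 7:
            continue  # skip blank or malformed lines
        date = elems[1]
        # Pair each of the four prices (fields 3..6) with this line's date.
        pairs.extend((price, date) for price in map(float, elems[3:7]))
    by_price = itemgetter(0)  # compare pairs on the price only, not on the date
    return max(pairs, key=by_price, default=None), min(pairs, key=by_price, default=None)


# Sample lines in the question's format; with the real file use:
#     with open('file.txt') as f:
#         result = extremes_with_dates(f)
sample = [
    "Rosneft,07/19/06,00:00,220.32,220.32,203.03,203.95,51774,0",
    "Rosneft,07/20/06,00:00,230.33,230.34,230.35,230.36,56544,0",
    "Rosneft,07/21/06,00:00,210.33,210.34,210.35,210.36,50344,0",
]
(max_price, max_date), (min_price, min_date) = extremes_with_dates(sample)
print("Maximum: {} on {}".format(max_price, max_date))
print("Minimum: {} on {}".format(min_price, min_date))
```

Keeping all pairs in a list costs memory proportional to the file, which is negligible for about 2,000 lines; the loop-only approach shown under the question avoids the list entirely.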
