How to select text from a file between two known lines?

Question

There is a text file from which I need to select a certain range of lines. The text is different every time, but two lines are always known in advance; the text before them, between them, and after them is not. For example:

Я вас любил: любовь еще, быть может,
В душе моей угасла не совсем;
Но пусть она вас больше не тревожит;
Я не хочу печалить вас ничем.
Я вас любил безмолвно, безнадежно,
То робостью, то ревностью томим;
Я вас любил так искренно, так нежно,
Как дай вам бог любимой быть другим. :)

Suppose the known lines are "Но пусть она вас больше не тревожит;" ("But let it not trouble you any longer;") and "То робостью, то ревностью томим;" ("Tormented now by shyness, now by jealousy;"). I need to extract these two lines and everything between them; the rest is not needed.

Not the most efficient solution, but it works:

 excerpt = re.search(r'(Но пусть она вас больше не тревожит.*То робостью, то ревностью томим;)', text, re.S).group(1)

Community spirit ♦ one · Answer 1 · 2017-02-02T02:47:11

If the known strings are stored in the variables start and end, you can get a slice consisting of these strings and everything between them:

 result = text[text.index(start):text.index(end)+len(end)]

This assumes that end occurs in text only after start.

For example:

 >>> text = '..abc..'
 >>> start = 'a'
 >>> end = 'c'
 >>> text[text.index(start):text.index(end)+len(end)]
 'abc'

You can use regular expressions:

 >>> import re
 >>> re.search('{}.*?{}'.format(*map(re.escape, [start, end])), text, re.S).group()
 'abc'

To avoid loading the entire file into memory, you can use mmap and bytes.
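A minimal sketch of the mmap idea, assuming the file is not empty (mmap cannot map an empty file) and that the known strings are encoded with the same encoding as the file. The helper name slice_from_file and the temporary demo file are hypothetical, introduced only for illustration:

```python
import mmap
import os
import tempfile


def slice_from_file(path, start: bytes, end: bytes):
    """Return the bytes from the first `start` through the next `end`, or None."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        i = mm.find(start)
        if i == -1:
            return None
        # Look for `end` only after `start`, so an earlier occurrence is ignored.
        j = mm.find(end, i + len(start))
        if j == -1:
            return None
        # Slicing an mmap copies just the requested range into a bytes object.
        return mm[i:j + len(end)]


# Demo on a throwaway file (hypothetical content).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write("head\nstart line\nbody\nend line\ntail\n".encode("utf-8"))
demo_path = tmp.name

found = slice_from_file(demo_path, "start line".encode("utf-8"),
                        "end line".encode("utf-8"))
os.remove(demo_path)
result_text = found.decode("utf-8")
```

Only the matched range is copied into memory; the operating system pages in the rest of the file on demand while find scans it.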

i = text.index(start); result = text[i:text.index(end, i) + len(end)]
@andreymal If you are dealing with several lines, you can go straight to a regex for correctness (as with sed '/start_pattern/,/end_pattern/' or Perl's flip-flop operator).
Many variations are possible depending on the requirements: allowed input, performance, first or last occurrence, and so on.
For example, you can use text.find() and check the returned index instead of letting an exception be raised.
Why *map(re.escape, [start, end]) rather than re.escape(start), re.escape(end)?
By the way, in "Reading a file segment in Python" the task is to find all occurrences, which re.search().group() does not fully solve.
@NickVolynkin 1. map(re.escape, ...) hints that a single transformation is applied to every string (the strings should be taken literally, not as regular expressions). map(re.escape, ...) is often needed for a list; for two variables you can also call re.escape separately. 2. I see "reading a segment" (singular, not "segments") in the title. If the author edits the question to say that several occurrences are of interest, or if someone points me to a quote from the question that I read carelessly, I will reopen it (mmap + re.findall is possible).
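The variations raised in these comments (searching for end only after start, checking the index returned by find instead of catching an exception, and collecting every occurrence) might look like the sketch below. The function names slice_between and all_between are hypothetical, and re.DOTALL is an assumption made so that a match may span several lines:

```python
import re


def slice_between(text, start, end):
    """Return the first start..end slice (inclusive), or None if either is missing."""
    i = text.find(start)
    if i == -1:
        return None
    # Search for `end` only after `start`, so an earlier `end` is skipped.
    j = text.find(end, i + len(start))
    if j == -1:
        return None
    return text[i:j + len(end)]


def all_between(text, start, end):
    """Return every non-overlapping start..end span, shortest match first."""
    # re.escape makes both strings match literally, not as regex syntax.
    pattern = re.escape(start) + ".*?" + re.escape(end)
    # re.DOTALL lets "." match newlines, so spans may cross line boundaries.
    return re.findall(pattern, text, flags=re.DOTALL)


first = slice_between("c..abc..", "a", "c")
spans = all_between("<a>1</a> x <a>2\n3</a>", "<a>", "</a>")
```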

Answer 2 · 2017-02-02T10:24:33

 # If "lines" means strings separated by '\n', the file does not have to be read in full.
 from typing import Iterator

 def get_inside_lines(file: Iterator[str], start_line: str, end_line: str) -> Iterator[str]:
     for line in file:
         if line == start_line:
             yield line
             # The same iterator is reused, so this loop continues after start_line.
             for line in file:
                 yield line
                 if line == end_line:
                     return

 with open('111.txt', encoding='utf-8') as f:
     r = ''.join(get_inside_lines(f,
                                  'Но пусть она вас больше не тревожит;\n',
                                  'То робостью, то ревностью томим;\n'))

If you search line by line, you can use itertools.takewhile(end.__ne__, dropwhile(start.__ne__, file)), which returns start plus all lines up to end (not including end).
Read the question carefully: "take them and what is between them".
That is why my comment explicitly says: "up to end (not including)".
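For the case the question asks about, where the end line is kept as well, one possible sketch combines dropwhile with an explicit stop after the end line. The name lines_between is hypothetical, and stripping the trailing '\n' before comparing is an assumption that lets the known lines be passed without a newline:

```python
import io
from itertools import dropwhile


def lines_between(lines, start, end):
    """Yield the start line, everything after it, and the end line itself."""
    # Skip everything before the first line equal to `start`.
    tail = dropwhile(lambda line: line.rstrip("\n") != start, lines)
    for line in tail:
        yield line
        # Stop right after emitting the end line, so it is included.
        if line.rstrip("\n") == end:
            return


# io.StringIO stands in for an open text file here.
sample = io.StringIO("a\nstart\nmiddle\nend\nz\n")
selected = "".join(lines_between(sample, "start", "end"))
```

Like the generator in the answer, this reads the file lazily and stops as soon as the end line has been produced.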
