Extract a line from text, starting with a given word and ending at a newline (\n)

Question

The task is to extract a line that starts with a certain word and ends with a newline character (\n).

Example:

data_list = ["text\n text.....\nKeywords: key, key, key\n text......\n"]

You need to extract the line:

 'Keywords: key, key, key'

but its position in the text is not known in advance.

Accepted Answer · 2016-11-29T11:58:38

Use a regular expression, for example:

    import re

    text = "text\n text.....\nKeywords: key, key, key\n text......\n"
    match = re.search('(Keywords: .+?)\n', text)
    if match:
        print(match.group(1))  # "Keywords: key, key, key"

The parentheses mark the group you want to capture, and the captured text is available through group(). If re.search finds no match for the pattern, it returns None, so check the result before using it.

In the regular expression, we say that we want a string that starts with Keywords: followed by any characters up to the \n character.
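Since the question asks for a line that starts with the word, one possible variation is to anchor the pattern to the start of a line with re.MULTILINE. This is only a sketch, not part of the original answer; the helper name find_line is made up for illustration.

```python
import re

def find_line(text, word):
    """Return the first line of text that starts with word, or None."""
    # Under re.MULTILINE, ^ matches at the start of every line.
    # Without re.DOTALL, .* stops before the newline.
    match = re.search('^' + re.escape(word) + '.*', text, flags=re.MULTILINE)
    return match.group() if match else None

text = "text\n text.....\nKeywords: key, key, key\n text......\n"
print(find_line(text, "Keywords:"))  # Keywords: key, key, key
print(find_line(text, "Missing"))    # None
```

Unlike the unanchored pattern, this skips occurrences of the word in the middle of a line.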

An addition to the answer, following the clarification in the comments:

    import re

    data_list = ["text\n text.....\nKeywords: key, key, key\n key key key\n 1 text......\n"]
    for text in data_list:
        match = re.search('(Keywords: .+)\n', text, flags=re.DOTALL)
        if match:
            # b'Keywords: key, key, key\n key key key\n 1 text......'
            print(match.group(1).encode())

The result is printed as bytes so that you can see the \n characters are still in place.

Without the re.DOTALL flag, a simpler regular expression can be used.
And what if the string looks like this: data_list = ["text\n text.....\nKeywords: key, key, key\n key key\n 1 text......\n"], that is, the part with Keywords contains more than one \n character?
Do you need to collect the rest of the string after Keywords?
@MaxIvanov If you want to extract everything in text, starting with word and going to the very end, then the code is even simpler: result = text[text.find(word):]

Answer 2 · 2016-11-30T01:51:25

To extract everything from the word 'Keywords' to the end of the line (up to the line break):

    import re

    text = "text\n text.....\nKeywords: key, key, key\n text......\n"  # sample text from the question
    word = "Keywords"
    result = re.search(re.escape(word) + '.*', text).group()
    # -> 'Keywords: key, key, key'

By default (without re.DOTALL), the dot does not match \n, so a simple regular expression is enough if you want the text up to the first newline ('\n'). Otherwise, you can pass flags=re.DOTALL to get all the content to the end of the text.
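The difference between the two modes can be seen side by side in a small sketch. The sample text is adapted from the comments; the variable names are chosen here for illustration.

```python
import re

text = "text\nKeywords: key, key, key\n key key key\n 1 text......\n"
pattern = re.escape("Keywords") + '.*'

# Default: the dot stops at the first newline.
one_line = re.search(pattern, text).group()

# re.DOTALL: the dot also matches '\n', so the match runs to the end of text.
to_end = re.search(pattern, text, flags=re.DOTALL).group()

print(repr(one_line))  # 'Keywords: key, key, key'
print(repr(to_end))    # 'Keywords: key, key, key\n key key key\n 1 text......\n'
```

Note that with this pattern the re.DOTALL result keeps the trailing newline, because nothing in the pattern excludes it.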

re.escape() is used in case the word contains characters that are special in regular expressions (*, +, . and so on).
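A short sketch of why the escaping matters, using a made-up sample text and a word that contains a dot:

```python
import re

# Hypothetical sample: two lines that differ only where the word has a dot.
text = "v150: old notes\nv1.0: first release\n"
word = "v1.0"

# Unescaped, the dot matches any character, so the wrong line is found first.
unescaped = re.search(word + '.*', text).group()

# Escaped, the dot is matched literally.
escaped = re.search(re.escape(word) + '.*', text).group()

print(unescaped)  # v150: old notes
print(escaped)    # v1.0: first release
```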

You can also do it without regular expressions (less convenient):

    assert word in text and '\n' not in word
    text += '\n'  # make sure there is a newline
    i = text.find(word)
    result = text[i:text.find('\n', i)]

The same result.

If you want to extract everything in the text, starting with word and going to the very end:

    assert word in text
    result = text[text.find(word):]
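The assert matters because str.find returns -1 when the word is absent, and slicing from -1 silently yields the last character. If raising an AssertionError is not wanted, one possible alternative is to return None instead. The helper name tail_from is hypothetical.

```python
def tail_from(text, word):
    """Return text from the first occurrence of word to the end, or None."""
    i = text.find(word)
    # str.find returns -1 when word is absent; text[-1:] would then
    # return the last character instead of signalling a failure.
    if i == -1:
        return None
    return text[i:]

sample = "text\nKeywords: key, key, key\n key key key\n"
print(repr(tail_from(sample, "Keywords")))  # 'Keywords: key, key, key\n key key key\n'
print(tail_from(sample, "Missing"))         # None
```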
