There is a text file:

 не нужная текстовая строчка
 Иванов
 15.1 10е9/л (4.0 - 1 о.о) >1
 Петров
 11.5 10е9/л (о - 5)
 Сидоров
 0.8 10е9/л (о - 2)
 Мельников
 2 10е9/л (1 - 8)

I need to extract the surnames and the numeric value that follows each one, so that the output looks like this:

 Иванов: 15.1
 Петров: 11.5
 Сидоров: 0.8
 Мельников: 2

I already have a regular expression, but it does not extract the values the way I need:

 matches_list = re.findall(r'([\w\s]+).+?(\d*\.\d+|\d+)', content, flags=re.DOTALL | re.U) 

Please help me write a regular expression that does this.

  • One possible solution: list(map(lambda x: ': '.join(x), re.findall(r'(\S+)(?:\s+)?\n(\d+(?:\.\d+)?)', content))) , where content is the content of the file. It is a rather brute-force solution, though. It is probably better to use re.sub to arrange the captured groups according to the template " name: number ". - greg zakharov
  • (\w+)\s*\n([\d\.]+) - Let's say Pie
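The idea in the comments above (capture the surname and the number as two groups, then lay them out as "name: number") can be sketched as follows. This is only an illustration, not the commenters' exact code: it assumes each surname sits alone on its own line and the next line starts with the value, and it uses Match.expand rather than re.sub to fill the template. The names pattern and pairs are chosen here for the example.

```python
import re

# Sample input. Assumption: the surname is alone on its line and the
# following line starts with the numeric value.
content = """не нужная текстовая строчка
Иванов
15.1 10е9/л (4.0 - 1 о.о) >1
Петров
11.5 10е9/л (о - 5)
Сидоров
0.8 10е9/л (о - 2)
Мельников
2 10е9/л (1 - 8)
"""

# ^(\w+)           a single word at the start of a line (the surname)
# [ \t]*\n         optional trailing blanks, then the line break
# (\d+(?:\.\d+)?)  an integer or a decimal number
pattern = re.compile(r'^(\w+)[ \t]*\n(\d+(?:\.\d+)?)', flags=re.M)

# Match.expand fills the template with the captured groups.
pairs = [m.expand(r'\1: \2') for m in pattern.finditer(content)]
print('\n'.join(pairs))
```

Anchoring the surname with ^ under re.M means a multi-word line such as the unwanted text line cannot match, because the word must be followed directly by the end of the line.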

1 answer

 print('\n'.join(f'{x[0]}: {x[1]}' for x in re.findall(r'[\r\n]+([\D]+?)\s*?[\r\n](\d*\.\d+|\d+)', content, flags=re.M|re.S))) 

 Иванов: 15.1
 Петров: 11.5
 Сидоров: 0.8
 Мельников: 2

UPDATE:

 In [7]: %paste
 content="""не нужная текстовая строчка
 не нужная текстовая строчка
 не нужная текстовая строчка
 Иванов
 15.1 10е9/л (4.0 - 1 о.о) >1
 Петров
 11.5 10е9/л (о - 5)
 Сидоров
 0.8 10е9/л (о - 2)
 Мельников
 2 10е9/л (1 - 8)"""
 print('\n'.join(f'{x[0]}: {x[1]}' for x in re.findall(r'[\r\n]([^\d\r\n]+?)\s*?[\r\n](\d*\.\d+|\d+)', content)))
 ## -- End pasted text --
 Иванов: 15.1
 Петров: 11.5
 Сидоров: 0.8
 Мельников: 2
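As a rough sketch of how the updated pattern behaves when unwanted lines also appear between the entries (the case raised in the comments), the snippet below wraps it in a small helper. The function name parse_values and the conversion to float are assumptions made for this illustration; the sample again assumes one surname per line followed by a line that starts with the value.

```python
import re

# Pattern from the updated answer:
# a line break, a line without digits (the surname), another line break,
# then a decimal or integer number at the start of the next line.
PATTERN = re.compile(r'[\r\n]([^\d\r\n]+?)\s*?[\r\n](\d*\.\d+|\d+)')

def parse_values(text):
    """Return {surname: value}; hypothetical helper for this example."""
    return {name: float(value) for name, value in PATTERN.findall(text)}

# Sample with an unwanted line between two entries as well as at the top.
content = """не нужная текстовая строчка
не нужная текстовая строчка
Иванов
15.1 10е9/л (4.0 - 1 о.о) >1
Петров
11.5 10е9/л (о - 5)
не нужная текстовая строчка
Сидоров
0.8 10е9/л (о - 2)
Мельников
2 10е9/л (1 - 8)"""

result = parse_values(content)
for name, value in result.items():
    print(f'{name}: {value}')
```

An unwanted line is skipped because the line after it does not start with a number. Note that the pattern requires a line break before the surname, so a surname on the very first line of the file would be missed.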
  • What does In [55]: mean? Python raises an error with it, and without it everything works fine - Danila00000
  • @Danila00000, that is the prompt of the interactive Python shell (IPython) - MaxU
  • Could you tell me how to modify the regular expression so that it skips the extra lines I don't need? For example: "not needed text line not needed text line not needed text line Ivanov 15.1 10е9 / l (4.0 - 1 о.о)> 1 Petrov 11.5 10е9 / l (о - 5) not needed text line Sidorov 0.8 10e9 / l ( o - 2) Melnikov 2 10e9 / l (1 - 8) " P.S. Regular expressions are my weak spot :( - Danila00000
  • @Danila00000, I added a corrected version to the answer - MaxU
  • Thank you so much! That really helped me out. - Danila00000