I need to separate the text from its headings by inserting a blank line after each heading. The text has the following structure:

 Заголовок заголовок заголовок
 Слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово.

Signs of a heading: it begins with a capital letter, does not end with a punctuation mark, is at most 30 characters long, and the following line also begins with a capital letter.
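The signs listed above can be turned into a small check. This is only a sketch: the helper name `looks_like_heading` is hypothetical, and the set of punctuation marks is an assumption about what counts as "a punctuation mark" here.

```python
def looks_like_heading(line: str, next_line: str) -> bool:
    """Hypothetical helper: apply the heading signs described above."""
    line = line.rstrip("\n")
    if not line or len(line) > 30:
        return False
    starts_upper = line[0].isupper()
    # Assumed punctuation set; extend it if the texts use other marks.
    ends_clean = line[-1] not in ".,;:!?"
    next_upper = next_line[:1].isupper()
    return starts_upper and ends_clean and next_upper
```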

So a second line break has to be added after each heading line. The Python script currently contains the following regular expression, which does find the headings but replaces them with an empty line:

 текст = re.sub("^[А-Я]{1}.{,30}\n", "\n", текст) 

ADDENDUM

Thanks to @ReinRaus for the proposed solution:

 текст = re.sub("^([А-Я]{1}.{,30}\n)", r"\1\n", текст) 
  • Apparently because it does not add an empty line to what it found, but simply replaces it. - spirit
  • Exactly. The question is what to put in place of "\n" so that the found text stays. I tried variants like текст = re.sub("^[А-Я]{1}.{,30}\n", "[А-Я]\n", текст), but without success. - Rossarh
  • Just replace with $0\n instead of \n - ReinRaus
  • Sorry if I misunderstood, but do you mean this? текст = re.sub("^[А-Я]{1}.{,30}\n", "$0\n", текст) This replaces the found text with $0. - Rossarh
  • Sorry, in Python you need to write it like this: "\\0\n" is the replacement string. - ReinRaus
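For reference, Python's `re.sub` does not recognise the `$0` syntax used by other regex flavours and inserts it literally. The documented way to refer to the whole match is `\g<0>`; wrapping the pattern in a group and using `\1` works as well. A minimal sketch with a made-up sample string:

```python
import re

sample = "Заголовок заголовок\nСлово слово слово.\n"  # made-up sample
pattern = "^[А-Я]{1}.{,30}\n"

# "$0" has no special meaning in Python and is inserted literally.
literal = re.sub(pattern, "$0\n", sample)

# \g<0> stands for the whole match, so the heading is kept.
whole = re.sub(pattern, r"\g<0>\n", sample)

# Equivalent: wrap the pattern in a group and refer to it as \1.
grouped = re.sub("^([А-Я]{1}.{,30}\n)", r"\1\n", sample)
```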

2 answers

 import re
 text = '''Заголовок заголовок заголовок
 Слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово.'''
 print(re.sub("^([А-Я]{1}.{,30}\n)", r"\1\n", text))

Result:

 Заголовок заголовок заголовок

 Слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово.
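Note that without `re.MULTILINE` the `^` anchor matches only at the very start of the string, so the call above handles a heading only when it is the first line. Below is a sketch for text with several headings. The sample is made up, and the body lines are deliberately long: with this pattern, any short line that starts with a capital letter would also be treated as a heading.

```python
import re

# Made-up sample: two headings, each followed by a long body line.
body = "Слово " * 10 + "слово.\n"
text = "Первый заголовок\n" + body + "Второй заголовок\n" + body

# With re.MULTILINE, ^ matches at the start of every line.
result = re.sub("^([А-Я]{1}.{,30}\n)", r"\1\n", text, flags=re.MULTILINE)
print(result)
```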

    Maybe like this:

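     p = re.compile("^[А-Я]{1}.{,60}\n", re.MULTILINE)
     s = f.read()  # read the file once; f is the already opened file
     s1 = p.findall(s)[0]  # first heading found
     s = p.sub(s1 + '\n', s)  # note: every match is replaced with the first heading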

    This works with English text (if I understood the requirement correctly). Be careful with Unicode: the character count may need to be doubled.
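The doubling mentioned above is consistent with counting UTF-8 bytes rather than characters, since each Cyrillic letter takes two bytes in UTF-8. A quick check in Python 3 (whether the original script works on bytes or on text is an assumption):

```python
heading = "Заголовок"

chars = len(heading)                    # number of characters
octets = len(heading.encode("utf-8"))   # number of UTF-8 bytes

print(chars, octets)
```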

    It should work with \1, as here, but that does not work for me, so I saved the match and used it as the replacement.

    • It would be preferable to use re.sub. The text also can't be defined inside the script itself, since the script reads third-party files. - Rossarh
    • Corrected, it works for me. - spirit
    • How do I add this code to the existing script? текст = re.compile("^[А-Я]{1}.{,30}\n", re.MULTILINE) s1 = текст.findall(f.read())[0] текст.sub(s1 + '\n', s) gives ValueError: Mixing iteration methods - Rossarh
    • f.read() is whatever comes from the file, i.e. the text. And leave the variable p as it is) - spirit
    • Sorry, I don't understand where to put it, since I'm just learning and I'm also fixing someone else's script. Can this be solved with re.sub, as comrade ReinRaus suggests above? - Rossarh
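One way to combine the `re.sub` approach with reading from a file is sketched below. The function name `separate_headings` is hypothetical and the UTF-8 encoding is an assumption; the file is read in a single call so that iteration over the file and `read()` are not mixed.

```python
import re

HEADING = re.compile("^([А-Я]{1}.{,30}\n)", re.MULTILINE)

def separate_headings(path: str) -> str:
    """Hypothetical helper: return the file's text with a blank line after each heading."""
    # Read the whole file in one call (encoding is assumed to be UTF-8).
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # \1 re-inserts the matched heading line; the extra \n adds the blank line.
    return HEADING.sub(r"\1\n", text)
```

The result can then be written back to a file or passed on to the rest of the script.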