Insert a blank line after the words selected by the regular expression

Question

I need to separate headings from the body text by inserting a blank line after each heading. The text has the following structure:

Заголовок заголовок заголовок
Слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово.

A heading can be recognized as follows: it begins with a capital letter, does not end with a punctuation mark, is at most 30 characters long, and the line after it also begins with a capital letter.
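The four criteria above can be written into a single pattern. The following is only a sketch, not part of the original thread: the names `HEADING` and `add_blank_after_headings` are hypothetical, the set of punctuation marks is an assumption, and `Ё` is added to the character class because it lies outside the `А-Я` range.

```python
import re

# Hypothetical pattern encoding the heading criteria:
#   (?=[А-ЯЁ])        the line starts with a capital letter
#   .{0,29}[^\n.,;:!?] 1 to 30 characters, the last one not a punctuation mark
#   \n(?=[А-ЯЁ])       the next line also starts with a capital letter
HEADING = re.compile(
    r"^(?=[А-ЯЁ])(.{0,29}[^\n.,;:!?])\n(?=[А-ЯЁ])",
    re.MULTILINE,
)


def add_blank_after_headings(text):
    """Return text with one blank line inserted after every heading."""
    # \1 is the heading itself; the two newlines end it and add a blank line.
    return HEADING.sub(r"\1\n\n", text)


sample = "Заголовок заголовок заголовок\nСлово слово слово слово слово слово слово.\n"
print(add_blank_after_headings(sample))
```

The lookahead for the next line is not consumed by the match, so two headings in a row would each still be checked.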

So a second line break needs to be added after each heading line. The Python script currently contains the following regular expression, which finds the headings but replaces them with an empty line:

 текст = re.sub("^[А-Я]{1}.{,30}\n", "\n", текст)

Addendum

Thanks to @ReinRaus for the proposed solution:

 текст = re.sub("^([А-Я]{1}.{,30}\n)", r"\1\n", текст)

apparently because it does not add an empty line to the match but simply replaces the match.
The question is what to put in place of "\n" so that the matched text is kept.
I tried variants such as текст = re.sub("^[А-Я]{1}.{,30}\n", "[А-Я]\n", текст), but without success.
> текст = re.sub("^[А-Я]{1}.{,30}\n", "$0\n", текст) This replaces the matched text with the literal $0.
Sorry, in Python the replacement string has to be written like this: "\\0\n".
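A note on the whole-match reference discussed in these comments: Python's `re` module documents `\g<0>` as the way to insert the entire match into the replacement string. As far as I know, `\0` is read as an octal escape there (a NUL character), and `$0` is just literal text. A minimal sketch with a made-up sample string:

```python
import re

text = "Заголовок заголовок заголовок\nСлово слово слово слово слово слово слово.\n"

# \g<0> stands for the entire match, so no capturing group is needed.
result = re.sub(r"^[А-Я].{,30}\n", r"\g<0>\n", text)
print(result)
```

This gives the same result as wrapping the pattern in parentheses and using `\1`.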

ReinRaus · Accepted Answer · 2013-10-17T19:38:18

 import re

 text = '''Заголовок заголовок заголовок
 Слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово.'''

 print(re.sub("^([А-Я]{1}.{,30}\n)", r"\1\n", text))

Result:

 Заголовок заголовок заголовок

 Слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово слово.
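One caveat worth illustrating: without the `re.MULTILINE` flag, `^` matches only at the very start of the string, so a call like the one above handles a heading only when it is the first line. The sketch below uses a made-up two-heading sample (not from the original thread) to compare both behaviours:

```python
import re

# Made-up sample with two headings, each followed by a long body line.
text = (
    "Первый заголовок\n"
    "Слово слово слово слово слово слово слово слово слово слово.\n"
    "Второй заголовок\n"
    "Слово слово слово слово слово слово слово слово слово слово.\n"
)

pattern = r"^([А-Я].{,30}\n)"

# Without the flag, ^ anchors only at the start of the whole string.
without_flag = re.sub(pattern, r"\1\n", text)

# With re.MULTILINE, ^ anchors at the start of every line.
with_flag = re.sub(pattern, r"\1\n", text, flags=re.MULTILINE)

print(with_flag)
```

Here `without_flag` gets a blank line only after the first heading, while `with_flag` gets one after both.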

Community spirit ♦ · Answer 2 · 2013-10-17T16:34:35

Maybe like this:

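 p = re.compile("^[А-Я]{1}.{,60}\n", re.MULTILINE)
 s = f.read()  # f is an already open file object; read it once and keep the text
 s1 = p.findall(s)[0]  # the first heading found, including its line break
 s = p.sub(s1 + '\n', s, count=1)  # put that heading back with an extra line break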

This works with English text (if I understood the requirement correctly). With Cyrillic in Unicode, be careful: the character limit may need to be doubled.

It should also work with \1, as described here, but that did not work for me, so I saved the match and substituted it back.

Also, I cannot define the text inside the script itself, because the script reads external files.
text = re.compile("^[А-Я]{1}.{,30}\n", re.MULTILINE); s1 = text.findall(f.read())[0]; text.sub(s1 + '\n', s) gives ValueError: Mixing iteration methods
Sorry, I don't understand where to put it; I'm only learning, and I'm modifying someone else's script.
Is it possible to solve this with re.sub alone, as the commenter above is trying to do?
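For the file-based case raised in these comments, here is one possible sketch using `re.sub` alone. The surrounding script is not shown in the thread, so everything here is an assumption: the function name `add_blank_lines`, the UTF-8 encoding, and the temporary demo file are all hypothetical. The quoted ValueError typically appears in Python 2 when a file object is first iterated line by line and then read with `f.read()`; reading the file exactly once, as below, avoids mixing the two.

```python
import os
import re
import tempfile

# Same pattern as in the question, compiled once with MULTILINE so that
# ^ matches at the start of every line, not only at the start of the file.
HEADING = re.compile(r"^([А-Я].{,30}\n)", re.MULTILINE)


def add_blank_lines(path, encoding="utf-8"):
    """Read the file once, add a blank line after each heading, write it back."""
    with open(path, encoding=encoding) as f:
        text = f.read()  # one read call, no line-by-line iteration
    text = HEADING.sub(r"\1\n", text)
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
    return text


# Demo on a temporary file so the example does not touch real data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("Заголовок заголовок заголовок\nСлово слово слово слово слово слово слово.\n")
    demo_result = add_blank_lines(demo_path)

print(demo_result)
```

Overwriting the input file is a design choice made for brevity; writing to a separate output file would be the safer option when correcting someone else's data.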
