I am parsing the texts of court decisions. From a fragment like the one below I need to extract the punishment (the term of imprisonment):
в виде лишения свободы на срок 3 года 10 месяцев со штрафом в размере 150 000 рублей с ограничением свободы на срок на 8 месяцев.
The regex I wrote,
`лишени[а-я]+\s*?свободы\s*?на\s*?(?:срок)?\s*?(?:(?P<years>\d+).*?(?:года?|лет)?)?\s*?и?\s*?(?:(?P<months>\d+)\s*?(?:месяц[а-я]{0,3}))?`
returns only `лишения свободы на`. If I remove the final question mark (which I cannot do in the general case), I get the desired result: `лишения свободы на срок 3 года 10 месяцев`.
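The behaviour described above can be reproduced with a short script. This is only a sketch that assumes Python's standard `re` module (the named-group syntax `(?P<years>...)` suggests it); the variable names are illustrative and not part of the original post.

```python
import re

text = ("в виде лишения свободы на срок 3 года 10 месяцев со штрафом "
        "в размере 150 000 рублей с ограничением свободы на срок на 8 месяцев.")

# The pattern from the question, unchanged: every part after "на" is optional.
original = (r"лишени[а-я]+\s*?свободы\s*?на\s*?(?:срок)?\s*?"
            r"(?:(?P<years>\d+).*?(?:года?|лет)?)?\s*?и?\s*?"
            r"(?:(?P<months>\d+)\s*?(?:месяц[а-я]{0,3}))?")

# The same pattern without the final "?", so the months group becomes mandatory.
months_required = original[:-1]

m1 = re.search(original, text)
m2 = re.search(months_required, text)

# All-optional tail: the match stops right after "на" and captures nothing.
print(repr(m1.group(0)), m1.group("years"), m1.group("months"))

# Mandatory months group: the engine has to backtrack until it finds the months.
print(repr(m2.group(0)), m2.group("years"), m2.group("months"))
```

In the first case the lazy `\s*?` tries zero characters first, and every following group is allowed to match the empty string, so the very first attempt already succeeds and the engine never needs to backtrack into a longer match.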
The documentation says:
The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible.
Question: why, in my case, does adding `?` have the opposite effect from the one I expected?
`.*?`? Why not `\s*`? In general, this is all I have managed so far: regex101.com/r/4ls28x/2. - Wiktor Stribiżew

`.*?` is there because of cases like this: `...лишения свободы на 5 (пять) лет...` - Roman Yakubovich

`.*?`: why does it match zero characters in this case, rather than everything up to the first continuation of the pattern (which is exactly what "non-greedy" means)? Because of the `?` that follows `лет`, does it not even check whether a possible continuation exists in the string? Because this works: regex101.com/r/4ls28x/4 - Roman Yakubovich

`.*?` must be followed by at least one required pattern. In the original expression they are all optional, since each of them is followed by the `?` quantifier. That is, `год` or `лет` must definitely be present, right? - Wiktor Stribiżew
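The suggestion in the comments (keep each group optional as a whole, but make the unit word inside it mandatory) might be sketched as follows. This is not the pattern behind either regex101 link. The name `parse_term`, the handling of a spelled-out number in parentheses via `\([^)]*\)`, and the switch from lazy `\s*?` to greedy whitespace are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

TERM_RE = re.compile(
    r"лишени[а-я]+\s+свободы\s+на\s+(?:срок\s+)?"
    # Years: the unit (год/года/лет) is required inside the optional group.
    r"(?:(?P<years>\d+)\s*(?:\([^)]*\)\s*)?(?:года?|лет))?"
    # Months: optional as a whole, but "месяц..." is required inside it.
    r"(?:\s*и?\s*(?P<months>\d+)\s*(?:\([^)]*\)\s*)?месяц[а-я]{0,3})?"
)


def parse_term(text: str) -> Optional[Tuple[Optional[int], Optional[int]]]:
    """Return (years, months) for the first imprisonment term, or None.

    Hypothetical helper: a part that is absent from the text is None.
    """
    m = TERM_RE.search(text)
    if m is None:
        return None
    years, months = m.group("years"), m.group("months")
    return (int(years) if years else None, int(months) if months else None)


sample = ("в виде лишения свободы на срок 3 года 10 месяцев со штрафом "
          "в размере 150 000 рублей с ограничением свободы на срок на 8 месяцев.")

print(parse_term(sample))
print(parse_term("лишения свободы на 5 (пять) лет"))
print(parse_term("лишения свободы на срок 10 месяцев"))
```

Because the optional groups are greedy and can only succeed when the unit word is actually found, a number followed by `месяцев` cannot be captured as years, and the engine tries the longer match before settling for an empty one.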