The task: strip every non-letter character from the beginning of a string, up to the first letter encountered, using only a regular expression.

For example: s = "12. Hello world 23 times"

The result should be "Hello world 23 times"

I tried something like this:

re.match(r'(?<=(\d|\.|\s)*)(.*)', s).group()

but it does not work.

    1 answer

    Python's re module does not support variable-length lookbehind assertions. In (?<=(\d|\.|\s)*) the length of the match is not fixed, because the * quantifier matches zero or more characters (an unknown number).
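
To see the failure directly, here is a minimal sketch that tries to compile the pattern from the question. The variable names (pattern, compiled_ok, error_message) are only illustrative and are not part of the original post.

```python
import re

# Pattern from the question: the lookbehind contains `*`,
# so the width of what it must match is not fixed.
pattern = r'(?<=(\d|\.|\s)*)(.*)'

try:
    re.compile(pattern)
    compiled_ok = True
    error_message = ""
except re.error as exc:
    # The standard re module rejects the pattern at compile time.
    compiled_ok = False
    error_message = str(exc)

print(compiled_ok, error_message)
```

The error is raised when the pattern is compiled, before any input string is examined, so the call in the question fails no matter what s contains.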

    Use

     re.sub(r'^[^а-яА-ЯёЁ]+', '', s) 


    Details:

    • ^ - the beginning of the string
    • [^а-яА-ЯёЁ]+ - one or more characters other than Russian letters. If necessary, Latin letters can be included as well: [^а-яА-ЯёЁa-zA-Z]+ .
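
As a quick check, the sketch below applies the extended pattern (Cyrillic plus Latin) to the string from the question and to a Russian sample of my own. The second variant, [\W\d_], is an assumption added for illustration and does not come from the answer: it relies on \w covering letters, digits and the underscore, so it may treat a few non-decimal numeric characters as letters.

```python
import re

s = "12. Hello world 23 times"
s_ru = "12. Привет мир 23 раза"  # hypothetical Russian sample

# Negated class that lists both Cyrillic and Latin letters explicitly.
letters = r'^[^а-яА-ЯёЁa-zA-Z]+'
explicit = re.sub(letters, '', s)
explicit_ru = re.sub(letters, '', s_ru)

# Alternative (assumption, not from the answer): anything that is
# a non-word character, a digit, or an underscore.
by_category = re.sub(r'^[\W\d_]+', '', s)

print(explicit)     # Hello world 23 times
print(explicit_ru)  # Привет мир 23 раза
print(by_category)  # Hello world 23 times
```

Only the leading run is removed; the digits in "23 times" stay because ^ pins the match to the start of the string.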

    Note:

    The ^ symbol can mean different things depending on context:

    • A literal ^ (caret):

      • if it is escaped with \ (anywhere in the regular expression)
      • if it is inside a character class but not in the first position (e.g. [;!^] )
      • (in some other languages / libraries, also when it is between \Q ... \E )
    • An anchor for the beginning of a line or of the whole text, depending on the modifier or the regex library. In Python, with re.M / re.MULTILINE it also matches the position right after each LF ( \n ); without that modifier it matches only the beginning of the whole text:

      • if it is not escaped and not inside a character class
    • A metacharacter that negates a character class, i.e. [^.] matches any character other than a dot.

    WARNING! [^^] matches any character other than ^
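
The three roles of ^ can be checked side by side. This is a small sketch with sample strings of my own choosing; the variable names are illustrative only.

```python
import re

text = "a^b\nc.d"

# 1. Literal caret: escaped, or inside a class but not in first position.
escaped = re.findall(r'\^', text)        # ['^']
in_class = re.findall(r'[;!^]', text)    # ['^']

# 2. Anchor: start of the whole text, or of every line with re.M.
start_only = re.findall(r'^\w', text)           # ['a']
each_line = re.findall(r'^\w', text, re.M)      # ['a', 'c']

# 3. Negation: first character inside a character class.
not_dot = re.findall(r'[^.]', "c.d")     # ['c', 'd']
not_caret = re.findall(r'[^^]', "a^b")   # ['a', 'b']

print(escaped, in_class, start_only, each_line, not_dot, not_caret)
```

In [^^] the first caret negates the class and the second one is a literal member of it, which is why the caret itself is the only character excluded.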

    • Why not re.sub("^[\d\s.]*","",s)? It is simpler and matches exactly the character set the asker specified. - Mike
    • @Mike: I am looking not at the asker's expression but at the description of the task: all non-letters must be dropped from the beginning of the string up to the first letter encountered. "Anything that is not X" is matched with negated character classes like [^....] - Wiktor Stribiżew
    • So in the first case the "^" symbol marks the beginning of the string, and in the second it is a negation? If so, what rule determines the meaning of this character depending on its position? - AND
    • I added a description of the ^ symbol. - Wiktor Stribiżew
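
The difference between the two patterns discussed in the comments shows up as soon as the prefix contains a character outside digits, whitespace and dots. The input below is a hypothetical one chosen to illustrate that, and the Latin range is added to the negated class so it works on English text.

```python
import re

s = "-- 12. Hello world 23 times"  # hypothetical input with a leading "--"

# Explicit list of allowed prefix characters (Mike's suggestion):
# the match stops at '-' right away, so nothing is removed.
listed = re.sub(r'^[\d\s.]*', '', s)

# Negated class (the answer's approach, extended with Latin letters):
# everything up to the first letter is removed.
negated = re.sub(r'^[^а-яА-ЯёЁa-zA-Z]+', '', s)

print(repr(listed))   # '-- 12. Hello world 23 times'
print(repr(negated))  # 'Hello world 23 times'
```

Which one is preferable depends on whether the prefix is known to contain only digits, spaces and dots, or can be arbitrary.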