It is enough to assume that a sentence boundary always looks like this:

"lowercase letter" - ". or ! or ?" - "space" - "capital letter"

For example:

"Hello! I am a simple text. Can you split me?"

["Hello!", "I am a simple text.", "Can you split me?"]

I made an attempt, but it did not work:

re.split(r'\w[.!?]+\s+[А-Я]', "Hello! I'm John. Are you OK? fine... and so") 
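A small sketch of why this attempt cannot work as written (the variable names are only for illustration). The range [А-Я] is Cyrillic, so it never matches in Latin text, and even with a Latin range re.split throws away every character the pattern consumed:

```python
import re

text = "Hello! I'm John. Are you OK? fine... and so"

# [А-Я] is the Cyrillic range, so nothing matches in Latin text
# and the string comes back unsplit.
cyrillic_attempt = re.split(r'\w[.!?]+\s+[А-Я]', text)
print(cyrillic_attempt)

# With a Latin range the pattern does match, but re.split drops everything
# the pattern consumed: the last letter, the punctuation, the space
# and the capital letter of the next sentence.
latin_attempt = re.split(r'\w[.!?]+\s+[A-Z]', text)
print(latin_attempt)  # ['Hell', "'m Joh", 're you OK? fine... and so']
```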


    We split on a space, but use a positive lookbehind to make sure that the space is preceded by a letter followed by a period, ! or ?:

     import re

     result = re.split(r'(?<=\w[.!?]) ', "Hello! I'm John. Are you OK? fine... and so")
     print(result)

     result = re.split(r'(?<=\w[.!?]) ', "Привет! Я простой текст. Ты сможешь разделить меня?")
     print(result)

    Result:

      ['Hello!', "I'm John.", 'Are you OK?', 'fine... and so']
      ['Привет!', 'Я простой текст.', 'Ты сможешь разделить меня?']

    P.S. I did not test this with Unicode specifically. Tested on https://repl.it/languages/python3

    UPD: It may be worth replacing \w with an explicit set of allowed characters, since \w matches letters, digits, and underscores.
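A minimal sketch of that replacement, assuming that only Latin and Russian letters should count as the last character of a sentence. The character class and the sample string are my own assumptions, so adjust them to your alphabet:

```python
import re

# Assumed character class: Latin and Russian letters only
# (no digits, no underscore). The lookbehind stays fixed-width:
# one letter plus one punctuation mark.
LETTER = r'[A-Za-zА-Яа-яЁё]'
strict_pattern = r'(?<=' + LETTER + r'[.!?]) '

sample = "Version 2. Next step_! Hello! I am John."

strict = re.split(strict_pattern, sample)
loose = re.split(r'(?<=\w[.!?]) ', sample)

print(strict)  # ['Version 2. Next step_! Hello!', 'I am John.']
print(loose)   # ['Version 2.', 'Next step_!', 'Hello!', 'I am John.']
```

With \w the text is also split after "2." and "step_!", because a digit and an underscore both count as word characters.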

    • It works, but out of curiosity: is it possible to split without losing a character (the space disappears)? - Yevgeny Kuzmin
    • @YevgenyKuzmin, Python refuses to split on an empty pattern match (one that does not consume at least 1 character) and raises ValueError: split() requires a non-empty pattern match. - Visman
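For what it is worth, that ValueError comes from older interpreters. Since Python 3.7, re.split accepts patterns that can match an empty string, so a purely zero-width split is possible. A sketch assuming Python 3.7 or newer, where the space is kept at the start of the next piece:

```python
import re

text = "Hello! I'm John. Are you OK? fine... and so"

# Zero-width split point: right after "letter + punctuation mark"
# and right before a space. Nothing is consumed, so nothing is lost.
parts = re.split(r'(?<=\w[.!?])(?= )', text)
print(parts)  # ['Hello!', " I'm John.", ' Are you OK?', ' fine... and so']

# Joining the pieces gives back the original string unchanged.
print(''.join(parts) == text)  # True
```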
     (.+?[.!?]) - splits on . ! ?
    • re.split(r'(.+?[.!?])', 'dfg! Dgfg? ddf. Dfdg. fdgdfg') returns a list with empty elements: ['', 'dfg!', '', ' Dgfg?', '', ' ddf.', '', ' Dfdg.', ' fdgdfg'] - Yevgeny Kuzmin
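The empty elements appear because re.split with a capturing group returns the captured separators together with the (here empty) text between them. A sketch of two possible workarounds, neither of which comes from the answer above: filter the pieces after splitting, or match the sentences directly with re.findall.

```python
import re

text = 'dfg! Dgfg? ddf. Dfdg. fdgdfg'

# Option 1: keep the split, then strip whitespace and drop blank pieces.
pieces = [p.strip() for p in re.split(r'(.+?[.!?])', text) if p.strip()]
print(pieces)  # ['dfg!', 'Dgfg?', 'ddf.', 'Dfdg.', 'fdgdfg']

# Option 2: match sentences instead of splitting. Each match starts at a
# non-space character and runs lazily up to the next run of . ! ?
# or, for a trailing fragment without punctuation, to the end of the string.
sentences = re.findall(r'\S.*?(?:[.!?]+|$)', text)
print(sentences)  # ['dfg!', 'Dgfg?', 'ddf.', 'Dfdg.', 'fdgdfg']
```

Note that both options split on every punctuation mark, so they do not check for a following space or capital letter the way the lookbehind version does.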