
How can I split a text into separate sentences? splitlines() is not suitable, since the whole text may be written on a single line.

Marked as a duplicate by Oceinic, Qwertiy, Streletz, Alex, Suvitruf Nov 21 '15 at 21:59.


    3 answers

    The expression does not treat the following as sentence endings:
    1980
    100rub.
    100r.
    100kop
    100k
    etc.
    etc.
    It also handles combined punctuation marks.
    The code is here: http://ideone.com/pNpffv

    • 1
      @ReinRaus: It seems to me that different semantics would be better: the punctuation mark(s) must be followed by a space, and the next word must start with a capital letter. A year or a price can easily end a sentence... - VladD
    • Uh, doesn't every new sentence begin with a capital letter? Why not look for places where a period is followed by a capital letter? Of course there are exceptions, such as names, cities, etc., but that already grows into a whole project. Ugh, I duplicated @VladD. Well, never mind. - lampa 9:46 pm
    • UPD: Fixed yesterday's errors. It turns out the cause was not that I was feeling unwell, but the task itself. - ReinRaus
    • 1
      Interesting: can sentences be nested? To clarify: there is a sentence, a quotation begins inside it (in quotes. Or in brackets, like this), and then the sentence continues. How do you parse such a mess? In principle this is a question for the Russian Language forum, but I don't feel like registering there (for a problem that matters little to me). - avp
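The heuristic suggested in the comments above (punctuation, then whitespace, then a capital letter) could be sketched roughly as follows. This is only an illustration, not the code from the linked answer; the names SENTENCE_BOUNDARY and split_on_capital are hypothetical, and the uppercase character class is an assumption that covers only Latin and Cyrillic letters.

```python
import re

# Assumption: a sentence boundary is whitespace that follows '.', '!' or '?'
# and precedes an uppercase Latin or Cyrillic letter.
SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-ZА-ЯЁ])')

def split_on_capital(text):
    """Split text at whitespace between end punctuation and a capital letter."""
    return [part for part in SENTENCE_BOUNDARY.split(text.strip()) if part]

print(split_on_capital(
    "It cost 100 rub. in 1980. Then prices rose... Why? Nobody knows."
))
```

As the comments point out, this still breaks on initials such as "A. S. Pushkin" and on proper names that follow an abbreviation, so it is a starting point rather than a complete solution.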
    parts = all_text.split('.') 
    • Then at least re.compile("[.!?\n]").split(all_text), or simply re.split("[.!?\n]", all_text) - alexlz
    • How do you deal with three dots in a row? - moden
    • 2
      Give a hungry man a fish and he will eat only once; teach him how to fish and he will always be fed... - qnub
    • 1
      With composite delimiters this should work better: re.split("\\b[.!?\\n]+(?=\\s)", all_text) - ReinRaus
    • 1
      @moden "..." is called an ellipsis (at least it was when I was in school). Secondly, there is also the mark ?!?. You can approach the task "creatively" and drop the empty strings: filter(lambda x: not re.match("^\s*$", x), re.split("[!.?\n]", all_text)). However, I suspect that the punctuation marks should be kept in the resulting list. In that case, just search by the pattern [^!.?\n]+[!.?\n]+ . That still does not give a 100% correct result: "In 1998 there was a default." - alexlz
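The two variants from the last comment can be tried side by side. The sketch below is only an illustration: the function names drop_punctuation and keep_punctuation and the sample text are made up, and the split pattern adds a "+" so that a run such as "..." or "?!" counts as a single delimiter.

```python
import re

def drop_punctuation(text):
    """Split on runs of '.', '!', '?' or newlines and discard empty pieces."""
    return [p.strip() for p in re.split(r"[.!?\n]+", text) if p.strip()]

def keep_punctuation(text):
    """Find each sentence together with the punctuation run that ends it."""
    return [m.strip() for m in re.findall(r"[^.!?\n]+[.!?\n]+", text)]

sample = "Wait... What?! In 1998 there was a default.\nYes."
print(drop_punctuation(sample))
print(keep_punctuation(sample))
```

Note that keep_punctuation silently drops trailing text that has no closing punctuation, and neither variant knows anything about abbreviations.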
    s = "Properties are a little different. They need a special declaration since they're handled in a very different way. (Hmmmm... I may have figured out an obvious way around that, but I want to get this out the door first.) Here's how you'd mock out calls to a property. Note that unlike other calls, all the calls to an overridden property must be played back in order."

    def srtip_sent(str_):
        # Cut the text after every '.', '?' or '!' and strip surrounding whitespace.
        separators = ['.', '?', '!']
        start = 0
        s_split = []
        for i in range(len(str_)):
            if str_[i] in separators:  # index the argument, not the global s
                s_split.append(str_[start:i+1])
                start = i + 1
        return [part.strip() for part in s_split]

    srtip_sent(s)
    ['Properties are a little different.',
     "They need a special declaration since they're handled in a very different way.",
     '(Hmmmm.',
     '.',
     '.',
     'I may have figured out an obvious way around that, but I want to get this out the door first.',
     ") Here's how you'd mock out calls to a property.",
     'Note that unlike other calls, all the calls to an overridden property must be played back in order.']

    It does not handle compound punctuation correctly, for example an ellipsis.

    • 1
      ... nor with initials, such as A. S. Pushkin. Nor with internal punctuation marks, as in "What the heck?" -- Alice inquired. - VladD
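For the compound-punctuation problem only (not the initials or quoted-speech cases from the comment above), one possible adjustment to the character loop is to consume a whole run of separators before cutting. This is a sketch under that assumption; split_keep_runs is a hypothetical name, not part of the original answer.

```python
def split_keep_runs(text, separators=".?!"):
    """Split after each run of separator characters, keeping the run intact."""
    sentences = []
    start = 0
    i = 0
    while i < len(text):
        if text[i] in separators:
            # Consume the whole run of separators ("...", "?!", and so on).
            while i + 1 < len(text) and text[i + 1] in separators:
                i += 1
            sentences.append(text[start:i + 1].strip())
            start = i + 1
        i += 1
    # Keep trailing text that has no closing punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

print(split_keep_runs("(Hmmmm... I may have it.) Here's how. Really?! Yes"))
```

The ellipsis now stays in one piece, but it is still treated as the end of a sentence even inside the parentheses, so the result remains approximate.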