I have a string:

Country of origin: Slovakia. The HT 17/43 safe can be installed and secured in furniture, between the shelves, as only this model has non-standard dimensions for safes (the depth and width of the safe is 2-2.5 times the height). This aspect ratio is most convenient for installation in furniture. In this case, the space reserved for the safe is used to the maximum. The safe is equipped with emergency access keys (Mauer – Germany, Borg-England), which excludes blocking of the electronic lock in case of code loss.

At first I did this:

var lst = str.Split('.').ToList(); 

In principle this works, but when the text contains fragments such as "т.к." (the Russian abbreviation for "since") or "2.5 times", the sentence is split in the wrong place. What should I do in such cases? How can I split a string into sentences correctly?

3 answers

Use the rules of the Russian language? :) The basic list is:

  1. A sentence ends with ., ? or !
  2. The punctuation mark is followed by a space or a line break.
  3. The space or line break is followed by a capital letter.

Depending on the text (for example, if it contains lists or direct speech), this list may need to be extended. I suggest starting with these three rules and adding to them as needed. This gives a better result than splitting on the period alone, although initials ("A.S. Pushkin") will already start to break it.
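The three rules above can be sketched as a single regular expression. This is only a minimal illustration, not a complete solution: the class name `SentenceSplitter` and the sample text are made up for the example, and `\p{Lu}` is used so that both Latin and Cyrillic capital letters count.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    // Split on whitespace (space or line break) that comes right after
    // '.', '?' or '!' and right before an uppercase letter.
    private static readonly Regex Boundary =
        new Regex(@"(?<=[.?!])\s+(?=\p{Lu})");

    public static string[] Split(string text)
    {
        return Boundary.Split(text)
                       .Select(s => s.Trim())
                       .Where(s => s.Length > 0)
                       .ToArray();
    }
}

public static class ThreeRulesDemo
{
    public static void Main()
    {
        // Hypothetical sample text, not the one from the question.
        var text = "The depth is 2-2.5 times the height. "
                 + "This ratio is convenient. A.S. Pushkin wrote poems.";

        foreach (var sentence in SentenceSplitter.Split(text))
            Console.WriteLine(sentence);
    }
}
```

With this sample, "2-2.5" stays intact because the period is not followed by whitespace, but "A.S. Pushkin" is still cut after "A.S.", which is exactly the weakness with initials mentioned above.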

In general, of course, the task is hard and a quick-and-dirty algorithm will not be enough. For example, when the text is complex: Как писал А. С. Пушкин: «Я помню чудное мгновенье. Ты съела всё моё варенье!» — своей няне в Шушенское из Ясной Поляны... (roughly: As A. S. Pushkin wrote, "I remember a wondrous moment. You ate all my jam!", to his nanny in Shushenskoye from Yasnaya Polyana...). Thanks to @VladD for the example.
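To illustrate how quickly simple rules run out, here is a sketch that takes the three basic rules (end mark, whitespace, capital letter) and adds one extra rule for initials. The extra rule and the class name `InitialsAwareSplitter` are assumptions made for this example, not something proposed in the thread, and the sample is a shortened fragment of the text above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class InitialsAwareSplitter
{
    // End mark + whitespace + capital letter, plus a hypothetical extra rule:
    // (?<!\b\p{Lu}\.) refuses to split right after a single capital letter
    // followed by a period, which is what initials such as "А. С." look like.
    private static readonly Regex Boundary =
        new Regex(@"(?<=[.?!])(?<!\b\p{Lu}\.)\s+(?=\p{Lu})");

    public static string[] Split(string text)
    {
        return Boundary.Split(text)
                       .Select(s => s.Trim())
                       .Where(s => s.Length > 0)
                       .ToArray();
    }
}

public static class InitialsDemo
{
    public static void Main()
    {
        // Shortened fragment of the example text.
        var text = "Как писал А. С. Пушкин: «Я помню чудное мгновенье. "
                 + "Ты съела всё моё варенье!»";

        foreach (var piece in InitialsAwareSplitter.Split(text))
            Console.WriteLine(piece);
    }
}
```

The initials now survive, but the text is still cut inside the quotation, after "мгновенье.". The extra rule also has a cost of its own: a sentence that really ends in a single capital letter ("Plan B. Next...") is no longer split.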

  • With our "great and mighty" language it is hard in general. Sentence structure barely lends itself to formalization, and for most rules, if not all, a counterexample can be found in literary sources. And that is before taking into account the generally low literacy of most texts (advertising is especially infuriating: bad enough that it is advertising, but it is full of errors too), where you can find not just missing punctuation but anything at all =) - rdorn

You can try using this parser:

https://tech.yandex.ru/tomita/

But you will have to train it.

     var lst = str.Split(new[] { ". " }, StringSplitOptions.None).ToList();

    A sentence is customarily followed by a space.

    • What about text like this? It won't be split, after all. - andreycha
    • Как писал А. С. Пушкин: «Я помню чудное мгновенье. Ты съела всё моё варенье!» — своей няне в Шушенское из Ясной Поляны... (roughly: As A. S. Pushkin wrote, "I remember a wondrous moment. You ate all my jam!", to his nanny in Shushenskoye from Yasnaya Polyana...) - VladD
    • var lst = str.Split(new char[] { '.', '?', '!' }); - Tomas
    • @VladD well, that's quite a shock :). By the way, I don't remember whether this passage counts as a single sentence or not :P - andreycha
    • @Tomas, that won't fly. - s8am