A formal description of natural-language text is very complex, especially once you account for all the peculiarities of the language: abbreviations, ambiguous use of punctuation marks, complex sentences, and so on, not to mention errors and typos. So processing arbitrary text without supervision by an operator is an almost impossible task.
@Bulson suggested a bottom-up hierarchy, so I'll try to offer a top-down one; the truth is probably somewhere in the middle.
In the simple case, the text is an array of sentences, and their order is given by their position in the array.
A sentence consists of tokens (lexemes): the minimal independent meaningful units of text.
Tokens fall into three classes: separator (space), punctuation mark, and word (a one-letter word is still a word, and a number written in digits also counts as a word). Each token is one or more characters drawn from the group of characters defined for that token type.
A character is the smallest unit of text, the building block for tokens; in principle you can just use the built-in Char for this.
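To make the three token classes concrete, here is a minimal sketch in Python (my choice of language, purely for illustration). The names `TokenKind`, `TOKEN_PATTERN` and `tokenize` are hypothetical, and the character groups are an assumption: whitespace is a separator, letters and digits (plus the underscore, which `\w` also matches) form words, and everything else is punctuation. Note that this naive version would split an abbreviation like "i.e." into several tokens, so a real tokenizer needs extra rules on top.

```python
import re
from enum import Enum


class TokenKind(Enum):
    SEPARATOR = "separator"
    PUNCTUATION = "punctuation"
    WORD = "word"


# Assumed character groups, one named group per token class:
#   separator   = one or more whitespace characters
#   word        = one or more letters, digits or underscores
#   punctuation = one or more of anything else
TOKEN_PATTERN = re.compile(
    r"(?P<SEPARATOR>\s+)|(?P<WORD>\w+)|(?P<PUNCTUATION>[^\w\s]+)"
)


def tokenize(text: str) -> list[tuple[TokenKind, str]]:
    """Split text into (kind, characters) pairs, preserving the original order."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        # lastgroup is the name of the alternative that matched
        tokens.append((TokenKind[match.lastgroup], match.group()))
    return tokens


if __name__ == "__main__":
    for kind, chars in tokenize("Hi, 42 a."):
        print(kind.value, repr(chars))
```

Since every character belongs to exactly one of the three groups, joining the token strings back together reproduces the input unchanged.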
And the most important thing, in my opinion, is to write down formal rules for the text. For example: a sentence cannot begin or end with a separator; a sentence must end with a punctuation mark; a word cannot begin with a punctuation character, but may contain one in the middle or at the end (abbreviations such as "i.e.", initials and the like); and so on.
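Rules like these can be written down as plain checks over a token sequence. The sketch below is only an illustration in Python: the function name `check_sentence`, the representation of a token as a `(kind, text)` pair, and the use of ASCII `string.punctuation` as the set of punctuation characters are all my assumptions.

```python
import string

# Assumption: only ASCII punctuation counts as "punctuation characters"
PUNCTUATION_CHARS = set(string.punctuation)


def check_sentence(tokens: list[tuple[str, str]]) -> list[str]:
    """Return the list of violated rules; an empty list means the sentence is valid.

    Each token is a (kind, text) pair, where kind is one of
    "separator", "punctuation" or "word".
    """
    if not tokens:
        return ["sentence is empty"]

    violations = []

    # Rule 1: a sentence cannot begin or end with a separator
    if tokens[0][0] == "separator" or tokens[-1][0] == "separator":
        violations.append("sentence begins or ends with a separator")

    # Rule 2: a sentence must end with a punctuation mark
    if tokens[-1][0] != "punctuation":
        violations.append("sentence does not end with a punctuation mark")

    # Rule 3: a word cannot begin with a punctuation character
    # (punctuation in the middle or at the end, as in "i.e.", is allowed)
    for kind, text in tokens:
        if kind == "word" and text[:1] in PUNCTUATION_CHARS:
            violations.append(f"word {text!r} begins with a punctuation mark")

    return violations
```

For example, a sentence whose tokens are `Hello` `,` ` ` `i.e.` ` ` `world` `.` passes all three checks, while a sequence that starts with a space and ends with the word `.net` violates all of them.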
There won't be a class hierarchy here: it will be composition rather than inheritance. In some places inheritance is still reasonable, though, for example for the tokens.
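Roughly, the structure could look like the sketch below (Python, for illustration only). All class names are hypothetical. A `Text` holds sentences and a `Sentence` holds tokens (composition), while the three token classes share a small base class (the one place where inheritance seems natural). Joining sentences with a single space in `Text.__str__` is a simplification I'm assuming here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    """Base class for tokens: one or more characters of a single group."""
    chars: str


@dataclass(frozen=True)
class Separator(Token):
    pass


@dataclass(frozen=True)
class PunctuationMark(Token):
    pass


@dataclass(frozen=True)
class Word(Token):
    pass


@dataclass
class Sentence:
    """A sentence is composed of tokens; their order is the list order."""
    tokens: list[Token] = field(default_factory=list)

    def __str__(self) -> str:
        return "".join(token.chars for token in self.tokens)


@dataclass
class Text:
    """A text is composed of sentences; their order is the list order."""
    sentences: list[Sentence] = field(default_factory=list)

    def __str__(self) -> str:
        # Simplification: sentences are joined with a single space
        return " ".join(str(sentence) for sentence in self.sentences)


if __name__ == "__main__":
    sentence = Sentence([
        Word("Hello"), PunctuationMark(","), Separator(" "),
        Word("world"), PunctuationMark("!"),
    ])
    print(Text([sentence]))
```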