I need to find all the words in a text (a word may end with a period or a comma) and correctly count the number of letters in each word. Please help me write a regular expression that matches words both with and without a punctuation mark at the end.

Du HAST MICH GEFRAGT Du HAST MICH GEFRAGT und ich hab NICHTS GESAGT WILLST du bis der Tod EUCH SCHEIDET TREU ihr SEIN fur ALLE Tage ХОРОШО ЖИВЁТ на СВЕТЕ ВИННИ Пух, От ТОГО ПОЁТ он эти ПЕСНИ ВСЛУХ.
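A minimal sketch of the kind of regular expression the question asks for. It assumes that a "word" is a run of Unicode letters (\p{L}) optionally followed by a period or a comma; the names WordCounter and wordLengths are hypothetical. Group 1 holds only the letters, so the trailing sign is never counted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// WordCounter and wordLengths are hypothetical names used for illustration.
class WordCounter {
    // Group 1: a run of Unicode letters; then an optional period or comma.
    private static final Pattern WORD = Pattern.compile("(\\p{L}+)[.,]?");

    // Returns "word=letterCount" for each word, in order of appearance.
    static List<String> wordLengths(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String letters = m.group(1); // the word without its trailing sign
            result.add(letters + "=" + letters.length());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "ВИННИ Пух, От ТОГО ПОЁТ он эти ПЕСНИ ВСЛУХ.";
        for (String entry : wordLengths(text)) {
            System.out.println(entry);
        }
    }
}
```

With find(), the optional [.,]? does not change which letters are found; it only makes the sign part of the whole match (group 0). String.length() counts UTF-16 code units, which equals the letter count for the Latin and Cyrillic text here (Ё is a single precomposed character); text with combining marks would need a different count.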
  • \b is a word boundary. - teran
  • It may be easier to first remove all punctuation from the string and then split it into an array on a delimiter (a space). I don't know Java well, but in C# this is done with Split. - Vitaly Shebanits, September

2 answers

    String s = "Du HAST MICH GEFRAGT Du HAST MICH GEFRAGT und ich hab NICHTS GESAGT WILLST du bis der Tod EUCH SCHEIDET TREU ihr SEIN fur ALLE Tage ХОРОШО ЖИВЁТ на СВЕТЕ ВИННИ Пух, От ТОГО ПОЁТ он эти ПЕСНИ ВСЛУХ.";
    // Split on every run of non-letter characters (spaces, commas, periods);
    // \p{IsAlphabetic} matches letters in any script, including Cyrillic.
    String[] words = s.split("[^\\p{IsAlphabetic}]+");
    for (String w : words) {
        System.out.println(w);
    }

Note, though, that neither your question nor this answer accounts for hyphenated words, such as the Russian equivalents of "somehow" or "in Russian" ("по-русски"). This code splits each of them into two words.
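One possible way to keep hyphenated words together, sketched under the assumption that a hyphen belongs to a word only when it has letters on both sides. The names HyphenatedWords, words and letterCount are hypothetical, and the letter count leaves the hyphens out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// HyphenatedWords, words and letterCount are hypothetical names.
class HyphenatedWords {
    // Letters, optionally followed by "-letters" groups: "по-русски" is one
    // match, while a stand-alone hyphen is skipped.
    private static final Pattern WORD =
            Pattern.compile("\\p{IsAlphabetic}+(?:-\\p{IsAlphabetic}+)*");

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Counts letters only, so the hyphens themselves are not included.
    static int letterCount(String word) {
        return word.replace("-", "").length();
    }

    public static void main(String[] args) {
        for (String w : words("по-русски, Winnie-the-Pooh - well-known")) {
            System.out.println(w + " " + letterCount(w));
        }
    }
}
```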

    From the Javadoc, POSIX character classes (US-ASCII only):

    \p{Punct}  Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

    \p{Space}  A whitespace character: [ \t\n\x0B\f\r]
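Because these POSIX classes are US-ASCII only by default, \p{Alpha} does not match Cyrillic letters, and \p{Punct} does not match typographic signs such as «. A short sketch of the difference; the embedded flag (?U), i.e. Pattern.UNICODE_CHARACTER_CLASS, switches the classes to their Unicode versions. PosixClasses and its helper methods are hypothetical names:

```java
import java.util.regex.Pattern;

// PosixClasses and its helper methods are hypothetical names.
class PosixClasses {
    // Default mode: \p{Alpha} means US-ASCII [a-zA-Z] only.
    static boolean asciiAlpha(String s) {
        return Pattern.matches("\\p{Alpha}+", s);
    }

    // (?U) = Pattern.UNICODE_CHARACTER_CLASS: \p{Alpha} becomes Unicode-aware.
    static boolean unicodeAlpha(String s) {
        return Pattern.matches("(?U)\\p{Alpha}+", s);
    }

    // Default mode: \p{Punct} is the ASCII punctuation set from the Javadoc.
    static boolean asciiPunct(String s) {
        return Pattern.matches("\\p{Punct}", s);
    }

    public static void main(String[] args) {
        String pooh = "Пух";
        System.out.println("\\p{Alpha}+ on Пух: " + asciiAlpha(pooh));
        System.out.println("(?U)\\p{Alpha}+ on Пух: " + unicodeAlpha(pooh));
        System.out.println("\\p{Punct} on ',': " + asciiPunct(","));
        System.out.println("\\p{Punct} on '\u00ab': " + asciiPunct("\u00ab"));
    }
}
```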

    As advised above, you can use the String.replaceAll() method to replace punctuation with spaces and then split the string on spaces with String.split().
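A minimal Java sketch of this replace-then-split approach. It uses String.replaceAll because String.replace treats its argument as literal text, not as a regex; the names ReplaceThenSplit and words are hypothetical. Splitting on \s+ rather than a single space keeps ", " from producing empty strings:

```java
import java.util.Arrays;
import java.util.List;

// ReplaceThenSplit and words are hypothetical names.
class ReplaceThenSplit {
    static List<String> words(String text) {
        // replaceAll takes a regex; String.replace would look for the literal text "\p{Punct}".
        String cleaned = text.replaceAll("\\p{Punct}", " ").trim();
        if (cleaned.isEmpty()) {
            return List.of(); // nothing but punctuation and spaces
        }
        // Split on runs of whitespace so that ", " does not leave empty strings.
        return Arrays.asList(cleaned.split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "Tage ХОРОШО ЖИВЁТ на СВЕТЕ ВИННИ Пух, От ТОГО ПОЁТ он.";
        for (String w : words(text)) {
            System.out.println(w + " " + w.length());
        }
    }
}
```

Since \p{Punct} includes the hyphen, this approach also splits hyphenated words in two; and since it is ASCII-only, typographic signs such as « » stay attached to the words and are counted as letters.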

    Or use Pattern and Matcher, e.g. Pattern.compile("[a-zA-Z]+"), and on each match replace whatever you need with the same replace call.
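Note that [a-zA-Z]+ matches only ASCII Latin letters, so on the question's text it finds the German words but skips the Cyrillic ones. A small sketch comparing it with \p{L}+ (any Unicode letter); MatcherWords and findAll are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// MatcherWords and findAll are hypothetical names.
class MatcherWords {
    // Collects every match of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "fur ALLE Tage ХОРОШО ЖИВЁТ";
        // ASCII Latin letters only: prints [fur, ALLE, Tage]
        System.out.println(findAll("[a-zA-Z]+", text));
        // Any Unicode letter, so the Cyrillic words are found as well
        System.out.println(findAll("\\p{L}+", text));
    }
}
```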

    • And, as the teachers say, don't forget about multiline mode, (?m). To be honest, I haven't solved this exercise myself. - M Oleg
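On the (?m) remark: in Java, the MULTILINE flag (embedded form (?m)) only changes where ^ and $ match, so it matters only if a word pattern is anchored with them. A small sketch on a two-line version of the question's text; MultilineDemo and countMatches are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// MultilineDemo and countMatches are hypothetical names.
class MultilineDemo {
    // Counts how many times regex is found in text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "Du HAST MICH GEFRAGT\nund ich hab NICHTS GESAGT";
        // Without (?m), ^ matches only at the start of the whole input: 1 match.
        System.out.println(countMatches("^\\p{L}+", text));
        // With (?m), ^ also matches after each line terminator: 2 matches.
        System.out.println(countMatches("(?m)^\\p{L}+", text));
        // Without ^ or $, the flag changes nothing: same count either way.
        System.out.println(countMatches("\\p{L}+", text) == countMatches("(?m)\\p{L}+", text));
    }
}
```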