I need to use a regular expression to extract from a string all integers (i.e. 1, 2, 3, but not 1.1 or 14.14, which are doubles) and all single-letter words (i.e. a or b, but not the letters inside abc).

What is the best way to do this? "\d+" also matches the parts of a double, simply skipping the period: it treats 14.14 as 14 and 14. The same happens with letters: "\w{1,1}" returns every character, splitting words into individual characters.
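The behavior described above can be reproduced with a minimal sketch. The class name NaivePatternDemo, the helper findAll, and the sample inputs are hypothetical and chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows why the two naive patterns over-match.
class NaivePatternDemo {
    // Collects every match of the regex in the input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // \d+ has no notion of context, so both halves of 14.14 are matched.
        System.out.println(findAll("\\d+", "7 14.14"));    // [7, 14, 14]
        // \w{1,1} matches any single word character, including those inside abc.
        System.out.println(findAll("\\w{1,1}", "a abc"));  // [a, a, b, c]
    }
}
```

Both patterns match purely by character class, so some kind of context check (lookarounds or word boundaries) seems to be needed.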

  • For single letters, it is simpler: you can wrap them in \b. - teran
  • Try Pattern p = Pattern.compile("(?U)(?<![0-9]\\.{0,1})[0-9]+(?!\\.?[0-9])|\\b\\p{L}\\b"); - Wiktor Stribiżew

1 answer

You can use:

 (?U)(?<![0-9]\.{0,1})[0-9]+(?!\.?[0-9])|\b\p{L}\b 

Java:

 String regex = "(?U)(?<![0-9]\\.{0,1})[0-9]+(?!\\.?[0-9])|\\b\\p{L}\\b"; 

See the demo (this regex works in Java).

  • (?U) - enables Pattern.UNICODE_CHARACTER_CLASS, so that the word boundary \b recognizes word boundaries in Unicode text
  • (?<![0-9]\\.{0,1}) - negative lookbehind: the match must not be immediately preceded by a digit, or by a digit followed by a period
  • [0-9]+ - one or more digits
  • (?!\\.?[0-9]) - negative lookahead: the digits matched by [0-9]+ must not be immediately followed by a digit, or by a period and a digit
  • | - or
  • \\b - word boundary
  • \\p{L} - any Unicode letter
  • \\b - word boundary
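As a rough illustration of how the pieces listed above work together, here is a minimal sketch that applies the pattern to a sample string. The class name TokenExtractor, the method extract, and the sample input are hypothetical and not part of the original answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts standalone integers and single-letter words.
class TokenExtractor {
    // The pattern from the answer, written as a Java string literal.
    private static final Pattern TOKEN = Pattern.compile(
            "(?U)(?<![0-9]\\.{0,1})[0-9]+(?!\\.?[0-9])|\\b\\p{L}\\b");

    // Returns all matches in order of appearance.
    static List<String> extract(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // 14.14 is rejected by the lookarounds, abc by the word boundaries.
        System.out.println(extract("a 1 22 14.14 abc b 3")); // [a, 1, 22, b, 3]
    }
}
```

Note that a letter directly attached to a digit (for example a1) is not reported, because \b does not match between two word characters.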

Alternatively, without the optional . in the lookbehind, the pattern can be written like this:

 (?<![0-9]\.)(?<![0-9])[0-9]+(?!\.?[0-9])|\b\p{L}\b 

Demo. Or like this:

 (?<![0-9]\.|[0-9])[0-9]+(?!\.?[0-9])|\b\p{L}\b 

Another demo.
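To check that the three spellings behave the same way, the following sketch runs each of them against one sample string. The class name LookbehindVariants, the VARIANTS array, and the input are hypothetical. The patterns are copied as written above, so only the first carries (?U), and the comparison uses ASCII input only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of the three lookbehind spellings from the answer.
class LookbehindVariants {
    static final String[] VARIANTS = {
        // Optional period inside a single lookbehind.
        "(?U)(?<![0-9]\\.{0,1})[0-9]+(?!\\.?[0-9])|\\b\\p{L}\\b",
        // Two separate lookbehinds.
        "(?<![0-9]\\.)(?<![0-9])[0-9]+(?!\\.?[0-9])|\\b\\p{L}\\b",
        // One lookbehind with an alternation.
        "(?<![0-9]\\.|[0-9])[0-9]+(?!\\.?[0-9])|\\b\\p{L}\\b"
    };

    // Collects every match of the regex in the input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        for (String regex : VARIANTS) {
            // Each variant prints [a, 1, 22, b, 3] for this input.
            System.out.println(findAll(regex, "a 1 22 14.14 abc b 3"));
        }
    }
}
```

Which spelling to prefer is mostly a matter of readability. Java accepts all three because each lookbehind has a bounded length.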