Breaking a string into groups in regular expressions

Question

I want to check a string for letters whose case has been set at random and normalize them. For example, the input string is:

Это стрОка... это сТроКа.

Output string:

Это строка... Это строка.

I capitalize the first letter of each sentence like this:

    private static final String DOT_REGEX = "\\s*(?<!\\.)\\.(?!\\.)\\s*";
    private static final String MULTI_DOT_REGEX = "\\s*\\.{3}\\s*";
    private static final String FORMAT_CASE = "(?:^| )^\\w" + "|" + MULTI_DOT_REGEX + "\\w" + "|" + DOT_REGEX + "\\w";

The function that processes the string and returns the finished result:

    private static String getFormatCaseString(String targetString) {
        Matcher matcher = Pattern.compile(FORMAT_CASE).matcher(targetString);
        StringBuffer stringBuffer = new StringBuffer();
        while (matcher.find()) {
            matcher.appendReplacement(stringBuffer, matcher.group().toUpperCase());
        }
        matcher.appendTail(stringBuffer);
        return stringBuffer.toString();
    }

The idea is to split each sentence into two groups: the first is the first letter of the sentence (which is converted to upper case), and the second is all the remaining letters (which are converted to lower case). How can I do this?

@WiktorStribiżew, thank you once again. Could you explain the expression in your answer once more? I find regular expressions hard. - UjinUkr

Wiktor Stribiżew · Accepted Answer · 2018-12-27T13:02:03

You can capture the first letter (a word character) into the first capturing group and all the text up to the first single dot or run of three dots into a second group, and then apply your existing replacement logic:

    String targetString = "Это стрОка... это сТроКа.";
    Matcher matcher = Pattern.compile("(?Us)(\\w)(.*?(?:\\.{3}|(?<!\\.)\\.(?!\\.)))").matcher(targetString);
    StringBuffer stringBuffer = new StringBuffer();
    while (matcher.find()) {
        // Group 1 is the first letter, group 2 is the rest of the sentence.
        // quoteReplacement stops '$' and '\' in the text from being read as group references.
        matcher.appendReplacement(stringBuffer, Matcher.quoteReplacement(
                matcher.group(1).toUpperCase() + matcher.group(2).toLowerCase()));
    }
    matcher.appendTail(stringBuffer);
    System.out.println(stringBuffer.toString()); // => Это строка... Это строка.
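For reference, the same approach can be packaged as a drop-in replacement for the question's getFormatCaseString. This is only a sketch under a few assumptions: it needs Java 9 or later (for Matcher.replaceAll with a function argument), the class name SentenceCase and the constant SENTENCE are made up for illustration, and the sample string is an ASCII stand-in for the Russian one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceCase {
    // Same expression as in the answer, compiled once and reused.
    // SENTENCE is a hypothetical name, not part of the original code.
    private static final Pattern SENTENCE =
            Pattern.compile("(?Us)(\\w)(.*?(?:\\.{3}|(?<!\\.)\\.(?!\\.)))");

    public static String getFormatCaseString(String targetString) {
        return SENTENCE.matcher(targetString).replaceAll(m ->
                // quoteReplacement keeps '$' and '\' in the text literal.
                Matcher.quoteReplacement(
                        m.group(1).toUpperCase() + m.group(2).toLowerCase()));
    }

    public static void main(String[] args) {
        System.out.println(getFormatCaseString("this is a stRing... this is a sTriNg."));
        // prints: This is a string... This is a string.
    }
}
```

Text that follows the last dot or ellipsis never matches the expression, so it is copied to the output unchanged.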

Details

(?Us) - modifiers: Pattern.UNICODE_CHARACTER_CLASS (so that \w also matches Russian letters) and Pattern.DOTALL (so that . also matches newline characters)
(\\w) - Capturing group 1: a letter, digit or _ (this value is converted to upper case with matcher.group(1).toUpperCase())
(.*?(?:\\.{3}|(?<!\\.)\\.(?!\\.))) - Capturing group 2 (this value is converted to lower case with matcher.group(2).toLowerCase()):
- .*? - 0 or more characters, as few as possible
- (?: - start of a non-capturing group
  - \\.{3} - three dots
  - | - or
  - (?<!\\.)\\.(?!\\.) - a dot that is neither preceded nor followed by another dot
- ) - end of the non-capturing group
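
To see how the two capturing groups divide the input, each match can be listed separately. A minimal sketch, assuming the same expression as in the answer; the class name GroupDump and the helper groups are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDump {
    // Returns "group1|group2" for every match of the answer's expression.
    public static List<String> groups(String text) {
        Matcher m = Pattern
                .compile("(?Us)(\\w)(.*?(?:\\.{3}|(?<!\\.)\\.(?!\\.)))")
                .matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1) + "|" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String g : groups("first sEntence... second One.")) {
            System.out.println(g);
        }
        // prints:
        // f|irst sEntence...
        // s|econd One.
    }
}
```

The space between the two sentences belongs to neither match, because \w cannot match whitespace; appendReplacement copies such gaps through as they are.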
