Which characters does the regular expression [а-яА-Я]+ cover? I have heard that it does not include ё and ъ. So, to cover the entire Russian alphabet, do I need to write [а-яА-ЯёЁъЪ]+ ? Or are there other letters of the Russian alphabet that this expression misses? And how can I find out the full set of characters that a regular expression range includes?
1 answer
Java regular expressions support Unicode Technical Standard #18, Unicode Regular Expressions. Accordingly, character ranges are evaluated by Unicode code position.
The letters of the Russian alphabet occupy Unicode positions 0410 through 044F, except for the letters “ё” and “Ё”, which for historical reasons sit outside that block, at positions 0451 and 0401 respectively. This is why they have to be specified separately.
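As a minimal sketch of the point above (the class name YoRangeDemo and the helper inMainBlock are made up for this illustration), the following compares the range-only class with one that adds ё and Ё explicitly:

```java
import java.util.regex.Pattern;

final class YoRangeDemo {
    // The range-only class from the question, and the same class with ё/Ё added.
    static final Pattern RANGE_ONLY = Pattern.compile("[а-яА-Я]+");
    static final Pattern WITH_YO = Pattern.compile("[а-яА-ЯёЁ]+");

    /** Hypothetical helper: true if ch lies in the contiguous block U+0410..U+044F (А..я). */
    static boolean inMainBlock(char ch) {
        return ch >= '\u0410' && ch <= '\u044F';
    }

    public static void main(String[] args) {
        // Print the code position of a few letters and whether it falls inside the block.
        for (char ch : new char[] {'А', 'я', 'ъ', 'Ё', 'ё'}) {
            System.out.printf("%c U+%04X inMainBlock=%b%n", ch, (int) ch, inMainBlock(ch));
        }
        // "ёлка" starts with ё, so only the extended class matches the whole word.
        System.out.println(RANGE_ONLY.matcher("ёлка").matches());
        System.out.println(WITH_YO.matcher("ёлка").matches());
    }
}
```

The source file has to be compiled as UTF-8 for the Cyrillic literals to be read correctly.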
Are there any other characters of the Russian alphabet that this expression does not include?
No. Even the hard sign (ъ) needs no special treatment; it already falls inside the range.
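One way to check this claim is to test every letter of the alphabet against the character class. The sketch below is an illustration only: the class AlphabetCoverage, the method missingLetters and the hard-coded alphabet string are assumptions introduced here, not part of any library.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class AlphabetCoverage {
    // All 33 lowercase letters of the Russian alphabet, written out by hand.
    static final String ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя";

    /** Returns the letters (lowercase first, then uppercase) that the given class does not match. */
    static String missingLetters(String charClass) {
        Pattern p = Pattern.compile(charClass);
        String all = ALPHABET + ALPHABET.toUpperCase(Locale.ROOT);
        StringBuilder missing = new StringBuilder();
        for (char ch : all.toCharArray()) {
            if (!p.matcher(String.valueOf(ch)).matches()) {
                missing.append(ch);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        System.out.println("Missing from [а-яА-Я]: " + missingLetters("[а-яА-Я]"));
        System.out.println("Missing from [а-яА-ЯёЁ]: " + missingLetters("[а-яА-ЯёЁ]"));
    }
}
```

For the range-only class, the only letters reported should be ё and Ё; the hard sign is not among them.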
How to find out the full range of characters included in the regular expression?
By code positions. In Java, the numeric value of a character (char) corresponds to its position in Unicode. Accordingly, you can use char values as the range limits and loop over all the characters between them. For example, the following code prints every character in the range from “А” to “Я” together with its position in Unicode.
for (char ch = 'А'; ch <= 'Я'; ch++) {
    System.out.println(ch + " (" + ((int) ch) + ")");
}
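The same idea can be turned around to answer the question for an arbitrary character class: instead of walking a known range, try every character and keep those the pattern accepts. This is a rough sketch under stated assumptions; the class MatchedChars and the method matchedChars are hypothetical names, and only the Basic Multilingual Plane (U+0000..U+FFFF) is scanned.

```java
import java.util.regex.Pattern;

final class MatchedChars {
    /** Collects every BMP character that, as a one-character string, fully matches the regex. */
    static String matchedChars(String regex) {
        Pattern p = Pattern.compile(regex);
        StringBuilder sb = new StringBuilder();
        // An int counter is used because a char counter would wrap around and never exit the loop.
        for (int cp = Character.MIN_VALUE; cp <= Character.MAX_VALUE; cp++) {
            char ch = (char) cp;
            if (Character.isSurrogate(ch)) {
                continue; // lone surrogates are not characters in their own right
            }
            if (p.matcher(String.valueOf(ch)).matches()) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String matched = matchedChars("[а-яА-Я]");
        System.out.println(matched.length() + " characters: " + matched);
    }
}
```

Pass the character class itself, such as [а-яА-Я], so that each single character is tested on its own.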
Of the Russian letters, [а-яА-Я] leaves out only ё and Ё. - Wiktor Stribiżew