Regular expression: is there no “\b” word boundary for Russian words?

Question

Please explain why the word boundary \b does not work for Russian words, while it works for English ones.

 var str = "one один two два three";

 // Without \b after the Russian alternative
 var work = str.match(/([\w])+\b|([а-я])+/giu);
 console.log(work, ":without \\b for Russian words:");

 // With \b after the Russian alternative: the problem is here
 var notWork = str.match(/([\w])+\b|([а-я])+\b/giu);
 console.log(notWork, ":with \\b for Russian words:");

The regular expression engine does not know that letters other than Latin ones exist.
More precisely, it treats as a “word” character only Latin letters in either case, digits, and the underscore.
Read more in the spec, around here: bterlson.imtqy.com/ecma262/…

Accepted Answer · 2018-01-15T09:04:36

The rules by which regular expressions operate are described in the specification.

When \b is checked, the following steps are taken:

The IsWordChar function is evaluated for the current character and for the previous one.
If the two results differ, the check returns true;
otherwise it returns false.
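
The steps above can be sketched in plain JavaScript. This is only an illustration of the logic, not how an engine actually implements it: the names WORD_CHARS, isWordChar and isWordBoundary are hypothetical helpers, and treating positions outside the string as "not a word character" is an assumption made for this sketch.

```javascript
// Hypothetical model of the \b check; not a real engine API.
const WORD_CHARS = new Set(
  "abcdefghijklmnopqrstuvwxyz" +
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
  "0123456789_"
);

// Assumption: positions outside the string are not word characters.
function isWordChar(str, index) {
  if (index < 0 || index >= str.length) return false;
  return WORD_CHARS.has(str[index]);
}

// \b holds at `pos` when exactly one of the two neighbours is a word character.
function isWordBoundary(str, pos) {
  return isWordChar(str, pos - 1) !== isWordChar(str, pos);
}

const sample = "two два";
console.log(isWordBoundary(sample, 3)); // true: "o" is a word character, " " is not
console.log(isWordBoundary(sample, 7)); // false: "а" (Cyrillic) and end of string are both non-word
```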

What is the IsWordChar function?

This function simply checks whether a character belongs to a predefined list. The list consists of the following 63 (26 * 2 + 10 + 1) characters, fixed in the specification:

 abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0 1 2 3 4 5 6 7 8 9 _

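As you can see, the list contains nothing but English letters, digits, and the underscore. A Cyrillic letter is therefore never a word character, so at the position between a Cyrillic letter and a space (or the end of the string) both sides are non-word characters and \b does not match.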
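
The effect is easy to reproduce on the string from the question. The last part of the sketch below is not from the answer: it is one possible workaround, assuming an engine that supports Unicode property escapes and lookbehind (ES2018), which emulates a boundary with lookarounds over letters, digits, and the underscore. The variable names are made up for the example.

```javascript
const str = "one один two два three";

// \b works around a Latin word but never matches next to a Cyrillic one.
const asciiHit = /\btwo\b/.test(str);
const cyrillicHit = /\bдва\b/.test(str);
console.log(asciiHit);    // true
console.log(cyrillicHit); // false

// Assumed workaround: "no letter, digit or underscore on either side".
const unicodeBoundary = /(?<![\p{L}\p{N}_])два(?![\p{L}\p{N}_])/u;
const unicodeHit = unicodeBoundary.test(str);
const insideLongerWord = unicodeBoundary.test("дважды");
console.log(unicodeHit);       // true
console.log(insideLongerWord); // false: "два" is followed by another letter
```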
