Please explain why the word boundary \b does not work for Russian words, while it does work for English ones.

    var str = "one один two два three";
    var work = str.match(/([\w])+\b|([а-я])+/giu);
    console.log(work, ":without \\b for Russian words:");
    // the problem is here
    var notWork = str.match(/([\w])+\b|([а-я])+\b/giu);
    console.log(notWork, ":with \\b for Russian words:");

  • The regex engine does not know that there are letters other than Latin ones. - Visman
  • Exactly. It treats as a "word" only Latin letters of either case, digits, and the underscore. Read more in the spec, around here: bterlson.imtqy.com/ecma262/… - Duck Learns to Take Cover
  • @Dmytryk, the u modifier should help you. - Edward
  • @Edward, the u flag does not help. I updated the question and added it to the code. - Dmytryk

1 answer

The rules for how regular expressions work are described in the specification.

When \b is checked, the following steps are taken:

  1. The IsWordChar function is called for the current character and for the previous one.
  2. If the two results differ, the check returns true.
  3. Otherwise it returns false.
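
These steps can be sketched in plain JavaScript. This is only an illustration, not the engine's actual code: isWordChar and isWordBoundary are hypothetical helper names, and the character class inside isWordChar assumes the fixed list of Latin letters, digits, and underscore that the specification defines.

```javascript
// Hypothetical helper mirroring the spec's IsWordChar check:
// only ASCII letters, digits, and the underscore count as word characters.
function isWordChar(str, index) {
  if (index < 0 || index >= str.length) return false; // outside the string
  return /[A-Za-z0-9_]/.test(str[index]);
}

// Hypothetical helper: true if \b would match at position `index`,
// that is, between str[index - 1] and str[index].
function isWordBoundary(str, index) {
  return isWordChar(str, index - 1) !== isWordChar(str, index);
}

const sample = "two два";
console.log(isWordBoundary(sample, 3)); // true: between "o" and " "
console.log(isWordBoundary(sample, 7)); // false: after the final "а"
```

In the second call both neighbours are non-word characters (a Cyrillic letter and the end of the string), so the results do not differ and no boundary is reported.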

What is the IsWordChar function?

This function simply checks whether a character belongs to a predefined list. The list contains the following 63 (26 * 2 + 10 + 1) characters, fixed in the specification:

    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9 _

As you can see, the list contains nothing except English letters, digits, and the underscore, so a Cyrillic letter is never treated as a word character.
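
As a rough illustration of the consequence, the snippet below first shows that \b never matches after a run of Cyrillic letters in the string from the question, and then tries one possible workaround. The workaround is an assumption of this sketch, not something taken from the specification text quoted above: it emulates a boundary with lookarounds and Unicode property escapes, which requires an engine that supports ES2018 features.

```javascript
const str = "one один two два three";

// \b after a Cyrillic run never matches here: neither neighbour is a word character.
console.log(str.match(/[а-я]+\b/giu)); // null

// Assumed workaround: treat any Unicode letter, digit, or "_" as a word character
// and emulate \b with a lookbehind and a lookahead.
const word = /(?<![\p{L}\p{N}_])[\p{L}\p{N}_]+(?![\p{L}\p{N}_])/gu;
console.log(str.match(word)); // ["one", "один", "two", "два", "three"]
```

The choice of \p{L} and \p{N} as "word characters" is a design decision for this sketch; a narrower class such as [а-яё] may fit better if only Russian text is expected.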

  • Thank you, that is an exhaustive answer :) - Dmytryk