I process a string with the following construct:

this.value = this.value.replace(/[^\w]/ig, '');

I noticed that in JavaScript, \w does not treat Cyrillic characters as letters, which differs from the behavior of \w in, for example, PHP's preg_replace.
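To see the difference concretely: in JavaScript, \w is defined as [A-Za-z0-9_], so the construct above removes every Cyrillic letter. A small sketch that applies the same regular expression to a plain string instead of this.value (the sample string is made up for illustration):

```javascript
// Same regular expression as in the question, applied to an ordinary string.
// \w only covers A-Z, a-z, 0-9 and "_", so the Cyrillic word is stripped too.
const input = "Привет, world_42!";
const stripped = input.replace(/[^\w]/ig, "");
console.log(stripped); // "world_42"
```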

The page is encoded in UTF-8.

Using the range А-я works, but its drawback (presumably) is that every alphabet other than Latin needs its own range spelled out.
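For comparison, a sketch of the range-based workaround. One detail worth knowing: the range А-я spans U+0410 to U+044F, which does not include Ё (U+0401) or ё (U+0451), so even for Russian alone the range has to be extended by hand. The sample string is made up for illustration:

```javascript
const text = "Ёлка, tree_1!";

// Range only: Ё (U+0401) lies outside А-я (U+0410..U+044F) and is removed.
const withoutYo = text.replace(/[^\wА-я]/g, "");

// Range plus Ё and ё listed explicitly: the whole Russian alphabet is kept.
const withYo = text.replace(/[^\wА-яЁё]/g, "");

console.log(withoutYo); // "лкаtree_1"
console.log(withYo);    // "Ёлкаtree_1"
```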

Is there a way in JS to match Cyrillic letters without a construct like А-я? Is there a universal solution?

  • \w has problems with Cyrillic; it works fine with Latin. - Ihor Bondartcov
  • @IhorBondartcov \w has no problems with the Cyrillic alphabet; it simply corresponds to [a-zA-Z0-9_]. - teran
  • @teran Right, it corresponds to Latin letters and digits, but not to Cyrillic. - Ihor Bondartcov

1 answer

Use the XRegExp library, which supports Unicode categories in regular expressions. To match every character that is a letter in any alphabet, use \pL or \p{L}; to match every other character, use \PL or \p{^L}.

 var str = "Пора поговорить с ним tête-à-tête.";
 var regex = new XRegExp('\\PL'); // define the regular expression
 //var regex = new XRegExp('[^\\pL\\s]'); // => Пора поговорить с ним têteàtête (if you need to keep spaces or add exceptions)
 var result = XRegExp.replace(str, regex, '', 'all'); // 'all' - replace all occurrences
 console.log(result);
 <script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
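As an aside not covered by the answer: engines that implement ES2018 Unicode property escapes (for example Node.js 10 and later and current major browsers) accept \p{L} and \P{L} natively when the regular expression has the u flag, so the same result can be had without a library. The native syntax requires braces; the short form \pL and XRegExp's \p{^L} are not accepted. A minimal sketch, assuming such an engine; keepWordChars is a hypothetical helper name:

```javascript
// Requires ES2018 Unicode property escapes; the u flag is mandatory for \p{...}.
const str = "Пора поговорить с ним tête-à-tête.";

// \P{L} matches any character that is not a letter in any script.
const lettersOnly = str.replace(/\P{L}/gu, "");

// Keep letters and whitespace, drop everything else.
const lettersAndSpaces = str.replace(/[^\p{L}\s]/gu, "");

// Hypothetical helper: a Unicode-aware counterpart of .replace(/[^\w]/ig, ''),
// keeping letters (\p{L}) and decimal digits (\p{Nd}) of any script plus "_".
function keepWordChars(value) {
  return value.replace(/[^\p{L}\p{Nd}_]/gu, "");
}

console.log(lettersOnly);                        // "Порапоговоритьснимtêteàtête"
console.log(lettersAndSpaces);                   // "Пора поговорить с ним têteàtête"
console.log(keepWordChars("Привет, world_42!")); // "Приветworld_42"
```

In the question's input handler this would become this.value = keepWordChars(this.value). If the input may contain combining marks (decomposed accents, Devanagari vowel signs and similar), consider adding \p{M} to the character class, since \p{L} alone does not match them.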