How to match Cyrillic alphanumeric characters with a regular expression?

Question

I process a string with the following construct:

this.value = this.value.replace(/[^\w]/ig, '');

I noticed that JavaScript does not treat Cyrillic characters as word characters, which differs from how \w behaves in, for example, PHP's preg_replace.

The page is encoded in UTF-8.

The range А-я works, but its flaw is (presumably) that every alphabet other than Latin needs its own range spelled out.

Is there a way in JS to match Cyrillic letters without a construct like А-я? Is there a universal solution?

\w has problems with Cyrillic; it works fine with Latin.
@IhorBondartcov It does not have any problems with Cyrillic, because it is defined as [a-zA-Z0-9_].
@teran You wrote it correctly: it corresponds to Latin letters and digits, but not Cyrillic ones.
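
The point made in the comments can be checked directly. The snippet below is only an illustrative sketch (the variable names are made up for this example): it shows that \w without any extra flags covers just ASCII letters, digits and the underscore, and that the А-я range leaves out the letter ё, which sits outside that code point range.

```javascript
// \w is shorthand for [A-Za-z0-9_], so it never matches Cyrillic letters.
const asciiWord = /^\w+$/;
const latinMatches = asciiWord.test('abc_123');   // Latin letters, digits, underscore
const cyrillicMatches = asciiWord.test('Привет'); // Cyrillic letters only

// The range А-я spans U+0410..U+044F; ё (U+0451) and Ё (U+0401) fall outside it.
const rangeCoversYo = /^[А-я]+$/.test('ёж');

console.log(latinMatches, cyrillicMatches, rangeCoversYo);
```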

Accepted Answer · 2017-12-28T11:27:59

Use the XRegExp library, which supports Unicode categories in regular expressions. To match any character that is a letter in any alphabet, use \pL or \p{L}; to match every character that is not a letter, use \PL or \p{^L}.

 var str = "Пора поговорить с ним tête-à-tête.";
 var regex = new XRegExp('\\PL'); // define the regular expression
 // var regex = new XRegExp('[^\\pL\\s]'); // => Пора поговорить с ним têteàtête (use this to keep spaces or add other exceptions)
 var result = XRegExp.replace(str, regex, '', 'all'); // 'all' replaces every occurrence
 console.log(result);

 <script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
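
If pulling in a library is not an option, a similar result may be possible with native Unicode property escapes. This is a sketch that assumes the engine supports the ES2018 \p{...} syntax with the u flag (current browsers and Node.js do, older ones do not). The function name stripNonWordUnicode is hypothetical and not part of the original answer.

```javascript
// Assumption: the runtime supports ES2018 Unicode property escapes (requires the "u" flag).
// \p{L} matches a letter in any script, \p{N} matches any numeric character.
function stripNonWordUnicode(value) {
  // Remove everything that is not a letter, a number, or an underscore.
  return value.replace(/[^\p{L}\p{N}_]/gu, '');
}

console.log(stripNonWordUnicode('Привет, world_42!'));
```

This mirrors the original /[^\w]/ construct, with the ASCII-only \w swapped for Unicode categories. Combining marks (category M) are stripped as well, so decomposed accented text may need \p{M} added to the character class.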
