JavaScript and a long regex

Question

Good day! I have a regular expression for finding the forms of a single word in a string. These regexes are generated automatically for each word and are rather long, but they all follow one pattern. For example, suppose you need a regex that finds the forms of the word "cat" (кот). Take the "basic" part of the regex:

(^|[^0-9a-zа-я])(кот)([^0-9a-zа-я]|$)

and build from it a full regex covering all the forms, gluing together one such part per form:

 ((^|[^0-9a-zа-я])(кот)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котов)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котам)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котами)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котах)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(коту)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котом)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(коте)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(кота)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(коты)([^0-9a-zа-я]|$))

The result is a rather long mess. Now take some text that contains forms of the word "cat":

 бла бла кот бла бла "кота" бла коты скоты

I need to replace every occurrence of a form of the word "cat" that is delimited by non-word characters (spaces, punctuation marks, etc.) with the same form wrapped in angle brackets. That is, from the source string I need to get this:

 бла бла <кот> бла бла "<кота>" бла <коты> скоты

I do it like this:

 'бла бла кот бла бла "кота" бла коты скоты'.replace(/my big regex/gi, '$1<$2>$3')

The result is:

 "бла бла кот <>бла бла "кота"<> бла коты <>скоты"

As you can see, this does not work: the regex contains many groups (and a different number of them for each word), so I cannot refer to group numbers in the replacement, because the right numbers would be different every time.
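
To see why fixed references like $1, $2, $3 cannot work here, one can count the capture groups in the generated regex. The sketch below assumes the regex is built the way the question describes (one outer group wrapping a (prefix)(form)(suffix) triple per form); `buildLongRegex` and `countGroups` are hypothetical helper names, not part of the asker's code. Appending an empty alternative `|` makes the regex match the empty string, so `exec('')` returns an array whose length is the group count plus one.

```javascript
// Hypothetical reconstruction of the question's generator: one outer group,
// then a (prefix)(form)(suffix) triple for every word form.
const BOUND = '[^0-9a-zа-яё]';

function buildLongRegex(forms) {
  const parts = forms.map(f => `(^|${BOUND})(${f})(${BOUND}|$)`);
  return new RegExp(`(${parts.join('|')})`, 'gi');
}

// Count capture groups: the extra empty alternative guarantees a match on '',
// and the result array holds the whole match plus one slot per group.
function countGroups(re) {
  return new RegExp(re.source + '|').exec('').length - 1;
}

const forms = ['кот', 'котов', 'котам', 'котами', 'котах', 'коту', 'котом', 'коте', 'кота', 'коты'];
const re = buildLongRegex(forms);
console.log(countGroups(re)); // prints: 31 (1 outer group + 3 groups per form x 10 forms)

// Which group holds the word depends on which alternative matched:
const m = buildLongRegex(forms).exec(' котов ');
console.log(m[3], m[6]); // prints: undefined котов
```

So $3 only ever refers to the first form; for "котов" the word sits in group 6, for the tenth form in group 30.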

I tried to use this replacement:

 'бла бла кот бла бла "кота" бла коты скоты'.replace(/my big regex/gi, function(match) { return "<" + match + ">"; })

This option almost works, producing the following result:

 "бла бла< кот >бла бла <"кота"> бла< коты >скоты"

But as you can see, it also puts the quotation marks and spaces inside the brackets. How can I do the replacement correctly in this case? Thanks in advance!

A few important notes:

I cannot use \b, because in JavaScript's regex engine it only works correctly with Latin letters (my texts are usually in Cyrillic).
For roughly the same reason I cannot use named groups: JavaScript regular expressions do not support them.
The regexes here are somewhat simplified for the demonstration; for example, they omit the letter "ё" and capital letters.
Perhaps not every form of the word "cat" is listed here; that does not matter for this example.
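
A quick check of the \b point: in JavaScript, `\b` is defined in terms of `\w`, which covers only `[A-Za-z0-9_]`, so Cyrillic letters count as non-word characters. The sketch below shows `\b` failing on a plain Cyrillic word and, conversely, matching inside a mixed Latin/Cyrillic token.

```javascript
// \w (and therefore \b) only knows ASCII letters, digits and underscore.
const naive = /\bкот\b/i;

console.log(naive.test('кот'));     // false: no \w on either side, so no boundary
console.log(naive.test('бла кот')); // false, for the same reason
console.log(naive.test('aкотb'));   // true: the boundaries come from the Latin letters
```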

It may be easier to write your own parser instead of using regular expressions.
In replace, the second argument can be a replacer function (str, p1, p2, p3, offset, s),
where p1, p2, p3 give you the values of the capture groups.
One question: for a string such as fd /кота sdfdf , should the result be fd </кота> sdfdf or fd /<кота> sdfdf ?
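
Following the replacer-function suggestion, the original long regex can be kept: the function receives every group, so it can look for the triple whose form group is defined. A sketch, assuming the layout from the question (one outer group, then a (prefix)(form)(suffix) triple per form); `buildLongRegex` and `wrapForms` are hypothetical names. It inherits the long regex's limitation that the trailing delimiter is consumed, so of two forms separated by a single character only the first is wrapped.

```javascript
const BOUND = '[^0-9a-zа-яё]';

// Hypothetical generator matching the question's layout.
function buildLongRegex(forms) {
  const parts = forms.map(f => `(^|${BOUND})(${f})(${BOUND}|$)`);
  return new RegExp(`(${parts.join('|')})`, 'gi');
}

function wrapForms(text, re) {
  return text.replace(re, (...args) => {
    // args = [match, g1, g2, ..., gN, offset, input]; g1 is the outer group,
    // so the triples start at g2. No named groups, so no trailing groups object.
    const groups = args.slice(2, -2);
    for (let i = 0; i < groups.length; i += 3) {
      if (groups[i + 1] !== undefined) {
        // Keep the delimiters as they are, wrap only the word form.
        return `${groups[i]}<${groups[i + 1]}>${groups[i + 2]}`;
      }
    }
    return args[0]; // not reached for this regex layout
  });
}

const forms = ['кот', 'котов', 'котам', 'котами', 'котах', 'коту', 'котом', 'коте', 'кота', 'коты'];
console.log(wrapForms('бла бла кот бла бла "кота" бла коты скоты', buildLongRegex(forms)));
// prints: бла бла <кот> бла бла "<кота>" бла <коты> скоты
```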

Accepted Answer · 2017-06-23T11:08:12

Put all your cat forms into the second capture group:

 console.log('бла бла кот бла бла "кота" бла коты скоты'.replace(/(^|[^0-9a-zа-яё])(кот|котов|котам|котами|котах|коту|коте|котом|кота|коты)([^0-9a-zа-яё]|$)/gi, '$1<$2>$3'));

UPD: to also handle cat forms separated by just one character (e.g. a single space):

 console.log('котам бла бла кот кот бла бла "кота"кот бла коты скоты котом'.replace(/(^|[^0-9a-zа-яё])(кот|котов|котам|котами|котах|коту|коте|котом|кота|коты)(?![0-9a-zа-яё])/gi, '$1<$2>'));
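
The difference between the two versions: `([^0-9a-zа-яё]|$)` consumes the character after the word, so the next match cannot use it as its own leading delimiter, whereas the negative lookahead `(?![0-9a-zа-яё])` only checks that character without consuming it. A minimal comparison on two adjacent forms, using the same forms list as the answer:

```javascript
const FORMS = 'кот|котов|котам|котами|котах|коту|коте|котом|кота|коты';

// First version: the trailing delimiter is part of the match.
const consuming = new RegExp(`(^|[^0-9a-zа-яё])(${FORMS})([^0-9a-zа-яё]|$)`, 'gi');
// UPD version: the lookahead checks the next character but leaves it unconsumed.
const lookahead = new RegExp(`(^|[^0-9a-zа-яё])(${FORMS})(?![0-9a-zа-яё])`, 'gi');

const s = 'кот кот';
console.log(s.replace(consuming, '$1<$2>$3')); // prints: <кот> кот
console.log(s.replace(lookahead, '$1<$2>'));   // prints: <кот> <кот>
```

The order of the alternatives does not matter here: if "кот" matches first inside "котами", the lookahead fails and the engine backtracks to the longer alternatives.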

:) I plugged in this string and got a wrong result: бла бла кот бла &котик* sksdfk котенок бла "кота" бла коты скоты
@RazmikGalstyan, add котенок to the dictionary and you will be happy.
But if I have 50 such words, do I have to put all of them in there?
The asker's own title is "JavaScript and a long regex".
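
On the "50 words" question: the alternation does not have to be typed by hand. A sketch, assuming the word forms come from some list (here a hard-coded array; `escapeRegex` and `formsRegex` are hypothetical helpers). Each form is escaped in case it contains regex metacharacters, and the boundary logic is the lookahead version from the accepted answer, so the replacement string stays '$1<$2>' no matter how many forms there are.

```javascript
// Escape characters that have a special meaning in regex source.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one regex for any number of forms; the form is always capture group 2.
function formsRegex(forms) {
  const alternation = forms.map(escapeRegex).join('|');
  return new RegExp(`(^|[^0-9a-zа-яё])(${alternation})(?![0-9a-zа-яё])`, 'gi');
}

const forms = ['кот', 'котик', 'котенок', 'кота', 'коты'];
const text = 'бла бла кот бла &котик* sksdfk котенок бла "кота" бла коты скоты';
console.log(text.replace(formsRegex(forms), '$1<$2>'));
// prints: бла бла <кот> бла &<котик>* sksdfk <котенок> бла "<кота>" бла <коты> скоты
```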

Raz galstyan · Answer 2 · 2017-06-23T11:36:36

Here is a regex for your question. For the example I changed your string a little, and it does exactly what you want:

 let str = 'бла бла кот бла &котик* sksdfk котенок бла "кота" бла коты скоты'; let res = str.replace(/([a-zа-яё]*кот[a-zа-яё]*)/gi,'<$1>'); console.log(res);

Why this is incorrect is explained here: ru.meta.stackoverflow.com/q/5513/186083
The author does not want words such as скоты and котлы to be captured.
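
The objection is easy to reproduce: `[a-zа-яё]*кот[a-zа-яё]*` matches any word that merely contains "кот", including скоты and котлы. A quick comparison of this answer's regex with the accepted answer's UPD regex:

```javascript
// Regex from this answer: any word that contains "кот".
const contains = /([a-zа-яё]*кот[a-zа-яё]*)/gi;
// Accepted answer (UPD): only the listed forms, bounded by non-letters.
const bounded = /(^|[^0-9a-zа-яё])(кот|котов|котам|котами|котах|коту|коте|котом|кота|коты)(?![0-9a-zа-яё])/gi;

const sample = 'коты скоты котлы';
console.log(sample.replace(contains, '<$1>'));  // prints: <коты> <скоты> <котлы>
console.log(sample.replace(bounded, '$1<$2>')); // prints: <коты> скоты котлы
```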
