Good day! I have a regular expression designed to find forms of the same word in a string. These regexes are generated automatically for each word and are rather long, but they all follow one pattern. For example, suppose you need a regex that finds the forms of the word "кот" ("cat"). Take the "basic" part of the regex:

(^|[^0-9a-zа-я])(кот)([^0-9a-zа-я]|$) 

and build from it a full regex for all forms, gluing together a separate piece for each form:

 ((^|[^0-9a-zа-я])(кот)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котов)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котам)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котами)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котах)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(коту)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котом)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(коте)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(кота)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(коты)([^0-9a-zа-я]|$)) 

The result is a long, unreadable pattern. Now take some text that contains forms of the word "cat":

 бла бла кот бла бла "кота" бла коты скоты 

I need to replace every occurrence of a form of the word "cat" that is delimited by non-word characters (spaces, punctuation marks, etc.) with the same form wrapped in angle brackets. That is, from the source string I need to get this:

 бла бла <кот> бла бла "<кота>" бла <коты> скоты 

I do it like this:

 'бла бла кот бла бла "кота" бла коты скоты'.replace(/my big regex/gi, '$1<$2>$3') 

The result is:

 "бла бла кот <>бла бла "кота"<> бла коты <>скоты" 

As you can see, my method does not work: the pattern contains many groups (and a different number of them for each word), so I cannot use fixed group numbers in the replacement. I would need different numbers every time.
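To make the numbering problem concrete, here is a minimal sketch that rebuilds a shortened version of the glued pattern and reports which numbered groups take part in a match. The helper `filledGroups` and the two-form list are assumptions made for illustration, not the asker's actual generator. Groups that do not participate are `undefined`, and in a replacement string they expand to an empty string, which is where the empty `<>` comes from.

```javascript
// Hypothetical reconstruction of the glued pattern, shortened to two forms.
const boundaryBefore = '(^|[^0-9a-zа-яё])';
const boundaryAfter = '([^0-9a-zа-яё]|$)';
const forms = ['кот', 'кота']; // assumed sample of word forms

// Each form contributes three capture groups; the outer parentheses add one more.
const source = '(' + forms
  .map((form) => boundaryBefore + '(' + form + ')' + boundaryAfter)
  .join('|') + ')';
const pattern = new RegExp(source, 'i');

// Return the numbers of the capture groups that actually matched something.
function filledGroups(text) {
  const match = pattern.exec(text);
  const filled = [];
  if (match === null) return filled;
  for (let i = 1; i < match.length; i++) {
    if (match[i] !== undefined) filled.push(i);
  }
  return filled;
}

console.log(filledGroups(' кот '));  // [ 1, 2, 3, 4 ]
console.log(filledGroups(' кота ')); // [ 1, 5, 6, 7 ]
```

With the outer parentheses in place, `$1` is always the whole match, while `$2` and `$3` are filled only when the first form matches.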

I also tried this replacement:

 'бла бла кот бла бла "кота" бла коты скоты'.replace(/my big regex/gi, function(match) { return "<" + match + ">"; }) 

This option almost works, producing the following result:

 "бла бла< кот >бла бла <"кота"> бла< коты >скоты" 

But as you can see, it also captures the quotation marks and spaces inside the brackets. How do I do the replacement correctly in this situation? Thank you in advance!

A few important notes:

  1. I cannot use the \b assertion, because in JavaScript's regular expression engine it works correctly only with Latin letters (my texts are usually Cyrillic).
  2. For roughly the same reason I cannot use named groups: they are not supported in JS regular expressions.
  3. The regexes are somewhat simplified for demonstration purposes; for example, they do not handle the letter "ё" or capital letters.
  4. Not all forms of the word "cat" may be listed here; for this example that is not important.
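The first note can be checked with a short sketch. `\b` is defined in terms of `\w`, which covers only `[A-Za-z0-9_]`, so Cyrillic letters count as non-word characters. The sample strings and variable names below are assumptions chosen for the demonstration.

```javascript
// \b marks a position between a \w character and a non-\w character.
// Cyrillic letters are not \w, so no boundary exists around a Cyrillic word.
const latinPattern = /\bcat\b/;
const cyrillicPattern = /\bкот\b/;

const latinFound = latinPattern.test('a cat here');          // true
const cyrillicFound = cyrillicPattern.test('тут кот был');    // false
// A boundary does appear next to a Latin letter, which gives a false positive:
const gluedToLatin = cyrillicPattern.test('xкотx');           // true

console.log(latinFound, cyrillicFound, gluedToLatin);
```

This is why the patterns in this thread spell out the word characters explicitly as `[0-9a-zа-яё]`.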
  • It may be easier to write your own parser instead of using regular expressions - Mikhail Vaysman
  • In replace, the second argument can be a replacer function (str, p1, p2, p3, offset, s), where p1, p2, and p3 give you the values of the capture groups. The function's return value replaces the matched text - Alexander Pakrulin
  • One question: if the string is, for example, fd /кота sdfdf , should the result be fd </кота> sdfdf or fd /<кота> sdfdf ? - Raz Galstyan
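The replacer-function idea from the comments can be sketched as follows. This assumes the glued pattern is built without the outer parentheses and with a shortened list of forms; `wrapForms` is a hypothetical helper name. Instead of relying on fixed group numbers, it keeps only the groups that are defined for the current match.

```javascript
const before = '(^|[^0-9a-zа-яё])';
const after = '([^0-9a-zа-яё]|$)';
const forms = ['кот', 'кота', 'коты']; // shortened, assumed list of forms

// One three-group piece per form, joined with "|", as in the question.
const glued = new RegExp(
  forms.map((form) => before + '(' + form + ')' + after).join('|'),
  'gi'
);

function wrapForms(text) {
  return text.replace(glued, (match, ...rest) => {
    // Without named groups, the last two arguments are the offset and the whole string.
    const groups = rest.slice(0, -2);
    // Only the three groups of the alternative that matched are defined.
    const [left, word, right] = groups.filter((g) => g !== undefined);
    return left + '<' + word + '>' + right;
  });
}

console.log(wrapForms('бла бла кот бла бла "кота" бла коты скоты'));
// бла бла <кот> бла бла "<кота>" бла <коты> скоты
```

The filter compares against `undefined` rather than testing truthiness, because a boundary group that matched `^` or `$` holds an empty string and must be kept.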

2 answers

Put all your cats into the second capture group:

 console.log('бла бла кот бла бла "кота" бла коты скоты'.replace(/(^|[^0-9a-zа-яё])(кот|котов|котам|котами|котах|коту|коте|котом|кота|коты)([^0-9a-zа-яё]|$)/gi, '$1<$2>$3')); 
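Since the question says the patterns are generated automatically, the same single-group pattern could be assembled from a list of forms. The sketch below is one possible way to do that; `escapeRegExp` and `buildFormsRegExp` are hypothetical helper names, and the character class is copied from the answer above.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one pattern with exactly three groups: left boundary, word form, right boundary.
function buildFormsRegExp(forms) {
  const alternation = forms.map(escapeRegExp).join('|');
  return new RegExp(
    '(^|[^0-9a-zа-яё])(' + alternation + ')([^0-9a-zа-яё]|$)',
    'gi'
  );
}

const catForms = ['кот', 'котов', 'котам', 'котами', 'котах', 'коту', 'коте', 'котом', 'кота', 'коты'];
const result = 'бла бла кот бла бла "кота" бла коты скоты'
  .replace(buildFormsRegExp(catForms), '$1<$2>$3');

console.log(result);
// бла бла <кот> бла бла "<кота>" бла <коты> скоты
```

Because the number of groups no longer depends on the number of forms, the replacement string `'$1<$2>$3'` works for any word list.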

UPD: to handle cats separated by a single character (for example, one space):

 console.log('котам бла бла кот кот бла бла "кота"кот бла коты скоты котом'.replace(/(^|[^0-9a-zа-яё])(кот|котов|котам|котами|котах|коту|коте|котом|кота|коты)(?![0-9a-zа-яё])/gi, '$1<$2>')); 
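The difference between the two versions can be seen on a short string. In the first pattern the trailing boundary character is consumed by the match, so it is no longer available as the leading boundary of the next word. The negative lookahead `(?!...)` checks the next character without consuming it. The sample text and the shortened alternation below are assumptions for the demonstration.

```javascript
const forms = 'кот|кота|коты'; // shortened alternation for the demo
const consuming = new RegExp('(^|[^0-9a-zа-яё])(' + forms + ')([^0-9a-zа-яё]|$)', 'gi');
const lookahead = new RegExp('(^|[^0-9a-zа-яё])(' + forms + ')(?![0-9a-zа-яё])', 'gi');

const text = 'кот кот кота';

// The space after the first "кот" is consumed, so the second "кот" is skipped.
const consumingResult = text.replace(consuming, '$1<$2>$3');
console.log(consumingResult); // <кот> кот <кота>

// The lookahead leaves the space in place for the next match to use.
const lookaheadResult = text.replace(lookahead, '$1<$2>');
console.log(lookaheadResult); // <кот> <кот> <кота>
```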

  • @Pupkin Where is the <котенок> ? :) I tried this string and got a wrong result: бла бла кот бла &котик* sksdfk котенок бла "кота" бла коты скоты - Raz Galstyan
  • @RazmikGalstyan, add котенок to the dictionary and you will be happy - Visman
  • But as I understand it, if I have 50 such words, do I have to put all of them in there? And what if all this needs to be configured dynamically? - Raz Galstyan
  • The asker gave the question the title "JavaScript and a long regex". - Raz Galstyan
  • @RazmikGalstyan, yes, you put in all 50. I have already left a comment on your flawed answer. - Visman

Here is a regular expression for your question. I changed your sample string a little for the example, and it does exactly what you want:

 let str = 'бла бла кот бла &котик* sksdfk котенок бла "кота" бла коты скоты';
 let res = str.replace(/([a-zа-яё]*кот[a-zа-яё]*)/gi, '<$1>');
 console.log(res); 

  • My comments on this wrong answer have been lost. An explanation of why it is incorrect is here: ru.meta.stackoverflow.com/q/5513/186083 - Visman
  • -1. The answer is completely wrong. The asker does not want to capture words such as скоты and котлы , only word forms of кот . - Vadim Ovchinnikov
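The objection in the last comment can be reproduced with a short comparison. The sample text and the shortened form list are assumptions for the demonstration. The first pattern is the one from this answer, the second follows the dictionary approach of the other answer.

```javascript
const text = 'кот скоты котлы';

// Pattern from this answer: any run of letters that contains "кот".
const substringPattern = /([a-zа-яё]*кот[a-zа-яё]*)/gi;
const substringResult = text.replace(substringPattern, '<$1>');
console.log(substringResult); // <кот> <скоты> <котлы>

// Dictionary pattern: only listed forms, delimited by non-word characters.
const dictionaryPattern = /(^|[^0-9a-zа-яё])(кот|кота|коты)(?![0-9a-zа-яё])/gi;
const dictionaryResult = text.replace(dictionaryPattern, '$1<$2>');
console.log(dictionaryResult); // <кот> скоты котлы
```

The substring pattern wraps every word containing the letters "кот", while the dictionary pattern leaves unrelated words untouched.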