Good day! I have regular expressions that search a string for the different forms of one word. They are generated automatically for each word and are rather long, but they all follow one pattern. For example, suppose I need a regular expression that finds the forms of the word "кот" ("cat"). I take the "basic" part of the pattern:

(^|[^0-9a-zа-я])(кот)([^0-9a-zа-я]|$)

and build from it the full regular expression for all forms, gluing together one such part per form:

((^|[^0-9a-zа-я])(кот)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котов)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котам)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котами)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котах)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(коту)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(котом)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(коте)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(кота)([^0-9a-zа-я]|$)|(^|[^0-9a-zа-я])(коты)([^0-9a-zа-я]|$))

The result is a long, unwieldy pattern. Now take some text that contains forms of the word "кот":
бла бла кот бла бла "кота" бла коты скоты

I need to replace every occurrence of a form of "кот" that is delimited by non-word characters (spaces, punctuation marks, etc.) with the same form wrapped in angle brackets. That is, from the source string I need to get this:

бла бла <кот> бла бла "<кота>" бла <коты> скоты

I do it like this (here /my big regex/ stands for the long pattern above):

'бла бла кот бла бла "кота" бла коты скоты'.replace(/my big regex/gi, '$1<$2>$3')

The result is:

"бла бла кот <>бла бла "кота"<> бла коты <>скоты"

As you can see, my method does not work: the pattern contains many groups (and a different number of them for each word), so I cannot use fixed group numbers in the replacement; every time I would need different numbers.
I also tried a replacement function:

'бла бла кот бла бла "кота" бла коты скоты'.replace(/my big regex/gi, function(match) { return "<" + match + ">"; })

This almost works and produces the following result:

"бла бла< кот >бла бла <"кота"> бла< коты >скоты"

But as you can see, it captures the quotation marks and spaces inside the brackets. How can I do the replacement correctly in this situation? Thank you in advance!
A few important notes:
- I cannot use the \b anchor, since in JavaScript's regular expression engine it works correctly only with Latin letters (my texts are usually Cyrillic).
- For a similar reason I cannot use named groups: they are not supported in JS regular expressions.
- The patterns themselves are somewhat simplified for demonstration purposes; for example, they lack the letter "ё" and capital letters.
- Perhaps not all forms of the word "кот" are listed here; for this example that is not important.
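The first and third notes can be checked with a short sketch. In JavaScript, \b is defined in terms of \w, which covers only ASCII letters, digits and the underscore, so Cyrillic letters count as non-word characters. The sample strings and variable names below are made up for the illustration.

```javascript
// \b works between an ASCII word character and a non-word character.
const latinOk = /\bcat\b/.test('a cat sat');
console.log(latinOk); // true

// Cyrillic letters are not \w, so there is no boundary between ' ' and 'к'.
const cyrillicFails = /\bкот\b/.test('это кот тут');
console.log(cyrillicFails); // false

// An explicit delimiter class, as in the question, does find the word.
const explicit = new RegExp(
  '(?:^|[^0-9a-zа-я])кот(?=[^0-9a-zа-я]|$)', 'i'
).test('это кот тут');
console.log(explicit); // true

// The range а-я does not include "ё" (U+0451 lies outside U+0430..U+044F),
// so a non-simplified pattern has to list it separately.
const yoInRange = /[а-я]/.test('ё');
console.log(yoInRange); // false
```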
If the input is fd /кота sdfdf, should the result be fd </кота> sdfdf or fd /<кота> sdfdf? - Raz Galstyan