What non-obvious aspects of regular expressions in JavaScript should one know to correctly implement a compatible regular expression engine? I'm interested only in the syntax (including ES6) and in matching and group capture, not in replacement. Full flag support is expected, including the ES6 flags u and y.

Answers are expected to describe a feature and give tests (or descriptions of tests) that check it.
If you offer code that generates tests, ES6 syntax is welcome.

If a certain syntax gives a different result in other languages, it's worth mentioning.

PS: I'll add the basic tests later.

  • So what is the essence of the question? - Jean-Claude
  • @Jean-Claude, list little-known regular expression features. For example, the one I cited in my answer, or others like it; backreferences, for instance, haven't been checked yet. - Qwertiy
  • Then the title doesn't match that idea)) - Jean-Claude
  • "What non-obvious aspects of regular expressions in JavaScript should one know": all of them? - Nick Volynkin
  • @Qwertiy, in general I like the idea of the question, and the knowledge is valuable. I think the downvotes are because of its form. (I didn't downvote.) - Nick Volynkin

1 answer

Backreferences can refer to groups numbered above 100

 `${"1".repeat(101)}2211`.match(RegExp(`${"(.)".repeat(101)}\\101`, 'g')) == '1'.repeat(100) + "22"
 `${"1".repeat(101)}2211`.match(RegExp(`${"(.)".repeat(101)}\\101`)) == `${"1".repeat(100)}22${",1".repeat(100)},2`

Why this is notable: in replacement strings only groups $1 through $99 are supported.
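A quick check of both halves of this point, as a sketch (the variable names are only illustrative): in the pattern, \101 refers to group 101, while in a replacement string "$101" is split into $10 followed by a literal "1".

```javascript
// 101 capturing groups, each capturing a single character.
const groups101 = "(.)".repeat(101);

// A 101-character subject: group 10 captures "X", group 101 captures "Y".
const subject = "a".repeat(9) + "X" + "a".repeat(90) + "Y";

// In the pattern, \101 is a backreference because group 101 exists.
const backref = new RegExp(groups101 + "\\101");
const backrefMatchesY = backref.test(subject + "Y");
const backrefMatchesX = backref.test(subject + "X");

// In a replacement string, "$101" is read as "$10" followed by a literal "1".
const replaced = subject.replace(new RegExp(groups101), "$101");

console.log(backrefMatchesY, backrefMatchesX, replaced); // true false X1
```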

In a global search, no two matches start at the same index.

 "123".match(/^|./g) == ",2,3"
 "123".match(/(?=.)|./g) == ",,"
 "123".match(/(?=3)|./g) == "1,2,"
 "".match(/^|$/g) == ""
 "123".match(/.|$/g) == "1,2,3,"

If both an empty match and a non-empty match could start at the same position, a global search returns only one of them, because after an empty match the search continues from the next position.

Note that in C++ (std::regex_iterator) this is not the case, at least by default.
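The rule above can be modelled as the loop a global match runs. globalMatches and advanceIndex are hypothetical names; advanceIndex is meant to mirror the spec's AdvanceStringIndex operation, which steps over a whole surrogate pair when the u flag is set. It is a sketch to test an engine against, not any engine's actual code.

```javascript
// Hypothetical model of the loop String.prototype.match runs for a /g regex.
function globalMatches(re, str) {
  if (!re.global) throw new TypeError("globalMatches expects a regex with the g flag");
  re.lastIndex = 0;
  const results = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    results.push(m[0]);
    // exec leaves lastIndex at the end of the match; for an empty match that is
    // its start, so step forward or the same empty match would be found forever.
    if (m[0] === "") {
      re.lastIndex = advanceIndex(str, re.lastIndex, re.unicode);
    }
  }
  return results;
}

// Intended to mirror AdvanceStringIndex: one code unit, or a whole
// surrogate pair when the u flag is set.
function advanceIndex(str, index, unicode) {
  if (!unicode || index + 1 >= str.length) return index + 1;
  const lead = str.charCodeAt(index);
  const trail = str.charCodeAt(index + 1);
  const isPair = lead >= 0xd800 && lead <= 0xdbff && trail >= 0xdc00 && trail <= 0xdfff;
  return isPair ? index + 2 : index + 1;
}

const sample = globalMatches(/^|./g, "123"); // ["", "2", "3"]: "1" is never reported
```

With the u flag, an empty pattern applied to a single astral character therefore yields two empty matches instead of three.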

Regular Expression Syntax Extensions

Hyphens in character sets

 /[\w-_]/.exec("-")[0] === "-" 
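A sketch of the same class with and without the u flag: without u the class is just \w plus "-" and "_" (no range is formed), while with u the same source is rejected.

```javascript
// Without u, a class escape such as \w cannot be a range endpoint,
// so "-" between \w and "_" is an ordinary member of the class.
const matchesHyphen = /[\w-_]/.test("-");
const matchesBang = /[\w-_]/.test("!"); // "!" is not \w, "-" or "_"

// With u, a range with a class escape as an endpoint is a SyntaxError.
let unicodeError = null;
try {
  new RegExp("[\\w-_]", "u");
} catch (e) {
  unicodeError = e;
}

console.log(matchesHyphen, matchesBang, unicodeError instanceof SyntaxError); // true false true
```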

Invalid character escapes

 /\z/.exec("\\z")[0] === "z"
 /[\z]/.exec("[\\z]")[0] === "z"
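This leniency does not survive the u flag, where only syntax characters and "/" may be escaped outside a class (and "-" additionally inside one). The compiles helper below is hypothetical, just a try/catch around the RegExp constructor.

```javascript
// Hypothetical helper: true if `source` compiles with `flags`, false on SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const escapeSupport = {
  plainZ: compiles("\\z", ""),                // true: lenient identity escape
  unicodeZ: compiles("\\z", "u"),             // false: "z" is not a syntax character
  unicodeDot: compiles("\\.", "u"),           // true: "." is a syntax character
  unicodeSlash: compiles("\\/", "u"),         // true: "/" is explicitly allowed
  unicodeDash: compiles("\\-", "u"),          // false outside a class
  unicodeDashInClass: compiles("[\\-]", "u"), // true inside a class
};
```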

Invalid control-character escapes

 /\c2/.exec("\\c2")[0] === "\\c2" 
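For comparison, a sketch of what \c means when it is valid, next to the lenient fallback; the expected values assume a spec-conforming engine.

```javascript
// \c plus an ASCII letter is a control character: the letter's code unit modulo 32.
const upper = /\cJ/.test("\n"); // "J" is 0x4A, 0x4A % 32 === 10 (line feed)
const lower = /\cj/.test("\n"); // "j" is 0x6A, 0x6A % 32 === 10 as well

// Without u, \c followed by anything else means a literal backslash, then "c".
const digitAfterC = /\c2/.exec("\\c2")[0]; // the three characters \c2
const bareC = /\c/.exec("\\c")[0];         // the two characters \c

// With u the invalid form is a SyntaxError.
let unicodeError = null;
try {
  new RegExp("\\c2", "u");
} catch (e) {
  unicodeError = e;
}
```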

Invalid Unicode escapes

 /\u1/.exec("u1")[0] === "u1"
 /[\u1]/.exec("u")[0] === "u"
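A related trap, sketched below: without u, "\u" followed by something other than four hex digits is just "u", so \u{2} is "u" repeated twice; with u it is a code point escape, and an escaped surrogate pair becomes a single code point.

```javascript
// Without u: \u is an identity escape for "u", and {2} is a quantifier on it.
const repeatedU = /\u{2}/.exec("uu")[0];

// With u: \u{...} is a code point escape.
const braced = /\u{41}/u.exec("A")[0];

// Without u, an escaped surrogate pair in a class is two separate code units;
// with u it is the single code point U+1F600.
const pairNoU = /^[\uD83D\uDE00]$/.test("\u{1F600}");
const pairU = /^[\uD83D\uDE00]$/u.test("\u{1F600}");

console.log(repeatedU, braced, pairNoU, pairU); // uu A false true
```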

Invalid hexadecimal escapes

 /\x1/.exec("x1")[0] === "x1"
 /[\x1]/.exec("x")[0] === "x"
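A short sketch: \x consumes exactly two hex digits, and the fallback to a literal "x" applies only without u.

```javascript
// \x41 is "A"; the following "1" is an ordinary character.
const twoDigits = /\x411/.exec("A1")[0];

// With fewer than two hex digits (no u flag), "\x" is just "x".
const oneDigit = /\x4/.exec("x4")[0];

// With u, the incomplete escape is a SyntaxError.
let unicodeError = null;
try {
  new RegExp("\\x4", "u");
} catch (e) {
  unicodeError = e;
}

console.log(twoDigits, oneDigit, unicodeError instanceof SyntaxError); // A1 x4 true
```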

Incomplete patterns and quantifiers

 /x{1/.exec("x{1")[0] === "x{1"
 /x]1/.exec("x]1")[0] === "x]1"
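A sketch of where the line between a quantifier and literal braces falls (without u unless noted): "{" starts a quantifier only when a complete {n}, {n,} or {n,m} follows, and a complete quantifier with nothing before it is still an error.

```javascript
// A complete quantifier.
const quantified = /x{1,2}/.exec("xxx")[0]; // "xx"

// "{,1}" is not a valid quantifier, so without u it is matched literally.
const literal = /x{,1}/.exec("x{,1}")[0];   // "x{,1}"

// A complete quantifier with nothing to repeat is a SyntaxError even without u.
let bareError = null;
try {
  new RegExp("{1}");
} catch (e) {
  bareError = e;
}

// With u, a lone "]" is rejected as well.
let unicodeError = null;
try {
  new RegExp("x]1", "u");
} catch (e) {
  unicodeError = e;
}
```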

Octal escape sequences

 /\041/.exec("!")[0] === "!"
 /[\041]/.exec("!")[0] === "!"

Invalid backreferences become octal escapes.

 /\41/.exec("!")[0] === "!"
 /[\41]/.exec("!")[0] === "!"
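The decision an engine has to make for "\" followed by digits (outside a character class, without u) can be sketched as below. classifyDecimalEscape is a hypothetical helper: it takes the maximal run of digits after the backslash and the total number of capturing groups in the whole pattern (groups defined later in the pattern count too), and reports how many digits the escape consumes; any remaining digits are ordinary characters. With u, a decimal escape above the group count, or "\0" followed by a digit, is a SyntaxError instead.

```javascript
// Hypothetical helper modelling the lenient rules for "\" + digits outside a class, no u flag.
function classifyDecimalEscape(digits, groupCount) {
  // A backreference starts with 1-9 and uses the whole digit run.
  if (digits[0] !== "0") {
    const n = Number(digits);
    if (n <= groupCount) {
      return { kind: "backreference", group: n, consumed: digits.length };
    }
  }
  // Otherwise, a legacy octal escape of up to three octal digits (at most \377).
  const octal = /^(?:[0-3][0-7]{0,2}|[4-7][0-7]?)/.exec(digits);
  if (octal) {
    return {
      kind: "octal",
      char: String.fromCharCode(parseInt(octal[0], 8)),
      consumed: octal[0].length,
    };
  }
  // "\8" and "\9" are neither, so they fall back to identity escapes.
  return { kind: "identity", char: digits[0], consumed: 1 };
}

// The engine itself takes the octal reading when there are no groups.
const nativeOctal = /\101/.exec("A")[0]; // "A", since 0o101 === 65
```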

Source: ECMAScript 6 compatibility table (kangax).