Sample input:
<tag attr='val'>123 123</tag>
Objective: to match everything between the angle brackets (what exactly doesn't really matter; the point is to see the difference in behavior).
Using the non-greedy (lazy) quantifier: <tag.*?>. In this case the regular expression engine will try to "get away" with matching as little as possible. How the search proceeds:

- <tag is found.
- Move on: match under .*. The next character is a space; it fits under .*. Fine, that's enough for .*, so move on and try to match >.
- The next character is a. Damn, we have to go back: match under .* again, this time letting it take the a as well. The a fits under .*. Enough, try > once more.
- The next character is t. Damn... and so on, with constant backtracking.

Result: <tag attr='val'>
Using the greedy quantifier: <tag.*>. The engine will try to grab as many characters as possible for each quantifier and roll back only if no match can be found otherwise:

- <tag is found.
- Move on: match under .*. The next character is a space; it fits under .*. Great, but we keep going and try to grab the rest of the characters with .* as well: attr='val'>123 123</tag>. That's it, nothing more to take, so move on and try to match >.
- There is no text left. We have to roll back: step back one character and try > again. The next character is >, and it fits under >. The text is over, the pattern is over.

Result: <tag attr='val'>123 123</tag>
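The same check for the greedy version (again, a sketch with an illustrative class name):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    public static void main(String[] args) {
        String input = "<tag attr='val'>123 123</tag>";

        // Greedy: .* grabs everything, then backs off just enough for > to match
        Matcher m = Pattern.compile("<tag.*>").matcher(input);
        if (m.find()) {
            System.out.println(m.group()); // prints: <tag attr='val'>123 123</tag>
        }
    }
}
```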
Using the possessive (super-greedy) quantifier: <tag.*+>. No rollbacks will be made at all:

- <tag is found.
- Move on: match under .*. The next character is a space; it fits under .*. Great, we keep going and grab the rest of the characters with .* as well: attr='val'>123 123</tag>. That's it, nothing more to take, so move on and try to match >.
- There is no text left. Doesn't matter, there is no turning back.

Result: no match.
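And the possessive version, which indeed finds nothing (same kind of illustrative sketch):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveDemo {
    public static void main(String[] args) {
        String input = "<tag attr='val'>123 123</tag>";

        // Possessive: .*+ swallows everything up to the end of the string and
        // refuses to give anything back, so the trailing > can never match
        Matcher m = Pattern.compile("<tag.*+>").matcher(input);
        System.out.println(m.find()); // prints: false
    }
}
```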
As for the vertical tab: it is just a control character, the same kind of thing as \n or \t. It has no special relation to regular expressions. It used to be used on printers (I don't know the exact details, but old-timers may remember it). So \v simply looks for the presence of that character in the string.
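If you want to see it in action, here is a small Java sketch. One caveat: in Java 8+ the \v escape actually means "any vertical whitespace", which includes the vertical tab \u000B, while \x0B targets exactly that one character:

```java
import java.util.regex.Pattern;

public class VerticalTabDemo {
    public static void main(String[] args) {
        String s = "line1\u000Bline2"; // \u000B is the vertical tab control character

        // \x0B matches exactly the vertical tab character
        System.out.println(Pattern.compile("\\x0B").matcher(s).find()); // true
        // In Java 8+, \v matches any vertical whitespace, which includes \u000B
        System.out.println(Pattern.compile("\\v").matcher(s).find());   // true
    }
}
```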
UPD. (In reply to the comments by @knes and @Valeriy Karchov)
As far as I understand, it is about performance. Both greedy and lazy quantifiers remember positions they can backtrack to. If the pattern is complex, with nesting, there can be a lot of such backtracking. If a match is found (or can be found), everything is fine, but if there is no match, a long enumeration of possible variants may begin, and it is exactly in this case that possessive quantifiers let the engine determine quickly that there is no match.
Here is an artificial but illustrative example: apply the pattern (x+x+)+y to a string of the form xxxxxxxxxxy. If the y is at the end, everything is fine: only one rollback occurs (when a match is found for the second x+) and the job is done. But if the y is not at the end, the engine will try every possible combination. On my machine this search (in Java) on a string of 19 x characters took 2 seconds. On the other hand, it is obvious that if some stretch of text has already been consumed by (x+x+)+, then y is definitely not inside it. This means we can use a possessive quantifier, (x+x+)++y, since we know for sure that a rollback will not lead to finding the y.
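For anyone who wants to reproduce this, here is a rough sketch (class and method names are just for illustration, String.repeat needs Java 11+, and the exact numbers will of course differ from the ~2 seconds mentioned above):

```java
import java.util.regex.Pattern;

public class BacktrackingDemo {
    public static void main(String[] args) {
        String input = "x".repeat(19); // 19 x's and no y at the end: the worst case

        time("(x+x+)+y", input);  // greedy: explores every split of the x's, very slow
        time("(x+x+)++y", input); // possessive: fails almost instantly
    }

    static void time(String regex, String input) {
        long start = System.nanoTime();
        boolean matches = Pattern.compile(regex).matcher(input).matches();
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%-11s matches=%b, %d ms%n", regex, matches, ms);
    }
}
```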
So, possessive quantifiers can be used in cases where the expression under the quantifier cannot swallow the characters that the expression following the quantifier is supposed to match. On non-matching input this makes it much faster to determine that there is no match. Some regular expression engines even detect situations like [^x]+x on their own and substitute a possessive quantifier there.
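A minimal sketch of that last point: since [^x] can never consume an x, giving characters back to the trailing x can never help, so the possessive form is safe and gives the same answers:

```java
import java.util.regex.Pattern;

public class SafePossessiveDemo {
    public static void main(String[] args) {
        // [^x]+x and [^x]++x always agree, because backtracking can never help here
        for (String s : new String[] { "aaax", "aaaa" }) {
            boolean greedy     = Pattern.compile("[^x]+x").matcher(s).matches();
            boolean possessive = Pattern.compile("[^x]++x").matcher(s).matches();
            System.out.println(s + ": " + greedy + " / " + possessive); // same results
        }
    }
}
```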