Let's discuss regular expressions and escape sequences, and keep the two apart: separating the flies from the cutlets, so to speak.
As is well known, C string constants use escape sequences: a backslash followed by some character stands for one special character:
- \n - line feed
- \r - carriage return
- \0 - null terminator
- \t, \b, \v, \a ... - tab, backspace, vertical tab, alert and so on
Accordingly, to write a lone backslash ('\'), you have to write "\\": on its own the backslash modifies the character that follows it, and applied to itself it yields itself. The number of these sequences is finite (I believe I have listed most of them; \f, \" and \' are among the rest), and they do not cover all ASCII characters.
Indeed, \[ gets you "unrecognized character escape sequence": there is no such special character.
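To make the point about escape sequences concrete, here is a minimal sketch in C. The function name check_escape_sequences is made up for this illustration, and the numeric code of the line feed assumes an ASCII-based character set.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: asserts a few facts about how the compiler
   decodes escape sequences inside string constants. */
void check_escape_sequences(void) {
    const char line_feed[] = "\n";  /* two source characters, one stored */
    const char backslash[] = "\\";  /* the backslash applied to itself */
    const char cut[] = "ab\0cd";    /* \0 ends the string for strlen */

    assert(strlen(line_feed) == 1);
    assert(line_feed[0] == 10);     /* assumes an ASCII-based charset */
    assert(strlen(backslash) == 1);
    assert(backslash[0] == '\\');
    assert(sizeof cut == 6);        /* a, b, \0, c, d plus the final \0 */
    assert(strlen(cut) == 2);       /* strlen stops at the first \0 */
}
```

Calling check_escape_sequences() from main should pass silently; a failed assertion would mean the compiler decoded an escape differently from what is described above.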
Now regular expressions.
Strictly mathematically speaking, regular expressions are built over some alphabet (a set of symbols) using three operations: union (U), concatenation (x) and iteration (*).
The definition is given by induction:
- Any symbol of the alphabet is a regular expression.
- Two regular expressions (or one, in the case of iteration) combined by one of the three operations form a regular expression.
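The inductive definition maps directly onto a small data structure. The following is only a sketch under my own assumptions: the names Re, ReKind, advance and re_accepts are invented for the illustration, and the input is limited to 63 characters so that a set of positions fits in one 64-bit mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { RE_SYMBOL, RE_UNION, RE_CONCAT, RE_STAR } ReKind;

typedef struct Re {
    ReKind kind;
    char symbol;             /* used by RE_SYMBOL */
    const struct Re *left;   /* used by RE_UNION, RE_CONCAT, RE_STAR */
    const struct Re *right;  /* used by RE_UNION, RE_CONCAT */
} Re;

/* Bit i of a mask means "position i of the input is reachable".
   Given the start positions in `from`, returns the positions where
   a match of `re` can end. */
static uint64_t advance(const Re *re, const char *s, size_t n, uint64_t from) {
    uint64_t out = 0;
    switch (re->kind) {
    case RE_SYMBOL:
        for (size_t i = 0; i < n; i++)
            if (((from >> i) & 1u) && s[i] == re->symbol)
                out |= (uint64_t)1 << (i + 1);
        break;
    case RE_UNION:
        out = advance(re->left, s, n, from) | advance(re->right, s, n, from);
        break;
    case RE_CONCAT:
        out = advance(re->right, s, n, advance(re->left, s, n, from));
        break;
    case RE_STAR:
        out = from;  /* zero repetitions */
        for (uint64_t frontier = from; frontier != 0; ) {
            /* one more repetition, keeping only newly reached positions */
            frontier = advance(re->left, s, n, frontier) & ~out;
            out |= frontier;
        }
        break;
    }
    return out;
}

/* Returns 1 if the whole string s belongs to the language of re. */
int re_accepts(const Re *re, const char *s) {
    assert(re != NULL && s != NULL);
    size_t n = strlen(s);
    assert(n < 64);  /* limit of this sketch: positions 0..n fit in 64 bits */
    return (int)((advance(re, s, n, 1) >> n) & 1u);
}
```

For example, (a U b)* x c can be assembled from three RE_SYMBOL nodes, one RE_UNION, one RE_STAR and one RE_CONCAT; re_accepts should then return 1 for "abac" and 0 for "ab".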
Now let's discuss the syntax of PCRE (Perl Compatible Regular Expressions), which are in fact not regular at all but considerably more powerful, thanks to backreferences and the like.
- . - any character of the alphabet (let the alphabet be ASCII);
- digits in {} - the number of repetitions; this is just concatenation and union applied several times in a row, written as {n,m} for brevity;
- characters in [] - "any one of these"; inside the brackets they are just symbols of the alphabet and need not be escaped with backslashes ([.] is a literal dot);
- but [^.] is "anything except '.'";
- [a-z] - from a to z; it works because encodings assign these characters consecutive numeric codes, so this shorthand also steps a little outside the pure formalism.
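The constructs listed above can be tried out from C. PCRE is not part of the standard C library, so this sketch swaps in the POSIX regcomp/regexec interface with REG_EXTENDED, which as far as I know treats these particular constructs the same way. The helper name re_matches is made up, and the default "C" locale is assumed for the [a-z] range.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `pattern` (POSIX extended syntax)
   matches somewhere in `text`, 0 if it does not, and -1 if the
   pattern fails to compile. */
int re_matches(const char *pattern, const char *text) {
    assert(pattern != NULL && text != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, re_matches("^a.c$", "abc") and re_matches("^[0-9]{2,3}$", "12") should both give 1, while re_matches("^a[.]c$", "abc") and re_matches("^[^.]+$", "a.c") should give 0. The ^ and $ anchors are added only to force a whole-string match.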
BUT! Because we write these expressions inside C-style string constants, the escape sequences described earlier are layered on top of this syntax.
- \. is the character '.', and \[ is the character '[', but these are not string escape sequences: the compiler does not know that the string is headed for a regular expression engine, which interprets everything in its own way;
- [\\] is the character '\'; then there are [\^], [\-], "[\n]" or "\n" ... Well, you see what I mean?
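The two layers can be observed separately in C. This is a sketch under assumptions: it uses the POSIX regcomp/regexec interface instead of PCRE, the names re_matches and check_two_layers are invented, and it sticks to cases where, to my knowledge, POSIX and PCRE agree. Forms such as [\^] are left out because POSIX bracket expressions treat the backslash as an ordinary character.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 on match, 0 on no match, -1 on a bad pattern. */
int re_matches(const char *pattern, const char *text) {
    assert(pattern != NULL && text != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Hypothetical helper: asserts what each layer does to a backslash. */
void check_two_layers(void) {
    /* Layer 1, the compiler: "\\." in the source is stored as \. */
    assert(strlen("\\.") == 2);

    /* Layer 2, the regex engine: \. means a literal dot. */
    assert(re_matches("^\\.$", ".") == 1);
    assert(re_matches("^\\.$", "x") == 0);

    /* Same story for the opening bracket. */
    assert(re_matches("^\\[$", "[") == 1);

    /* Four source backslashes give the engine [\\],
       which matches a single backslash. */
    assert(re_matches("^[\\\\]$", "\\") == 1);

    /* Here the compiler already produced a real line feed,
       and the engine sees it as an ordinary character. */
    assert(re_matches("^a\nb$", "a\nb") == 1);
}
```

Calling check_two_layers() should pass silently. Writing "\." or "\[" directly in the source would instead run into the unrecognized escape sequence problem at the compiler layer, before the regex engine sees anything.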
In short, all of this comes from ... from the evil one.