QRegExp regexpsplineedit("[a-zA-Z\\[]{13}"); 

Turning the metacharacter [ into an ordinary character works even with a single backslash, but in that case Qt Creator issues the warning: warning: C4129: '[' : unrecognized character escape sequence. Why is that, and what is the difference between one backslash and two? I have also now checked that it works with no backslash at all, i.e. "[a-zA-Z[]{13}". Why does that work?


    Let's deal with regular expressions and escape sequences separately, without mixing the two.

    As is well known, C string literals use escape sequences: a backslash followed by some character stands for one special character:

    • \n - line feed
    • \r - carriage return
    • \0 - null terminator
    • \t, \b, \v, \a ...

    Accordingly, to write just a backslash ('\') you have to write "\\": on its own the backslash acts on the character that follows it, and acting on itself it yields itself. The number of such sequences is finite (I believe I have listed nearly all of them above), and they do not cover every ASCII character.

    And indeed, \[ is an unrecognized character escape sequence: no such special character exists.
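
    The compiler-level escaping can be checked in isolation. Below is a minimal sketch; the helper name stored_length is invented for illustration and is not part of any library. It shows that each escape sequence in the source collapses to a single character in the stored string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration: how many characters the compiler
// actually stored for a literal (escape sequences are already resolved).
std::size_t stored_length(const char* literal)
{
    return std::strlen(literal);
}

// "\n"   is 1 character  (line feed)
// "\\"   is 1 character  (a single backslash)
// "\\["  is 2 characters (backslash, then '[')
// "a\tb" is 3 characters ('a', tab, 'b')
```

    Calling stored_length("\\[") therefore yields 2, which is exactly the two-character text \[ that a regular-expression engine would later receive.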

    Now regular expressions.

    Strictly mathematically speaking, a regular expression is defined over some alphabet (a set) with three operations: union (U), concatenation (x) and iteration (*).

    The definition is given by induction:

    1. Every symbol of the alphabet is a regular expression.
    2. Two regular expressions (or one, in the case of iteration) combined by one of the three operations form a regular expression.

    Now for the syntax of PCRE (Perl Compatible Regular Expressions), which are in fact not regular at all and are much more powerful, thanks to backreferences and the like.

    • . is any character of the alphabet (let the alphabet be ASCII);
    • a number in {} is a repetition count: it is just concatenation (and union) applied several times in a row, written for brevity as {n,m};
    • characters enclosed in [] mean "any one of these"; inside the brackets they are plain symbols of the alphabet and need not be escaped with backslashes ([.] is a dot);
    • whereas [^.] is "anything except '.'";
    • [a-z] is "from a to z"; it works because these characters have consecutive numeric codes in the encoding, and this shorthand is, strictly speaking, not very regular either.
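
    The constructs listed above can be tried out with a small sketch. As an assumption for the sake of a self-contained example, it uses std::regex (ECMAScript grammar) as a stand-in for PCRE or QRegExp; the helper name full_match is invented for illustration. None of these patterns contains a backslash, so the string-literal layer plays no role yet.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`.
// std::regex is used here only as a stand-in for PCRE/QRegExp.
bool full_match(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}

// full_match("x",   ".")       any single character
// full_match("aaa", "a{2,3}")  two to three repetitions of 'a'
// full_match(".",   "[.]")     inside [] the dot is a literal dot
// full_match("x",   "[^.]")    anything except a dot
// full_match("q",   "[a-z]")   a range of consecutive character codes
```

    Each call in the comments is expected to return true, while, for example, full_match("x", "[.]") and full_match("Q", "[a-z]") return false.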

    BUT! Since we are also writing these expressions inside C string literals, the compiler's escape sequences are layered on top of this syntax.

    • \. is the character '.', and \[ is the character '['; these are regex escapes, not C escape sequences, and the compiler does not know that the string is destined for a regular-expression engine that interprets everything in its own way;
    • [\\] is the character '\', and likewise [\^], [\-], "[\n]" or "\n" ... You see what I mean?

    In general, all of this is the work of the devil.

      Turning the metacharacter [ into an ordinary character works even with a single backslash, but in that case Qt Creator issues the warning: warning: C4129: '[' : unrecognized character escape sequence. Why is that, and what is the difference between one backslash and two?

      The backslash \ is a special character both inside a C string literal and in the regular expression language (PCRE). If two backslashes are used, the C compiler consumes one of them, and the PCRE engine sees only the remaining one and interprets \[ as a plain [ .

      If only one backslash is used, the C compiler tries to interpret the sequence \[ . There is no such escape sequence, so the compiler discards the backslash (I don't know whether this behavior is guaranteed by the C standard), and the PCRE engine sees just the single character [ .

      I have also now checked that it works with no backslash at all, i.e. "[a-zA-Z[]{13}". Why does that work?

      In addition, the text inside [] has rules of its own: [ is not a special character inside [] , so it does not need to be escaped. For example, "[[]" is a valid regular expression.
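
      A rough way to check that the escaped and unescaped forms behave the same is sketched below. It assumes std::regex (ECMAScript grammar) as a stand-in for QRegExp, which is not the engine from the question, and the names matches13, escaped and unescaped are invented for illustration. How an unescaped [ inside a character class is treated can differ between engines, so treat this as an illustration rather than a guarantee.

```cpp
#include <cassert>
#include <cstring>
#include <regex>
#include <string>

// Hypothetical helper; std::regex stands in for QRegExp here.
bool matches13(const std::string& text, const char* pattern)
{
    return std::regex_match(text, std::regex(pattern));
}

// Two backslashes in the source: the compiler keeps one, so the engine
// receives [a-zA-Z\[]{13} (14 characters).
const char* const escaped = "[a-zA-Z\\[]{13}";

// No backslash: the engine receives [a-zA-Z[]{13} (13 characters), where
// '[' is an ordinary member of the character class.
const char* const unescaped = "[a-zA-Z[]{13}";
```

      With these definitions, matches13("abcdefABCDEF[", escaped) and matches13("abcdefABCDEF[", unescaped) both accept the 13-character input, and both reject "abcdefABCDEF]".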


      To avoid double (or, in this case, even triple) interpretation of a string, C++11 introduced raw string literals:

       #include <cassert>
       #include <cstring>

       int main()
       {
           const char* s = R"foo(Hello\\
       World
       )")foo";
           assert(std::strcmp(s, "Hello\\\\\nWorld\n)\"") == 0);
       }

      Literals of the form R"()" are convenient for writing regular expressions and Windows file paths.
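
      As a small illustration of that convenience, the sketch below writes the same strings both ways. The Windows path is a made-up example, and the variable names are invented for illustration; the pattern is the one from the question.

```cpp
#include <cassert>
#include <string>

// Ordinary literal: every backslash has to be doubled.
// (The path itself is a hypothetical example.)
const std::string path_escaped = "C:\\Users\\example\\file.txt";
// Raw literal: the string contains exactly what is typed between R"( and )".
const std::string path_raw = R"(C:\Users\example\file.txt)";

// The same for the regular expression from the question.
const std::string regex_escaped = "[a-zA-Z\\[]{13}";
const std::string regex_raw = R"([a-zA-Z\[]{13})";
```

      Here path_escaped == path_raw and regex_escaped == regex_raw both hold, so the raw form only changes how the source is written, not what the regex engine receives.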