Why is it that

preg_match('/^[а-яА-ЯЁёa-zA-Z0-9_]+$/', $userName) 

returns true for the letters "PP", "pp", "UU", "ee", "yaya" and false for "tm", "yy", "yu" and others?

The encoding is UTF-8, PHP 5.2.17.

I also noticed that strlen($userName) returns, for example, 4 for "nn" and 2 for "yy".

P.S. At the same time,

 preg_match('/^[_0-9A-Za-zА-Яа-пр-яЁё]+$/', $userName) 

works.

    2 answers

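    By default, PHP regular expressions treat the pattern and the subject as sequences of bytes, so multi-byte UTF-8 characters such as Russian letters are not handled as single characters. To work with them you need to use the /u modifier.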

    u (PCRE_UTF8)
    This modifier turns on additional functionality of PCRE that is incompatible with Perl: the pattern and the subject string are treated as UTF-8. The u modifier is available from PHP 4.1.0 on Unix platforms and from PHP 4.2.3 on Windows. UTF-8 validity of the pattern and the subject is checked since PHP 4.3.5. An invalid subject causes the preg_* functions to match nothing, and an invalid pattern triggers an E_WARNING. Five- and six-octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (PCRE 7.3, 2007-08-28); previously they were regarded as valid.

    Source

    Example:

     preg_match( '/^([а-яА-ЯЁёa-zA-Z0-9_]+)$/u', $userName) 
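A minimal sketch of the difference, assuming the source file itself is saved as UTF-8. The helper name isValidUserName and the sample strings are made up for illustration. Without /u the character class is built from the individual bytes of the pattern, so a letter whose second byte falls outside those byte ranges is rejected; with /u the class holds whole characters.

```php
<?php
// Assumption: this file is saved as UTF-8, like the strings being checked.

// Hypothetical helper, not part of the original question.
function isValidUserName($userName)
{
    // /u makes PCRE read the pattern and the subject as UTF-8 characters.
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^[а-яА-ЯЁёa-zA-Z0-9_]+$/u', $userName) === 1;
}

// Each Cyrillic letter is two bytes in UTF-8; bin2hex shows them.
echo bin2hex('я'), "\n"; // d18f
echo bin2hex('т'), "\n"; // d182

// Same pattern without /u: matching is done byte by byte.
var_dump(preg_match('/^[а-яА-ЯЁёa-zA-Z0-9_]+$/', 'т'));

// With /u both letters are accepted.
var_dump(isValidUserName('я')); // bool(true)
var_dump(isValidUserName('т')); // bool(true)
```

Comparing against === 1 also treats the false that preg_match returns on error (for example, an invalid UTF-8 subject) as "not valid".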
    • Thanks, it worked. What exactly does this modifier do? - klm123
    • It makes PCRE treat the pattern and the subject as UTF-8, so every Unicode character is matched as one character rather than as separate bytes. - ActivX

    As for the different strlen results: for UTF-8 strings you should use mb_strlen.

    • This is because strlen counts the bytes a string occupies. In single-byte encodings one byte is one character, but not in UTF-8: one byte holds only 256 values, which is not enough for Unicode, so UTF-8 uses from one to four bytes per character. Latin letters and digits take one byte, while Cyrillic letters take two, so Russian text occupies twice as many bytes as it has characters. mb_strlen solves this by counting characters, provided it is given the right encoding, for example mb_strlen($userName, 'UTF-8'). - ActivX
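A short illustration of the byte count versus the character count. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded; the sample strings are chosen here for illustration and are not taken from the question.

```php
<?php
// Assumptions: UTF-8 source file, mbstring extension available.
$latin    = 'yy'; // two ASCII letters, one byte each
$cyrillic = 'нн'; // two Cyrillic letters, two bytes each

echo strlen($latin), "\n";                // 2
echo strlen($cyrillic), "\n";             // 4
echo mb_strlen($cyrillic, 'UTF-8'), "\n"; // 2

// The raw bytes behind the Cyrillic string.
echo bin2hex($cyrillic), "\n";            // d0bdd0bd
```

Passing 'UTF-8' explicitly avoids depending on the value of mb_internal_encoding(), which may differ between installations.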