In principle, the regex checks that the name contains only Russian and Latin letters plus a few additional ones (ёЁҷҶқҚӯӮҳҲӣӢғҒ), and that the name is 3 to 25 characters long.

 elseif (!preg_match('/[a-zA-Zа-яА-ЯёЁҷҶқҚӯӮҳҲӣӢғҒ]{3,25}$/i', $name)) { $this->setFieldError("surname", "Invalid name format.<br>The name must consist only of letters and contain 3-25 letters"); return; }

In online tools like https://regex101.com everything works fine, but on the server it behaves differently. For example, on the server the value "Shukhratҷon" produces no error (as it should), but the value "Shukhrat" does produce one.
The second case should not produce an error either. What is happening?
What am I doing wrong?

  • Check the encoding; the behavior differs from one encoding to another. In 1251, for example, there is a gap in the character table, so the correct approach is to split the range in two (а-о and п-я), since other characters sit in the middle of the encoding. In UTF-8 the example works correctly. - nick_n_a
  • @nick_n_a The encoding is UTF-8 without BOM. - Shuhratjon Jumaev
  • /^[a-zA-Zа-яА-ЯёЁҷҶқҚӯӮҳҲӣӢғҒ]{3,25}$/u - Visman
  • @Visman As far as I know, ^ means negation. But I tried it and it didn't work. - Shuhratjon Jumaev
  • In my version, ^ is not a negation but the start of the string. Negation looks like this: [^a-zA-Z] - Visman

2 answers

In PHP's PCRE regular expressions, Unicode support is off by default, for backward compatibility. To process Unicode strings you need the u modifier (not to be confused with U, which inverts greediness).
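To illustrate what the u modifier changes, here is a minimal sketch. It assumes the script file itself is saved as UTF-8; the variable names are made up for the example. Without u, PCRE sees the subject as a sequence of bytes, so a single Tajik letter counts as two "characters".

```php
<?php
$letter = 'ҷ'; // U+04B7, stored as two bytes in UTF-8

// Without u, "." consumes exactly one byte, so a two-byte letter cannot match /^.$/.
$byteMode = preg_match('/^.$/', $letter);

// With u, "." consumes one whole code point.
$unicodeMode = preg_match('/^.$/u', $letter);

var_dump(strlen($letter)); // int(2)
var_dump($byteMode);       // int(0)
var_dump($unicodeMode);    // int(1)
```

The same byte-level view applies inside character classes and to the {3,25} counter, which is why a pattern without u can behave unexpectedly on non-ASCII input.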

The missing start anchor ^ and the redundant i modifier, given that you have already listed both cases, also look odd.

 /^[a-zA-Zа-яА-ЯёЁҷҶқҚӯӮҳҲӣӢғҒ]{3,25}$/u 
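A short sketch of how this pattern could be wrapped and tried out. The function name isValidName is hypothetical and not from the question's code; the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: true if $name consists of 3 to 25 allowed letters.
function isValidName(string $name): bool
{
    // ^ and $ anchor the whole string; u makes PCRE read pattern and subject as UTF-8.
    return preg_match('/^[a-zA-Zа-яА-ЯёЁҷҶқҚӯӮҳҲӣӢғҒ]{3,25}$/u', $name) === 1;
}

var_dump(isValidName('Shukhrat'));    // bool(true):  Latin letters only
var_dump(isValidName('Shukhratҷon')); // bool(true):  Latin plus a Tajik letter
var_dump(isValidName('Шухратҷон'));   // bool(true):  Cyrillic plus a Tajik letter
var_dump(isValidName('Sh'));          // bool(false): shorter than 3 letters
var_dump(isValidName('Shukhrat1'));   // bool(false): digits are not allowed
```

Comparing the result with === 1 keeps a match distinct from both "no match" (0) and a PCRE error (false), for example when the subject is not valid UTF-8.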

Following up on the comments: ^ has several meanings. At the start of a character class it means negation: [^A] matches anything except the character A. Elsewhere in a regular expression it marks the start of the string, just as $ marks the end. /^[abc]$/ matches only if the string consists of a single character a, b, or c. /[abc]$/ matches a string of any length that ends with a, b, or c.
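The difference between the two roles of ^ can be checked directly. This is only an illustrative sketch; the $checks array and its labels are invented for the example.

```php
<?php
$checks = [
    // Anchored on both sides: the whole string must be one of a, b, c.
    'anchored, "a"'        => preg_match('/^[abc]$/', 'a'),
    'anchored, "xa"'       => preg_match('/^[abc]$/', 'xa'),
    // No ^: only the end of the string is examined.
    'end only, "xyza"'     => preg_match('/[abc]$/', 'xyza'),
    // ^ as the first character inside [...] negates the class.
    'negated class, "x"'   => preg_match('/^[^abc]$/', 'x'),
    'negated class, "a"'   => preg_match('/^[^abc]$/', 'a'),
    // Same effect as in the question: without ^, a bad prefix slips through.
    'no start anchor, "123abc"' => preg_match('/[a-zA-Z]{3,25}$/', '123abc'),
];

foreach ($checks as $label => $result) {
    printf("%-28s %d\n", $label, $result);
}
```

The last entry shows why the original pattern accepted invalid names: it only required that the string end with 3 to 25 letters.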

  • Yes, I had not studied regular expressions carefully. Thanks for the explanation, your example worked. - Shuhratjon Jumaev

In fact, the i modifier (case-insensitive matching) still works with Unicode. There was even a recent post about this: How do we know the correspondence between uppercase and lowercase characters?

The regex would then be:

 /^[a-zа-яёҷқӯҳӣғ]{3,25}$/iu
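A quick way to try the shorter pattern, as a sketch. The function name is hypothetical, the file is assumed to be saved as UTF-8, and case-insensitive matching of non-ASCII letters relies on PCRE being built with Unicode support, which is the case for standard PHP builds.

```php
<?php
// Hypothetical helper using only lowercase letters in the class plus the i and u modifiers.
function isValidNameCaseInsensitive(string $name): bool
{
    return preg_match('/^[a-zа-яёҷқӯҳӣғ]{3,25}$/iu', $name) === 1;
}

var_dump(isValidNameCaseInsensitive('ШУХРАТҶОН'));   // all uppercase Cyrillic and Tajik
var_dump(isValidNameCaseInsensitive('Shukhratҷon')); // mixed case, Latin plus Tajik
var_dump(isValidNameCaseInsensitive('Ёқуб'));        // uppercase Ё, lowercase Tajik қ
var_dump(isValidNameCaseInsensitive('Shukhrat_'));   // underscore is not a letter
```

With i and u together, each lowercase letter in the class also matches its uppercase counterpart, so the uppercase letters no longer need to be listed.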