Inexplicable regular expression behavior

Question

In principle, the regular expression checks that the name contains only Russian and Latin letters plus a few additional ones (ёЁҷҶқҚӯӮҳҲӣӢғҒ), and that the name is 3 to 25 characters long.

 elseif (!preg_match('/[a-zA-Zа-яА-ЯёЁҷҶқҚӯӮҳҲӣӢғҒ]{3,25}$/i', $name)) { $this->setFieldError("surname", "Invalid name format.<br>The name must consist only of letters and be 3-25 letters long"); return; }

In online tools such as https://regex101.com everything works fine, but on the server it behaves differently. For example, on the server the value "Shukhratҷon" produces no error (as expected), but the value "Shukhrat" does produce an error.
The second case should not produce an error either. What is happening?
What am I doing wrong?

In 1251, for example, there is a gap in the ASCII table, and it is correct to divide a-o n-th, since
In my version, ^ is not a negation but the start of the line.

Small · Accepted Answer · 2016-07-27T08:50:05

In PCRE regular expressions in PHP, Unicode support is off by default, for backward compatibility. To process Unicode strings you need the u modifier (not to be confused with U, which inverts greediness).
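To make the effect of the u modifier concrete, here is a minimal sketch. It assumes the source file and the string are UTF-8 encoded; the variable names are only illustrative. Without u, PCRE works on bytes, so a single Cyrillic letter counts as two units.

```php
<?php
// ҷ (U+04B7) takes two bytes in UTF-8.
$letter = 'ҷ';

// Without u, "." matches a single byte, so the two-byte letter
// is not seen as exactly one character.
$withoutU = preg_match('/^.$/', $letter);

// With u, "." matches a single code point.
$withU = preg_match('/^.$/u', $letter);

var_dump($withoutU); // int(0)
var_dump($withU);    // int(1)
```

The same byte-level view applies inside a character class: without u, each multibyte letter listed in [...] is read as separate bytes rather than as one letter.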

The missing start anchor ^ and the redundant i modifier (you have already listed both cases explicitly) also look odd.

 /^[a-zA-Zа-яА-ЯёЁҷҶқҚӯӮҳҲӣӢғҒ]{3,25}$/u
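A minimal sketch of how this pattern could be wrapped and checked. The function name isValidName is hypothetical and not from the original code, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper for illustration only.
function isValidName(string $name): bool
{
    // ^ and $ anchor the whole string; u makes PCRE read the
    // pattern and the subject as UTF-8 instead of raw bytes.
    $pattern = '/^[a-zA-Zа-яА-ЯёЁҷҶқҚӯӮҳҲӣӢғҒ]{3,25}$/u';
    return preg_match($pattern, $name) === 1;
}

var_dump(isValidName('Shukhrat'));    // bool(true)
var_dump(isValidName('Shukhratҷon')); // bool(true)
var_dump(isValidName('Шухратҷон'));   // bool(true)
var_dump(isValidName('ab'));          // bool(false): too short
var_dump(isValidName('Shukhrat1'));   // bool(false): digit not allowed
```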

Following up on the comments: ^ has several meanings. At the start of a character class it means negation: [^A] matches anything except the character A. Elsewhere in a regular expression it marks the start of the line, just as $ marks the end of the line. /^[abc]$/ matches only if the string consists of a single character a, b or c. /[abc]$/ matches a string of any length that ends with a, b or c.
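The two roles of ^ can be checked with a few calls. This is only a small sketch; the test strings and variable names are made up for illustration.

```php
<?php
// Anchored at both ends: the whole string must be one of a, b, c.
$single     = preg_match('/^[abc]$/', 'a');   // 1
$anchored   = preg_match('/^[abc]$/', 'xa');  // 0, "x" breaks the match

// Only the end is anchored: any string ending in a, b or c matches.
$endOnly    = preg_match('/[abc]$/', 'xa');   // 1

// ^ as the first character inside [...] negates the class.
$negatedNo  = preg_match('/[^abc]/', 'abc');  // 0, nothing outside a, b, c
$negatedYes = preg_match('/[^abc]/', 'abd');  // 1, "d" is outside the class

var_dump($single, $anchored, $endOnly, $negatedNo, $negatedYes);
```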

Yes, I had not studied regular expressions carefully. Thanks for the explanation, your example worked. - Shuhratjon Jumaev

Community spirit ♦ · Answer 2 · 2016-07-27T09:11:31

In fact, the i modifier (case-insensitive matching) still works with Unicode. There was even a recent post about this: How do we know the correspondence between uppercase and lowercase characters?

The regular expression would then be:

 /^[a-zа-яёҷқӯҳӣғ]{3,25}$/iu
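A short sketch of this variant in use, assuming a UTF-8 source file and a PHP build whose PCRE has Unicode support (the usual case). The sample names and variable names are illustrative.

```php
<?php
// Only lowercase letters are listed; i together with u lets PCRE
// match their uppercase counterparts as well, Cyrillic included.
$pattern = '/^[a-zа-яёҷқӯҳӣғ]{3,25}$/iu';

$latinUpper    = preg_match($pattern, 'SHUKHRAT');   // 1
$cyrillicUpper = preg_match($pattern, 'ШУХРАТҶОН');  // 1
$mixed         = preg_match($pattern, 'Shukhratҷon'); // 1
$tooShort      = preg_match($pattern, 'Шу');         // 0

var_dump($latinUpper, $cyrillicUpper, $mixed, $tooShort);
```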
