Whether to use the u modifier depends on what the regular expression is for and on your skill in writing it.
Here is an example in which I split a UTF-8 string into two parts: the first character and everything after it:
    <?php
    $str = 'абвгд';

    // with the modifier
    preg_match('%^(.)(.+)$%u', $str, $matches);
    var_dump($matches);

    // without the modifier
    preg_match('%^(.)(.+)$%', $str, $matches);
    var_dump($matches);
Result of work:
array(3) { [0]=> string(10) "Π°Π±Π²Π³Π΄" [1]=> string(2) "Π°" [2]=> string(8) "Π±Π²Π³Π΄" } array(3) { [0]=> string(10) "Π°Π±Π²Π³Π΄" [1]=> string(1) " " [2]=> string(9) " Π±Π²Π³Π΄" }
Here the regular expression without the modifier produced a wrong result: without the u modifier, . matches one byte (not one character), and by default it matches any byte except the newline byte \x0A.
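The byte-versus-character difference can be seen by counting how many times . matches. This is a minimal sketch; the string is written with byte escapes so that it does not depend on the encoding of the source file, and the variable names are only illustrative:

```php
<?php
// 'абвгд' in UTF-8: 5 characters, 10 bytes
$str = "\xD0\xB0\xD0\xB1\xD0\xB2\xD0\xB3\xD0\xB4";

// With u, the subject is treated as UTF-8 and . matches a whole character.
$chars = preg_match_all('%.%u', $str, $m);

// Without u, . matches a single byte.
$bytes = preg_match_all('%.%', $str, $m);

var_dump($chars); // int(5)
var_dump($bytes); // int(10)
```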
And now another example: extracting the substring between two brackets:
    <?php
    $str = 'пг[абвгд]км';

    // with the modifier
    preg_match('%\[([^\]]*)\]%u', $str, $matches);
    var_dump($matches);

    // without the modifier
    preg_match('%\[([^\]]*)\]%', $str, $matches);
    var_dump($matches);
Output:

    array(2) {
      [0]=> string(12) "[абвгд]"
      [1]=> string(10) "абвгд"
    }
    array(2) {
      [0]=> string(12) "[абвгд]"
      [1]=> string(10) "абвгд"
    }
Both variants work correctly, because the characters [ and ] are encoded unambiguously in UTF-8: their byte values (0x5B and 0x5D) never occur inside a multibyte character.
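This can be checked by looking at the bytes: every byte of a multibyte UTF-8 character is 0x80 or above, while [ and ] are plain ASCII bytes. A small sketch, with an illustrative test string:

```php
<?php
$str = "\xD0\xB0\xD0\xB1"; // 'аб' in UTF-8

echo bin2hex($str), "\n";  // d0b0d0b1
echo bin2hex('[]'), "\n";  // 5b5d

// Collect any byte below 0x80 found inside the Cyrillic string.
$asciiBytes = array_filter(
    unpack('C*', $str),
    function ($byte) {
        return $byte < 0x80;
    }
);

var_dump(count($asciiBytes)); // int(0)
```

Since no byte of the Cyrillic text can be mistaken for a bracket, the byte-oriented pattern finds the same boundaries as the UTF-8 one.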
UPD
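Regarding skill in composing expressions: could the first example have produced the correct result without the modifier? Yes, if the pattern describes the UTF-8 byte sequences explicitly (here, characters of up to three bytes):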
    <?php
    // without the modifier: the UTF-8 byte sequences are spelled out in the pattern
    $pattern = '%^([\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2})(.+)$%';

    $str = 'abcde';
    preg_match($pattern, $str, $matches);
    var_dump($matches);

    $str = 'абвгд';
    preg_match($pattern, $str, $matches);
    var_dump($matches);

    $str = 'ᚠабвгд';
    preg_match($pattern, $str, $matches);
    var_dump($matches);
Output:

    array(3) {
      [0]=> string(5) "abcde"
      [1]=> string(1) "a"
      [2]=> string(4) "bcde"
    }
    array(3) {
      [0]=> string(10) "абвгд"
      [1]=> string(2) "а"
      [2]=> string(8) "бвгд"
    }
    array(3) {
      [0]=> string(13) "ᚠабвгд"
      [1]=> string(3) "ᚠ"
      [2]=> string(10) "абвгд"
    }
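The pattern used here stops at three-byte characters. One possible extension, offered as a sketch and not as a full UTF-8 validator, adds a branch for four-byte sequences (lead bytes \xF0-\xF4). The function name splitFirstChar is hypothetical and not part of the original answer:

```php
<?php
// Hypothetical helper: split a UTF-8 string into its first character and the
// rest without the u modifier, by spelling out the byte sequences.
// Returns null when the string has fewer than two characters or does not match.
function splitFirstChar(string $str): ?array
{
    $pattern = '%^('
        . '[\x00-\x7F]'                 // 1 byte (ASCII)
        . '|[\xC0-\xDF][\x80-\xBF]'     // 2 bytes
        . '|[\xE0-\xEF][\x80-\xBF]{2}'  // 3 bytes
        . '|[\xF0-\xF4][\x80-\xBF]{3}'  // 4 bytes (assumed extension)
        . ')(.+)$%';

    return preg_match($pattern, $str, $m) ? [$m[1], $m[2]] : null;
}

// U+1F600 (4 bytes in UTF-8) followed by "abc"
var_dump(splitFirstChar("\xF0\x9F\x98\x80abc"));
```

In practice the u modifier is shorter and also rejects malformed input, so a hand-written pattern like this is mainly useful when the modifier cannot be used.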