The project is developed in UTF-8. I ran into a problem processing strings that contain Cyrillic characters. The code:

    <?php
    header('Content-Type: text/html; charset=utf-8');

    $str = "Дополнительное оборудование";
    echo substr($str, 0, 7);
    echo "<br>";

    $str = "Dopolnitelnoe oborudovanie";
    echo substr($str, 0, 7);
    ?>

It produces:

    Доп�
    Dopolni

That is, substr() does not work correctly for Cyrillic.

What needs to be done to get "Дополни" instead of "Доп"?

    4 answers

    Read http://php.net/manual/en/ref.mbstring.php

    In short, mb_substr() is used for multibyte encodings:

     mb_substr($str, 0, 7, "UTF-8"); 
    • I will study mbstring, but right now I need a quick solution. mb_substr() returns "Доп�". - alexkad
    • Pass the encoding you use to the function: echo mb_substr($str, 0, 7, "UTF-8"); - Get
    • It works, thanks! - alexkad
    • You probably have an old version of PHP. In newer versions the mb_* functions use UTF-8 by default. I recommend calling mb_internal_encoding("utf-8") at the beginning of the script to set the default. - artoodetoo
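    To illustrate the comments above, here is a minimal sketch that compares the byte-based substr() with mb_substr() after the internal encoding has been set once. It assumes the source file is saved as UTF-8 and the mbstring extension is loaded; the variable names $bytes and $chars are illustrative only.

```php
<?php
// Set the default encoding once so mb_* calls can omit the encoding argument.
mb_internal_encoding('UTF-8');

$str = 'Дополнительное оборудование';

$bytes = substr($str, 0, 7);     // cuts after 7 bytes, splitting the 4th character
$chars = mb_substr($str, 0, 7);  // cuts after 7 characters

echo $chars, PHP_EOL;                          // Дополни
echo strlen($bytes), PHP_EOL;                  // 7
var_dump(mb_check_encoding($bytes, 'UTF-8'));  // bool(false): the cut left half a character
```

    The last line shows why substr() prints a broken character: the 7-byte slice is no longer valid UTF-8.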

    If the site is developed in UTF-8, as you write, then UTF-8 should be used consistently throughout. For all multibyte functions to work with that encoding by default, it must be set at the very beginning of the script (and it is advisable to check whether setting it succeeded):

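     const PAGE_ENCODING = 'UTF-8';
     // With an argument, mb_internal_encoding() returns true on success
     // (false on failure before PHP 8; PHP 8 throws a ValueError instead).
     if (!mb_internal_encoding(PAGE_ENCODING)) {
         throw new RuntimeException('Encoding is not supported: ' . PAGE_ENCODING);
     }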

    If everything is OK, you can call all mb_* functions without specifying the encoding.

    I forgot to explain why this matters: when you pass a function name as a callback, there is nowhere to specify the encoding.
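    A short sketch of the callback case, assuming the file is saved as UTF-8; the $words array is a made-up example. array_map() passes only the array element to the callback, so the encoding has to come from mb_internal_encoding():

```php
<?php
mb_internal_encoding('UTF-8');

$words = ['дополнительное', 'оборудование'];

// A callback given by name receives only the array element,
// so there is no place to pass the encoding argument.
$upper   = array_map('mb_strtoupper', $words);
$lengths = array_map('mb_strlen', $words);

print_r($upper);    // ДОПОЛНИТЕЛЬНОЕ, ОБОРУДОВАНИЕ
print_r($lengths);  // 14, 12
```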

      Use mb_substr() instead of substr().

      • The result is "Доп�". How do I remove the stray character? - alexkad
      • Try adding the encoding as the last argument, that is, echo mb_substr($str, 0, 7, "utf-8"); - Diefair
      • Yes, mb_substr($str, 0, 7, "UTF-8"); works. - alexkad

      For PHP to work with Cyrillic strings character by character (including extracting a substring and so on), you need to use the multibyte string functions: http://php.net/manual/ru/ref.mbstring.php

      This is because a Latin character takes 1 byte in UTF-8, so:

       $string = 'XYZ';
       echo $string[0]; // outputs X

      But a Cyrillic character takes 2 bytes, so:

       $string = 'ЭЮЯ';
       echo $string[0]; // outputs a broken character: only the first byte of Э
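      The byte counts can be checked directly. A small sketch, assuming the file is saved as UTF-8 (the variable names are illustrative):

```php
<?php
$latin    = 'XYZ';
$cyrillic = 'ЭЮЯ';

echo strlen($latin), PHP_EOL;                 // 3 (bytes)
echo strlen($cyrillic), PHP_EOL;              // 6 (bytes)
echo mb_strlen($cyrillic, 'UTF-8'), PHP_EOL;  // 3 (characters)
echo bin2hex($cyrillic[0]), PHP_EOL;          // d0, the first of the two bytes of Э
```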

      You could take this into account and work like this:

       $string = 'ЭЮЯ';
       echo $string[0] . $string[1]; // outputs Э

      Or split the string with str_split, passing a chunk length of 2:

       $string = 'ЭЮЯ';
       $arrStr = str_split($string, 2); // = ['Э', 'Ю', 'Я']

      But it is better not to do this, because Latin letters and other single-byte characters will then be split incorrectly:

       $string = 'ЭЮЯ. XYZAB';
       $strArr = str_split($string, 2); // = ['Э', 'Ю', 'Я', '. ', 'XY', 'ZA', 'B']

      By the way, to split a string containing Russian characters into an array of characters correctly, it is best to do this:

       $string = 'ЭЮЯ. XYZAB';
       $strArr = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY); // = ['Э', 'Ю', 'Я', '.', ' ', 'X', 'Y', 'Z', 'A', 'B']
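      As a possible alternative not mentioned above, PHP 7.4 and later also provide mb_str_split(). The sketch below, assuming a UTF-8 source file and the mbstring extension, checks that it gives the same result as the preg_split approach; the variable names are illustrative.

```php
<?php
$string = 'ЭЮЯ. XYZAB';

// Split on the empty pattern in UTF-8 mode (the /u modifier).
$viaPreg = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

// mb_str_split() requires PHP 7.4 or later.
$viaMb = mb_str_split($string, 1, 'UTF-8');

var_dump($viaPreg === $viaMb);  // bool(true)
echo count($viaPreg), PHP_EOL;  // 10
```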