Problem with Cyrillic strings in PHP

Question

The project is being developed in UTF-8. I ran into a problem when processing strings that contain Cyrillic characters. The code:

    <?php
    header('Content-Type: text/html; charset=utf-8');

    $str = "Дополнительное оборудование";
    echo substr($str, 0, 7);
    echo "<br>";

    $str = "Dopolnitelnoe oborudovanie";
    echo substr($str, 0, 7);
    ?>

It outputs:

 Доп  Dopolni

That is, the function does not work correctly for Cyrillic.

What needs to be done to get "Дополни" (the first 7 characters) instead of "Доп"?

Nofate ♦ 32.5k 13 55 86 · Accepted Answer · 2014-10-21T21:38:47

Read http://php.net/manual/en/ref.mbstring.php

In short, use mb_substr() for multibyte encodings:

 mb_substr($str, 0, 7, "UTF-8");
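
To make the difference concrete, here is a minimal sketch comparing byte-based and character-based functions on the string from the question. It assumes the source file is saved as UTF-8 and the mbstring extension is enabled; the variable names are illustrative only.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext-mbstring is loaded.
$str = "Дополнительное оборудование";

// substr() and strlen() count bytes. Each Cyrillic letter takes 2 bytes
// in UTF-8, so 7 bytes cover 3 letters plus half of the fourth one.
$byteLength = strlen($str);
$byteCut    = substr($str, 0, 7);

// mb_strlen() and mb_substr() count characters when given the encoding.
$charLength = mb_strlen($str, "UTF-8");
$charCut    = mb_substr($str, 0, 7, "UTF-8");

echo $byteLength, "\n"; // 53
echo $charLength, "\n"; // 27
echo $charCut, "\n";    // Дополни
```

The byte-based cut ends in the middle of a character, which is why the browser shows a broken symbol after "Доп".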

Get 3.432 5 7

I will study mbstring, but right now I need a quick fix. mb_substr() still returns "Доп". - alexkad
Pass the encoding to the function: echo mb_substr($str, 0, 7, "UTF-8"); - Get
It works, thanks! - alexkad
You probably have an old version of PHP. In newer versions the mb_* functions use UTF-8 by default. I recommend calling mb_internal_encoding("utf-8") at the beginning of the script to set the default. - artoodetoo

Vitalina 11 1 2 8 · Answer 2 · 2014-10-22T16:58:09

If the site is developed in UTF-8, as you write, then UTF-8 must also be declared. For all multibyte functions to use that encoding by default, set it at the very beginning of the script (and it is advisable to check that the call succeeded):

    const PAGE_ENCODING = 'UTF-8';

    // mb_internal_encoding() returns true when the encoding was set successfully.
    if (!mb_internal_encoding(PAGE_ENCODING)) {
        throw new RuntimeException('Encoding is not supported: ' . PAGE_ENCODING);
    }

If everything is OK, you can call all mb_* functions without passing the encoding explicitly.

I forgot to explain why this matters: when a function name is passed as a callback, there is nowhere to specify the encoding.
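
The callback case could look roughly like this. It is a sketch that assumes the mbstring extension is enabled and the file is saved as UTF-8; the sample array is made up for illustration.

```php
<?php
// Assumption: ext-mbstring is loaded and this file is saved as UTF-8.
mb_internal_encoding('UTF-8');

// Hypothetical sample data.
$words = ['эюя', 'xyz'];

// array_map() calls mb_strtoupper() with a single argument, so there is
// no place to pass the encoding; the function falls back to the internal
// encoding set above.
$upper = array_map('mb_strtoupper', $words);

print_r($upper); // ['ЭЮЯ', 'XYZ']
```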

Diefair 1,431 10 24 · Answer 3 · 2014-10-21T21:42:03

Use mb_substr() instead of substr().

The result is "Доп" followed by a broken character. How do I get rid of that character? - alexkad
Try passing the encoding as the last argument, that is, echo mb_substr($str, 0, 7, "utf-8"); - Diefair
Yes, mb_substr($str, 0, 7, "UTF-8") works. - alexkad

Daniel Shevchenko 1 · Answer 4 · 2019-02-24T10:56:33

For PHP to work with Cyrillic strings character by character (including extracting a substring, etc.), you need to use the special multibyte functions: http://php.net/manual/ru/ref.mbstring.php

This is because a Latin character occupies 1 byte in UTF-8, so:

 $string = 'XYZ'; echo $string[0]; // outputs X

But a Cyrillic character occupies 2 bytes, so:

 $string = 'ЭЮЯ'; echo $string[0]; // outputs a broken character (only the first byte of Э)

You can take this into account and work with the string like this:

 $string = 'ЭЮЯ'; echo $string[0] . $string[1]; // outputs Э

Or split the string with str_split, passing split_length = 2:

 $string = 'ЭЮЯ'; $arrStr = str_split($string, 2); // = ['Э', 'Ю', 'Я']

But it is better not to do this, because Latin letters and all other single-byte characters will then be split incorrectly:

 $string = 'ЭЮЯ. XYZAB'; $strArr = str_split($string, 2); // = ['Э', 'Ю', 'Я', '. ', 'XY', 'ZA', 'B']

By the way, to properly split a string containing Russian characters into an array of characters, it is best to do this:

    $string = 'ЭЮЯ. XYZAB';
    // The /u modifier makes the pattern treat the string as UTF-8;
    // -1 means "no limit" on the number of pieces.
    $strArr = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    // = ['Э', 'Ю', 'Я', '.', ' ', 'X', 'Y', 'Z', 'A', 'B']
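
As a possible alternative not mentioned in the answer, mb_str_split() produces the same kind of array. This sketch assumes PHP 7.4 or newer (where that function was added), an enabled mbstring extension, and a file saved as UTF-8.

```php
<?php
// Assumption: PHP >= 7.4 with ext-mbstring, file saved as UTF-8.
$string = 'ЭЮЯ. XYZAB';

// Split into chunks of 1 character each, interpreting the string as UTF-8.
$chars = mb_str_split($string, 1, 'UTF-8');

print_r($chars); // ['Э', 'Ю', 'Я', '.', ' ', 'X', 'Y', 'Z', 'A', 'B']
```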
