I tried to remove the space, but it did not work, so I went to the documentation and ...

Comment:

It may seem strange that the result of calling trim(html_entity_decode('&nbsp;')); is not an empty string. The reason is that &nbsp; is not converted to a character with ASCII code 32 (which is removed by the trim() function), but to a character with ASCII code 160 (0xa0) in the default ISO-8859-1 encoding.

I thought, what if I checked which character the function would print

ord(html_entity_decode('&nbsp;'))

Hoping to get the character code 160, I got 194 ...

How? And the funny thing is that chr(194) is a broken character.

    2 answers

    Hoping to get the character code 160, I got 194 ...

    How? And the funny thing is that chr(194) is a broken character.

    What you got is not a broken character but the first byte of a character, the non-breaking space in UTF-8, because:

    1. The ord() function works with single bytes, so it returns only the first byte of the string.
    2. The non-breaking space (U+00A0) is represented in UTF-8 by two bytes: 194 160 (0xC2 0xA0).
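The two points above can be checked directly by looking at the raw bytes. This is only a minimal sketch: the variable names are illustrative, and the encoding is passed explicitly as UTF-8 so the result does not depend on the server configuration.

```php
<?php
// Decode explicitly as UTF-8 so the result does not depend on default_charset.
$nbsp = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

$length    = strlen($nbsp);                       // number of bytes, not characters
$hex       = bin2hex($nbsp);                      // raw bytes as hexadecimal
$firstByte = ord($nbsp);                          // ord() only looks at the first byte
$bytes     = array_values(unpack('C*', $nbsp));   // every byte as an integer

var_dump($length, $hex, $firstByte, $bytes);
```

On its own, byte 194 (0xC2) is only the lead byte of a two-byte sequence, which is why chr(194) shows up as a broken character in a UTF-8 page.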


     <?php
     function ordutf8($string, &$offset) {
         $code = ord(substr($string, $offset, 1));
         if ($code >= 128) {                          // otherwise 0xxxxxxx
             if ($code < 224) $bytesnumber = 2;       // 110xxxxx
             else if ($code < 240) $bytesnumber = 3;  // 1110xxxx
             else if ($code < 248) $bytesnumber = 4;  // 11110xxx
             $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
             for ($i = 2; $i <= $bytesnumber; $i++) {
                 $offset++;
                 $code2 = ord(substr($string, $offset, 1)) - 128; // 10xxxxxx
                 $codetemp = $codetemp * 64 + $code2;
             }
             $code = $codetemp;
         }
         $offset += 1;
         if ($offset >= strlen($string)) $offset = -1;
         return $code;
     }

     $offset = 0;
     var_dump(ordutf8(html_entity_decode('&nbsp;'), $offset));

    Result

     int 160 

    PHP 5.4.31 - 7.0.4

    Function taken from the manual
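As a possible alternative on newer installations: PHP 7.2 and later (newer than the versions tested above) ship mb_ord() and mb_chr() in the mbstring extension. The sketch below assumes that extension is loaded and falls back to a message if it is not.

```php
<?php
$nbsp = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

if (function_exists('mb_ord')) {
    // mb_ord() decodes the whole multibyte sequence, not just the first byte.
    $codePoint = mb_ord($nbsp, 'UTF-8');
    var_dump($codePoint);
    // mb_chr() is the inverse: code point back to a UTF-8 string.
    var_dump(mb_chr($codePoint, 'UTF-8') === $nbsp);
} else {
    echo "mb_ord() is not available; PHP 7.2+ with mbstring is assumed\n";
}
```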

    • So PHP itself has no functions (like ord, chr) for working with multibyte encodings? - MaximPro
    • No. And the trim() function may not work correctly with UTF-8 if you pass a multibyte character in the second parameter. P.S. A chr() for UTF-8: php.net/manual/ru/function.chr.php#88611 - Visman
    • Thank you for the good explanation. I'll go and study how these functions work - MaximPro
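For the original goal of stripping the space, one possible approach is a UTF-8-aware regular expression instead of trim(). The helper name trim_nbsp() is hypothetical, not a PHP built-in, and the sketch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper, not a PHP built-in: trims whitespace and U+00A0.
function trim_nbsp($text)
{
    // The /u modifier makes PCRE treat the pattern and subject as UTF-8,
    // so \x{00A0} matches the two-byte sequence C2 A0 as one character.
    $result = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);
    // preg_replace() returns null on error (for example, invalid UTF-8 input).
    return $result === null ? $text : $result;
}

$decoded = html_entity_decode('&nbsp;text&nbsp;', ENT_QUOTES, 'UTF-8');

$plainTrimChanged = trim($decoded) !== $decoded; // default trim() does not know C2 A0
$trimmed = trim_nbsp($decoded);

var_dump($plainTrimChanged, $trimmed);
```

Passing "\xC2\xA0" as the second argument of trim() also removes the non-breaking space, but trim() treats that argument as a set of single bytes, so it can cut into other characters that happen to end in the same byte (for example "à", which is C3 A0 in UTF-8).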

    You are probably running PHP 5.6 or later.
    In that case, when the encoding parameter is omitted, its value is taken from the default_charset configuration setting.
    In these versions of PHP, that setting defaults to UTF-8, so ord() does indeed return 194.
    If you pass the encoding explicitly, everything works:

     var_dump(ord(html_entity_decode('&nbsp;', ENT_HTML5, 'ISO-8859-1'))); // int(160) 

    PHP 7.1
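To see which case applies on a given server, the configured default can be printed next to both explicit encodings. This is a small sketch for comparison only; the variable names are illustrative.

```php
<?php
// The value html_entity_decode() falls back to when no encoding is passed (PHP 5.6+).
$defaultCharset = ini_get('default_charset');

$latin1 = html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1');
$utf8   = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

printf("default_charset: %s\n", $defaultCharset);
printf("ISO-8859-1: %d byte(s), hex %s\n", strlen($latin1), bin2hex($latin1));
printf("UTF-8:      %d byte(s), hex %s\n", strlen($utf8), bin2hex($utf8));
```

The same entity decodes to the single byte a0 in ISO-8859-1 and to the two bytes c2 a0 in UTF-8, which is the difference between getting 160 and 194 from ord().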

    • Hmm, then why do I get a broken character for code 194? - MaximPro
    • @MaximPro, because you are using UTF-8, which apparently returns a character for code 194. Encodings really do differ :) - user207618
    • I searched the net and found that &nbsp; is represented in UTF-8 as two bytes: c2 a0 (194 160) - MaximPro