Removing a space with trim(html_entity_decode('&nbsp;'))

Question

I tried to remove the space, but it didn't work. I went to the documentation and found this:

Comment:
It may seem strange that the result of calling trim(html_entity_decode('&nbsp;')); is not an empty string. The reason is that &nbsp; is not converted to the character with ASCII code 32 (which trim() removes), but to the character with code 160 (0xA0) in the default ISO-8859-1 encoding.
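A minimal sketch of what the note describes. It assumes the charset is passed explicitly as ISO-8859-1, which is the note's premise; since PHP 5.4 the default is UTF-8. The variable names are illustrative.

```php
<?php
// Decode &nbsp; as ISO-8859-1, as the manual note assumes: the result is the single byte 0xA0.
$nbsp = html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1');
var_dump(ord($nbsp)); // int(160)

// By default trim() strips only " \t\n\r\0\x0B", so 0xA0 survives.
var_dump(trim($nbsp) === ''); // bool(false)

// Adding "\xA0" to the character list makes trim() remove it as well.
// This is only safe for single-byte strings such as ISO-8859-1.
var_dump(trim($nbsp, " \t\n\r\0\x0B\xA0") === ''); // bool(true)
```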

So I decided to check which character the function actually returns:

ord(html_entity_decode('&nbsp;'))

Hoping to get the character code 160, I got 194 ...

How? And the funny thing is that chr(194) is a broken character.

Visman · Accepted Answer · 2016-09-08T04:35:50

Hoping to get the character code 160, I got 194 ...
How? And the funny thing is that chr(194) is a broken character.

What you received is not a broken character but the first byte of the UTF-8 non-breaking space, because:

the ord() function works with single bytes: it returns the value of the first byte of the string;
the non-breaking space in UTF-8 is represented by two bytes: 194 160 (0xC2 0xA0).
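You can inspect the two bytes directly. The small sketch below passes the UTF-8 charset explicitly, so the result does not depend on the PHP version or on default_charset.

```php
<?php
// &nbsp; decoded as UTF-8 is U+00A0, encoded as the two bytes 0xC2 0xA0.
$nbsp = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

var_dump(strlen($nbsp));  // int(2): strlen() counts bytes, not characters
var_dump(bin2hex($nbsp)); // string(4) "c2a0"
var_dump(ord($nbsp));     // int(194): ord() reads only the first byte
var_dump(ord($nbsp[1]));  // int(160): the second byte
```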

    <?php
    // Returns the Unicode code point of the UTF-8 character starting at byte $offset
    // and moves $offset past it (sets it to -1 when the end of the string is reached).
    function ordutf8($string, &$offset)
    {
        $code = ord(substr($string, $offset, 1));
        if ($code >= 128) {                 // otherwise 0xxxxxxx (plain ASCII)
            if ($code < 224) {
                $bytesnumber = 2;           // 110xxxxx
            } elseif ($code < 240) {
                $bytesnumber = 3;           // 1110xxxx
            } elseif ($code < 248) {
                $bytesnumber = 4;           // 11110xxx
            }
            // Strip the length marker bits from the lead byte.
            $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
            for ($i = 2; $i <= $bytesnumber; $i++) {
                $offset++;
                $code2 = ord(substr($string, $offset, 1)) - 128; // 10xxxxxx
                $codetemp = $codetemp * 64 + $code2;             // append 6 payload bits
            }
            $code = $codetemp;
        }
        $offset += 1;
        if ($offset >= strlen($string)) {
            $offset = -1;
        }
        return $code;
    }

    $offset = 0;
    var_dump(ordutf8(html_entity_decode('&nbsp;'), $offset));

Result

    int(160)

PHP 5.4.31 - 7.0.4

The function is taken from the PHP manual.
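On PHP 7.2 and later, the mbstring extension provides mb_ord() (and mb_chr()), which does the same job as the manual function. The sketch below assumes mbstring is enabled. The helper name utf8_code_point is hypothetical. The fallback for older versions, converting to UCS-4BE and unpacking, is a common idiom and not the manual's code.

```php
<?php
// Hypothetical helper: code point of the first character of a UTF-8 string.
// Assumes the mbstring extension is available.
function utf8_code_point(string $char): int
{
    if (function_exists('mb_ord')) {
        return mb_ord($char, 'UTF-8'); // PHP 7.2+
    }
    // Older PHP: convert the first character to fixed-width 4-byte big-endian UCS-4 and unpack it.
    $ucs4 = mb_convert_encoding(mb_substr($char, 0, 1, 'UTF-8'), 'UCS-4BE', 'UTF-8');
    return unpack('N', $ucs4)[1];
}

$nbsp = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');
var_dump(utf8_code_point($nbsp)); // int(160)
```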

Doesn't PHP itself have versions of ord and chr for multibyte encodings?
Also note that trim() may not work correctly with UTF-8 if you pass a multibyte character in the second parameter.
Thanks for the good explanation. I'll look into how these functions work.
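Here is the trim() caveat in concrete form. trim() treats its second argument as a set of single bytes. Passing the two UTF-8 bytes of a non-breaking space can therefore also chop the 0xA0 byte off an unrelated character such as "à" (0xC3 0xA0). The sketch below adds a UTF-8-aware alternative using preg_replace() with the u modifier. The function name utf8_trim_nbsp is hypothetical, not a built-in.

```php
<?php
// "à" is encoded in UTF-8 as 0xC3 0xA0, so it shares its last byte with U+00A0 (0xC2 0xA0).
$word = "voil\xC3\xA0";

// trim() works byte by byte: it removes the trailing 0xA0 and leaves a broken 0xC3.
$broken = trim($word, "\xC2\xA0");
var_dump(bin2hex($broken)); // string(10) "766f696cc3"

// Hypothetical helper: strip ASCII whitespace and U+00A0 at both ends, character by character.
function utf8_trim_nbsp(string $s): string
{
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $s);
}

var_dump(utf8_trim_nbsp($word) === $word);                     // bool(true)
var_dump(utf8_trim_nbsp("\xC2\xA0 text \xC2\xA0") === 'text'); // bool(true)
```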

user207618 · Answer 2 · 2016-09-08T03:09:06

You probably have PHP 5.6 or later.
Starting with that version, the default value of the encoding parameter is taken from the default_charset configuration option.
In that version default_charset is UTF-8 by default, so ord() does indeed return 194.
If you specify the encoding explicitly, everything works:

 var_dump(ord(html_entity_decode('&nbsp;', ENT_HTML5, 'ISO-8859-1'))); // int(160)

PHP 7.1
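Passing the charset explicitly makes the result independent of the PHP version and of default_charset. In the short sketch below, the ini_get('default_charset') line is only printed for reference; its value depends on your configuration.

```php
<?php
// The charset html_entity_decode() falls back to when none is given (PHP 5.6+).
var_dump(ini_get('default_charset'));

// Same entity, two explicit charsets, two different byte sequences.
$utf8   = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
$latin1 = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML5, 'ISO-8859-1');

var_dump(bin2hex($utf8));   // string(4) "c2a0"
var_dump(bin2hex($latin1)); // string(2) "a0"
```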

Hmm, then why does chr(194) give me a broken character?
@MaximPro, because you are using UTF-8, and in UTF-8 the byte 194 on its own is only the first byte of a two-byte character, so it is displayed as broken.
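A small sketch of why chr(194) alone looks broken. It uses preg_match() with an empty pattern and the u modifier as a UTF-8 validity check: preg_match returns false for invalid UTF-8 input. The variable names are illustrative.

```php
<?php
// chr(194) is a lone UTF-8 lead byte (0xC2): it announces a 2-byte sequence that never comes.
$lead = chr(194);
var_dump(preg_match('//u', $lead)); // bool(false): not valid UTF-8, shown as a broken character

// Adding the continuation byte 0xA0 completes the sequence: U+00A0, the non-breaking space.
$nbsp = $lead . chr(160);
var_dump(preg_match('//u', $nbsp)); // int(1): valid UTF-8
var_dump($nbsp === html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8')); // bool(true)
```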
