Decoding HTML entities into human-readable text

Question

Good day. Using PHP, I fetch a page from a site I want to parse. In some important fields I get gibberish like this instead of readable text:

&amp;#1087;&amp;#1088;&amp;#1080;&amp;#1086;&amp;#1073;&amp;#1088;&amp;

Lebedev's Decoder successfully decodes this nonsense:

HTML-Entities → UTF-8

html_entity_decode, which is recommended on many forums, turns this string into:

 &#1086;&#1089;&#1090;&#1072;&#1074;&#1082;&#1072;

So I can't say that it doesn't work, but it doesn't work the way I'd like. What should I do?


im ieee · Answer 1 · 2015-03-14T10:19:13

First, obviously, you need to replace every "&amp;" with "&". A plain str_replace is enough for that. After that, html_entity_decode works fine:

 <?php echo html_entity_decode( "&#1087;&#1088;&#1080;&#1086;&#1073;&#1088;", ENT_COMPAT, "UTF-8" ); ?>

Output:

 приобр
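The two steps described above can be combined into one small helper. This is only a minimal sketch: the function name decodeDoubleEncoded is made up for illustration, and it assumes the input is double-encoded exactly once, as in the question.

```php
<?php
// Hypothetical helper name; not part of the original answer.
function decodeDoubleEncoded($text)
{
    // Step 1: undo the extra layer of escaping ("&amp;#1087;" becomes "&#1087;").
    $text = str_replace('&amp;', '&', $text);

    // Step 2: let PHP turn the numeric entities into UTF-8 characters.
    return html_entity_decode($text, ENT_COMPAT, 'UTF-8');
}

echo decodeDoubleEncoded('&amp;#1087;&amp;#1088;&amp;#1080;&amp;#1086;&amp;#1073;&amp;#1088;');
// prints: приобр
```

Keeping the str_replace as a separate first step makes it easy to drop once the source site stops double-encoding its fields.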

Vitalina · Answer 2 · 2015-03-13T07:45:35

It's a hack, but it works.

 function Unicode2Charset($string, $charset = 'UTF-8')
 {
     $string = html_entity_decode($string, ENT_NOQUOTES, $charset);
     // The /e modifier was removed in PHP 7, so use a callback instead.
     return preg_replace_callback(
         '~&#(?:x([\da-f]+)|(\d+));~i',
         function ($m) use ($charset) {
             $code = $m[1] !== '' ? hexdec($m[1]) : (int) $m[2];
             return iconv('UTF-16LE', $charset, pack('v', $code));
         },
         $string
     );
 }

 $string = '&amp;#1087;&amp;#1088;&amp;#1080;&amp;#1086;&amp;#1073;&amp;#1088;&amp;';
 echo Unicode2Charset($string); // Output: приобр&

The standard approach is in the other answer; this is just a custom version that allows finer tuning.
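For comparison with the custom function, here is a sketch that relies only on html_entity_decode, applied repeatedly until the string stops changing. The helper name decodeNestedEntities and the $maxPasses limit are assumptions made for illustration. It also assumes the text contains no entities that should stay escaped, because every layer gets decoded.

```php
<?php
// Hypothetical helper: decode until nothing changes, with a pass limit as a safety net.
function decodeNestedEntities($text, $maxPasses = 5)
{
    for ($i = 0; $i < $maxPasses; $i++) {
        $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
        if ($decoded === $text) {
            break; // nothing left to decode
        }
        $text = $decoded;
    }
    return $text;
}

echo decodeNestedEntities('&amp;#1087;&amp;#1088;&amp;#1080;&amp;#1086;&amp;#1073;&amp;#1088;&amp;');
// prints: приобр&
```

The first pass turns "&amp;#1087;" into "&#1087;", the second turns that into "п", and the third finds nothing left to decode and stops.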
