Good day. Using PHP, I fetch a page from a site I want to parse. In some critical fields I get complete gibberish instead of text:

&amp;#1087;&amp;#1088;&amp;#1080;&amp;#1086;&amp;#1073;&amp;#1088;&amp;

Lebedev's decoder decodes this gibberish correctly in HTML-Entities → UTF-8 mode.

html_entity_decode, which many forums recommend, turns this string into

 оставка 

So it's not that it doesn't work at all... it just doesn't work the way I'd like. What should I do?
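The result above is consistent with the field having been entity-encoded twice (`п` → `&#1087;` → `&amp;#1087;`), so a single html_entity_decode call only removes the outer layer. A minimal sketch reproducing this, assuming the raw value looks like the string below (the variable names are illustrative):

```php
<?php
// Assumption: the scraped value was HTML-entity-encoded twice.
$raw = '&amp;#1087;&amp;#1088;&amp;#1080;&amp;#1086;&amp;#1073;&amp;#1088;&amp;';

// The first call only turns "&amp;" back into "&", leaving numeric entities.
$once = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// The second call turns the numeric entities into Cyrillic characters.
$twice = html_entity_decode($once, ENT_QUOTES, 'UTF-8');

echo $once, "\n";  // &#1087;&#1088;&#1080;&#1086;&#1073;&#1088;&
echo $twice, "\n"; // приобр&
```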


2 answers

First, obviously, you need to replace every "&amp;" with "&". A plain str_replace will do. After that, html_entity_decode works fine:

 <?php echo html_entity_decode( "&#1087;&#1088;&#1080;&#1086;&#1073;&#1088;", ENT_COMPAT, "UTF-8" ); ?> 

Output:

 приобр 
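The two steps described in this answer can be combined into one short script. This is a sketch, assuming the scraped field holds the doubly encoded string shown in the code; the variable names are illustrative:

```php
<?php
// Assumed input: the doubly encoded value from the page.
$raw = '&amp;#1087;&amp;#1088;&amp;#1080;&amp;#1086;&amp;#1073;&amp;#1088;&amp;';

// Step 1: strip the outer layer by turning every "&amp;" into "&".
$step1 = str_replace('&amp;', '&', $raw);

// Step 2: decode the remaining numeric entities into UTF-8 characters.
$text = html_entity_decode($step1, ENT_COMPAT, 'UTF-8');

echo $text, "\n"; // приобр&
```

Calling html_entity_decode twice would have the same effect here, since its first pass also turns "&amp;" into "&".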

    It's a hack, but it works.

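    function Unicode2Charset($string, $charset = 'UTF-8') {
        // Decode one layer of entities: "&amp;#1087;" -> "&#1087;"
        $string = html_entity_decode($string, ENT_NOQUOTES, $charset);
        // Convert the remaining &#NNNN; / &#xHHHH; entities (BMP only, via UTF-16LE)
        return preg_replace_callback('~&#(?:x([\da-f]+)|(\d+));~i', function ($m) use ($charset) {
            $code = $m[1] !== '' ? hexdec($m[1]) : (int) $m[2];
            return iconv('UTF-16LE', $charset, pack('v', $code));
        }, $string);
    }
    $string = '&amp;#1087;&amp;#1088;&amp;#1080;&amp;#1086;&amp;#1073;&amp;#1088;&amp;';
    echo Unicode2Charset($string); // Output: приобр&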
    • Great, thanks! Is there really no standard function for this?! - Afftobus
    • The standard one is in the other answer; this custom one just allows finer tuning. - barseon