The file is stored in the database like this: http://pastebin.com/s8ZeyKFS

When I print the HTML to the console, all characters are displayed correctly.

Parsing with regular expressions works too. Here is a screenshot:

[screenshot]

But as soon as I load the content into DOMDocument, phpQuery, SimpleXML, or Nokogiri, the text breaks and unreadable characters are output:

[screenshot]

    $content = html_entity_decode($page['content'], ENT_QUOTES, 'UTF-8');
    echo $content; // OK

    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->loadHTML($content);
    echo $dom->getElementsByTagName('title')->item(0)->textContent; // BROKEN

I can't figure out what the problem is. Can anyone help?

  • What is the file's encoding? Post the file contents and the parsing code. - Alexey Shimansky
  • Added the code and the file contents. - s4urp8n

1 answer

[screenshot]

As it turned out, the problem was the Windows console, which cannot display some characters, even after UTF-8 support is enabled.
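
One way to tell a display problem from a real parsing problem is to compare bytes instead of looking at glyphs. The following is only a sketch under assumptions: the sample HTML string is hypothetical and stands in for `$page['content']`, and it carries a `Content-Type` meta tag so that libxml can detect the encoding. It extracts the title once with a regular expression and once through DOMDocument, then compares the two and prints an ASCII-only hex dump that any console can show.

```php
<?php
// Hypothetical sample page; in the question the markup comes from $page['content'].
$content = '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
    . '<title>Привет, мир ✓</title></head><body></body></html>';

// Reference value taken straight from the string, bypassing the DOM parser.
preg_match('~<title>(.*?)</title>~su', $content, $m);
$expected = $m[1];

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();
$title = $dom->getElementsByTagName('title')->item(0)->textContent;

// Compare bytes, not glyphs: the result does not depend on the terminal.
$sameBytes = ($title === $expected);
// An empty pattern with the /u modifier matches only if the subject is valid UTF-8.
$validUtf8 = (preg_match('//u', $title) === 1);

echo 'valid UTF-8: ', var_export($validUtf8, true), PHP_EOL;
echo 'matches source: ', var_export($sameBytes, true), PHP_EOL;
echo 'hex: ', bin2hex($title), PHP_EOL; // ASCII-only, safe on any console
```

If both checks report `true`, the parser returned the text intact and the garbled output comes from the terminal. Writing the result to a file and opening it in a UTF-8 aware editor is another way to confirm this.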