My site itself is in UTF-8. The default encoding is declared both in .htaccess and in the page header. The site loads a page from the internet and reads its encoding from the response headers, for example windows-1251. Then, using nokogiri.php, it parses the title and h1 out of the page and displays them on a page of my site. Here is the odd part: if the source site is in windows-1251, it displays fine, but if I parse a site that is in UTF-8, I get mojibake. Before this I used simple_html_dom.php and the encoding seemed fine, but that library is terribly slow and eats memory.
I can't understand why the encoding behaves this way.
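For context, here is a minimal sketch of the pipeline I'm describing: fetch a page with cURL, read the charset from the Content-Type response header, convert to UTF-8, then parse. The URL is a placeholder and the variable names are mine:

<?php
// Sketch of the pipeline; the URL is a placeholder.
$ch = curl_init('http://example.com/page.html');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
// e.g. "text/html; charset=windows-1251"
$contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
curl_close($ch);

// Take the charset from the response header, defaulting to UTF-8.
$charset = 'UTF-8';
if (preg_match('/charset=([\w-]+)/i', (string)$contentType, $m)) {
    $charset = strtoupper($m[1]);
}
if ($charset !== 'UTF-8') {
    $html = mb_convert_encoding($html, 'UTF-8', $charset);
}

$dom = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true); // real-world HTML is rarely valid
$dom->loadHTML($html);
$titleNode = $dom->getElementsByTagName('title')->item(0);
$title = $titleNode ? $titleNode->textContent : '';

With exactly this flow, pages that were originally windows-1251 come out fine, and pages that were already UTF-8 come out garbled, which is the problem below.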
So, I figured out that nokogiri.php uses
$dom = new DOMDocument('1.0', 'UTF-8');
and I tried replacing 'UTF-8' with 'windows-1251' by hand. The most interesting thing: if the source site is in windows-1251, everything is OK, but if it is in UTF-8, everything breaks, like this:
(a line of unreadable mojibake characters)
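As far as I can tell, the constructor argument in new DOMDocument('1.0', 'UTF-8') only sets the encoding used when the document is serialized back out; it does not tell loadHTML() how to decode its input. When the input carries no charset hint (no meta tag), libxml2 falls back to ISO-8859-1, so every multibyte UTF-8 character is read as two Latin-1 characters, which is exactly the mojibake above. A minimal repro, with a made-up HTML string:

<?php
// UTF-8 markup with no <meta charset> and no XML declaration.
$utf8Html = '<html><head><title>Тест</title></head><body></body></html>';

$dom = new DOMDocument('1.0', 'UTF-8'); // constructor arg changes nothing here
$dom->loadHTML($utf8Html);              // libxml2 assumes ISO-8859-1

// Prints mojibake: each UTF-8 byte was decoded as a separate character.
echo $dom->getElementsByTagName('title')->item(0)->textContent;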
Let me clarify the problem once more. My site loads other people's pages from the internet, pulls out the title and headings, takes the encoding from the server's response, and converts the extracted info to UTF-8. So I can't hard-code any one specific conversion; it has to work universally.
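A universal detection chain, as I understand the requirement: trust the Content-Type header if it names a charset, otherwise look for a meta tag in the HTML itself, otherwise fall back to a heuristic. The helper name detect_charset is mine, and mb_detect_encoding is only a guess, so this is a sketch rather than a guarantee:

<?php
// Hypothetical helper: charset from HTTP header, meta tag, or heuristic.
function detect_charset(string $html, string $contentType): string
{
    // 1. HTTP header, e.g. "text/html; charset=windows-1251"
    if (preg_match('/charset=([\w-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // 2. <meta charset="..."> or the older http-equiv form
    if (preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    // 3. Heuristic fallback; UTF-8 vs windows-1251 is usually distinguishable.
    $guess = mb_detect_encoding($html, ['UTF-8', 'Windows-1251'], true);
    return $guess !== false ? strtoupper($guess) : 'UTF-8';
}

$charset = detect_charset($html, (string)$contentType);
if ($charset !== 'UTF-8') {
    $html = mb_convert_encoding($html, 'UTF-8', $charset);
}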
For some reason DOMDocument loads windows-1251 pages correctly, even though my site is in UTF-8 (both the site and the database it writes to), but if the source page was in UTF-8, mojibake ends up in the database (as a commenter already noted, it looks like the text was read as ISO-8859-1).
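That ISO-8859-1 remark fits libxml2's fallback behavior: the windows-1251 pages presumably carry a meta charset the parser honors, while UTF-8 pages without one get re-read as Latin-1 before anything reaches the database. The workaround I know of is to convert everything to UTF-8 first (as in the detection sketch above) and then state the encoding explicitly when parsing, for example by prepending an XML declaration:

<?php
// $html has already been converted to UTF-8 at this point.
$dom = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);

// An explicit declaration stops libxml2 from assuming ISO-8859-1.
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $html);

$titleNode = $dom->getElementsByTagName('title')->item(0);
$h1Node    = $dom->getElementsByTagName('h1')->item(0);
$title = $titleNode ? $titleNode->textContent : '';
$h1    = $h1Node ? $h1Node->textContent : '';

With this in place, both windows-1251 and UTF-8 sources should end up as clean UTF-8 in the database.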