Getting the page title in the correct encoding

Question

The task is to get the value of the <title> tag of an HTML document and display it to the user. With UTF-8 sites everything works, but not every site on the Internet uses UTF-8, and for many of them the title comes out as mojibake (garbled characters). What is the right way to get the title and display it correctly in Russian (if it is in Russian) or in English, without garbling? Here is how I tried to solve the problem:

    $curl_handle = curl_init();
    curl_setopt($curl_handle, CURLOPT_URL, $url);
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, true);
    $content = curl_exec($curl_handle);
    curl_close($curl_handle);

    preg_match("/<title>(.*)<\/title>/siU", $content, $matches);
    // Guess the encoding of the captured title text. With the default
    // detect order (ASCII, UTF-8) and strict mode, Windows-1251 text is
    // not recognized, so $detect can be false here.
    $detect = mb_detect_encoding($matches[1], mb_detect_order(), true);
    $title = iconv($detect, "utf-8", $matches[1]);
    echo $title;

Maybe it would be easier to send a GET request to the page and parse the title?
Is there any way to get the page title without mojibake?
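For what it's worth, the cURL code above already sends a GET request. Besides the HTML itself, the server's Content-Type response header (for example "text/html; charset=windows-1251") often names the encoding, and cURL exposes it through curl_getinfo() with CURLINFO_CONTENT_TYPE. A minimal sketch, assuming PHP 8 with the curl extension; charsetFromContentType() and fetchWithContentType() are hypothetical helper names, not part of any library:

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type
// header value such as "text/html; charset=windows-1251".
// Returns the charset in lower case, or null when none is present.
function charsetFromContentType(?string $contentType): ?string
{
    if ($contentType === null || $contentType === '') {
        return null;
    }
    if (preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

// Hypothetical helper: fetch a page with cURL (a GET request by default)
// and also return the Content-Type header the server sent back.
function fetchWithContentType(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    // Content-Type of the response, or null if the server sent no valid header.
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    return [$body === false ? '' : $body, $contentType ?: null];
}
```

Not every server includes a charset in this header, so a null result means the encoding has to be taken from the page itself.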

Answer (Vlad, 2016-09-22T22:45:16)

You can parse the value of <meta charset=""> along with the title, and then either use it to convert the title to UTF-8 or use it to set the encoding of your own output page, depending on your task.
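The approach from the answer can be sketched as follows, assuming PHP 8 with the mbstring extension. It reads the charset from either <meta charset="..."> or the older <meta http-equiv="Content-Type" content="...; charset=..."> form, then converts the title with mb_convert_encoding(). The helper names and the windows-1251 fallback (used when a page declares nothing) are assumptions for illustration:

```php
<?php
// Hypothetical helper: find the charset declared in the HTML, either
// <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">.
function charsetFromMeta(string $html): ?string
{
    // The declaration is expected near the top of the document.
    $head = substr($html, 0, 4096);
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $head, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

// Hypothetical helper: extract the <title> text and convert it to UTF-8.
// $fallback is an assumed default for pages that declare no charset.
function titleAsUtf8(string $html, string $fallback = 'windows-1251'): ?string
{
    if (!preg_match('/<title[^>]*>(.*?)<\/title>/si', $html, $m)) {
        return null; // no <title> element found
    }
    $charset = charsetFromMeta($html) ?? $fallback;
    try {
        // PHP 8 throws ValueError if the declared charset name is unknown.
        $title = mb_convert_encoding($m[1], 'UTF-8', $charset);
    } catch (\ValueError $e) {
        $title = $m[1]; // keep the raw bytes rather than fail
    }
    // Decode entities such as &amp; or &#1055; and trim whitespace.
    return trim(html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}
```

Browsers give a charset in the HTTP Content-Type header precedence over the meta tag, so if the server sends one, it is usually the better value to pass as the source encoding.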
