$content = file_get_contents($Url);
preg_match_all('#<title>.+</title>#', $content, $matches);
$title = preg_replace('#(<title>|</title>)#', '', $matches[0][0]);

This is how I get the title of a page at a given URL. The problem is that not all sites use the same encoding: for most sites the title is extracted fine, but for the rest it either shows up as diamonds with question marks or is not written to the database at all. I tried converting the resulting string to UTF-8, but so far with no success.

    3 answers

    1. Determine the encoding with the mb_detect_encoding() function.

    2. Convert it, for example to UTF-8, with the mb_convert_encoding() function.

    3. Then parse with the u (UTF-8) modifier.

    This modifier turns on additional PCRE functionality that is incompatible with Perl: the pattern and the subject string are treated as UTF-8 strings.

    preg_match_all('#<title>.+?</title>#isu', $content, $matches);
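
    Putting the three steps together, a minimal sketch might look like the following; the candidate encoding list and the $url variable are assumptions, not part of the answer:

    $content = file_get_contents($url);

    // 1. Detect the source encoding against a list of likely candidates.
    // NB: detection is heuristic; the order of candidates matters.
    $charset = mb_detect_encoding($content, ['UTF-8', 'Windows-1251', 'ISO-8859-1'], true);

    // 2. Convert to UTF-8 when detection succeeded and it is not UTF-8 already.
    if ($charset !== false && $charset !== 'UTF-8') {
        $content = mb_convert_encoding($content, 'UTF-8', $charset);
    }

    // 3. Parse with the u modifier so PCRE treats the string as UTF-8.
    preg_match_all('#<title>(.+?)</title>#isu', $content, $matches);
    $title = $matches[1][0] ?? '';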

      First you need to decide in which encoding you will store the pages.

      Then you need to determine the encoding of the downloaded page.

      If the encodings differ, call the iconv() function.
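
      A minimal sketch of that flow, assuming the storage encoding is UTF-8 and that $charset was already detected by some means (both are assumptions):

      $stored = 'UTF-8';  // the encoding chosen for storage (assumption)

      // Convert only when the detected page encoding differs;
      // //IGNORE drops characters that cannot be represented.
      if ($charset !== false && strcasecmp($charset, $stored) !== 0) {
          $content = iconv($charset, $stored . '//IGNORE', $content);
      }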

        I solved the problem as follows. First I determined the page encoding from the downloaded content:

         preg_match_all('#charset=.+"#', $content, $array);
         $charset = preg_replace('#(charset=|")#', '', $array[0][0]);

        This code is not universal, though, because not all pages declare the encoding this way. Then I converted the string to the correct encoding:

         $newtitle = iconv($charset, "UTF-8", $title); 

        Of course, the code is not perfect, but the percentage of successfully extracted titles has increased significantly.
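
        A slightly more robust variant of the same idea (a sketch, not from the original answer) would also check the HTTP Content-Type header, which PHP exposes through $http_response_header after a file_get_contents() call over http(s); the regular expressions, fallback order, and the $url variable here are assumptions:

        $content = file_get_contents($url);
        $charset = null;

        // Prefer the Content-Type response header when it declares a charset.
        foreach ($http_response_header as $header) {
            if (preg_match('#^Content-Type:.*charset=([\w-]+)#i', $header, $m)) {
                $charset = $m[1];
                break;
            }
        }

        // Fall back to the in-page charset declaration used above.
        if ($charset === null && preg_match('#charset=["\']?([\w-]+)#i', $content, $m)) {
            $charset = $m[1];
        }

        $newtitle = ($charset !== null && strcasecmp($charset, 'UTF-8') !== 0)
            ? iconv($charset, 'UTF-8', $title)
            : $title;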