How do I parse links like this one:

<?php echo file_get_contents('http://synonymonline.ru/П/прекрасный'); ?> 

I tried urlencode, rawurlencode, and iconv, but nothing helps.

    2 answers

    You cannot simply pass the whole link through urlencode() or rawurlencode(). The separators (":" and "/") get encoded too, so the result is no longer a valid link.

    In your case, you need to extract the path from the link, split it into segments, run each segment through rawurlencode(), and put everything back together.

    Example:

     // Split the URL, percent-encode each path segment, then reassemble it.
     $arr  = parse_url('http://synonymonline.ru/П/прекрасный');
     $path = implode('/', array_map('rawurlencode', explode('/', $arr['path'])));
     $link = $arr['scheme'] . '://' . $arr['host'] . $path;
     echo file_get_contents($link);
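The split, encode, and rejoin step can also be wrapped in a small helper. This is only a sketch: the name `encode_path_segments` is hypothetical, and decoding each segment first (so that input that already contains `%20` is not encoded twice) is an assumption added here, not part of the answer above.

```php
<?php
// Hypothetical helper: percent-encode every path segment, keeping "/" as the separator.
function encode_path_segments(string $path): string
{
    $segments = explode('/', $path);
    $encoded = array_map(
        function ($segment) {
            // Assumption: decode first so "%20" does not become "%2520".
            return rawurlencode(rawurldecode($segment));
        },
        $segments
    );
    return implode('/', $encoded);
}

// Usage: only the path is passed in; scheme and host are prepended unchanged.
$link = 'http://synonymonline.ru' . encode_path_segments('/П/прекрасный');
```

Empty segments (from a leading or doubled slash) stay empty, so the slashes of the original path are preserved as they were.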

    P.S. If your domain contains non-Latin characters, it has to be converted to IDNA ASCII format using idn_to_ascii().
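The note about non-Latin domains could be sketched like this. The wrapper name `host_to_ascii` is hypothetical, and falling back to the unchanged host when the intl extension is missing or the conversion fails is an assumption made here for illustration; `idn_to_ascii()` itself comes from the intl extension.

```php
<?php
// Hypothetical wrapper around idn_to_ascii() from the intl extension.
function host_to_ascii(string $host): string
{
    if (!function_exists('idn_to_ascii')) {
        // Assumption: without intl, return the host unchanged.
        return $host;
    }
    $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    // idn_to_ascii() returns false on failure; keep the original host in that case.
    return $ascii === false ? $host : $ascii;
}

// Usage: convert only the host, then build the rest of the link around it.
$host = host_to_ascii('synonymonline.ru');
```

A plain ASCII host passes through unchanged, so the wrapper can be applied to every host without checking it first.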

      In my case, the target link looked like this:

       http://some.domain.org//Uploads/images/408/А,Б%20секция%203%20этаж%204%20квcrop.jpg 

      That is, a cross between a bulldog and a rhinoceros: redundant slashes, non-Latin characters, and spaces already written as %20.

      The following solution helped. It is based on the answer by Andr'U Sender on Toster (as it happened, the first answer came from there), plus a detailed working example for Cyrillic (and other national) domains, added on @Visman's tip.

      The code was tested on PHP 7.2:

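       // Convert a non-ASCII host to IDNA ASCII form (requires the intl extension).
       if (preg_match('#^([\w\d]+://)([^/]+)(.*)$#iu', $filenameSrc, $m)) {
           $filenameSrc = $m[1] . idn_to_ascii($m[2], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46) . $m[3];
       }
       // Decode first so existing %20 sequences are not encoded twice, then encode everything.
       $filenameSrc = urldecode($filenameSrc);
       $filenameSrc = rawurlencode($filenameSrc);
       // Restore the ":" and "/" separators that rawurlencode() escaped.
       $filenameSrc = str_replace(array('%3A', '%2F'), array(':', '/'), $filenameSrc);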

      Note that if you use urlencode instead of rawurlencode, spaces are encoded as "+", and the link refused to open in that form. With %20, which is what rawurlencode produces, it works.

      I hope this saves someone here a little hair, too :)
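The difference between the two functions is easy to see on a short string. A minimal sketch, where the file name `a b.jpg` is just an illustrative value:

```php
<?php
$name = 'a b.jpg';

$plus = urlencode($name);     // form-style encoding: the space becomes "+"
$pct  = rawurlencode($name);  // RFC 3986 style encoding: the space becomes "%20"

echo $plus, PHP_EOL; // a+b.jpg
echo $pct, PHP_EOL;  // a%20b.jpg
```

In a URL path a "+" is a literal plus sign, not a space, which would explain why the link with pluses did not open.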