Parsing Cyrillic links in PHP

Question

How can I fetch pages from links like this one:

<?php echo file_get_contents('http://synonymonline.ru/П/прекрасный'); ?>

I tried urlencode, rawurlencode and iconv, but nothing helps.

Rogatnev Nikita 1,536 4 27 · Accepted Answer · 2017-06-30T04:38:25

You cannot simply pass the whole link through urlencode() or rawurlencode(): the output is no longer a link, because the "://" and "/" separators get encoded along with everything else.
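To see the problem concretely, here is a minimal sketch (assuming a UTF-8 encoded source file) that runs rawurlencode() over the whole link. The scheme separator and the slashes are escaped together with the Cyrillic text, so file_get_contents() would no longer recognize an http:// URL and would look for a local file instead.

```php
<?php
// Encode the whole link in one go (the approach that does NOT work).
$url = 'http://synonymonline.ru/П/прекрасный';
$whole = rawurlencode($url);

// ":" and "/" are escaped along with the Cyrillic characters, e.g.
// http%3A%2F%2Fsynonymonline.ru%2F%D0%9F%2F%D0%BF%D1%80...
echo $whole, PHP_EOL;
```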

In your case, you need to extract the path from the link, split it into segments, pass each segment through rawurlencode(), and then put everything back together.

Example:

 $arr = parse_url('http://synonymonline.ru/П/прекрасный');
 // Keep scheme and host as-is; encode each path segment separately
 $link = $arr['scheme'] . '://' . $arr['host']
     . implode('/', array_map('rawurlencode', explode('/', $arr['path'])));
 echo file_get_contents($link);
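If you need this more than once, the same idea can be wrapped in a small function. This is only a sketch: the name encode_url_path() is hypothetical, and it assumes the path is UTF-8 text that is not yet percent-encoded. Unlike the one-liner above, it also keeps a port, query string and fragment (passed through unchanged), but it still ignores any user:password@ part.

```php
<?php
/**
 * Hypothetical helper: percent-encodes each path segment of an absolute URL
 * while leaving the scheme, host, port and "/" separators intact.
 * Assumes the path is UTF-8 and not already percent-encoded.
 */
function encode_url_path(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    $result = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }

    // explode() yields an empty first segment for the leading "/", so
    // implode() restores that slash; rawurlencode() escapes only segment text.
    $path = $parts['path'] ?? '';
    $result .= implode('/', array_map('rawurlencode', explode('/', $path)));

    // Assumption: query and fragment are already encoded, so append them as-is.
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}

echo encode_url_path('http://synonymonline.ru/П/прекрасный'), PHP_EOL;
```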

P.S. If your domain name contains non-Latin characters, it also has to be converted to the ASCII (IDNA) form using idn_to_ascii().
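A minimal sketch of that conversion, assuming the intl extension is installed (idn_to_ascii() belongs to it). The helper name ascii_host() is hypothetical; the flags are the same ones the second answer below uses. For example, the Cyrillic domain пример.рф becomes xn--e1afmkfd.xn--p1ai, while a plain ASCII host is returned unchanged.

```php
<?php
// Hypothetical helper: converts a Unicode host name to its ASCII (Punycode) form.
// Requires the intl extension, which provides idn_to_ascii().
function ascii_host(string $host): string
{
    $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($ascii === false) {
        throw new InvalidArgumentException("Cannot convert host: $host");
    }
    return $ascii;
}

if (extension_loaded('intl')) {
    echo ascii_host('пример.рф'), PHP_EOL;
}
```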

FlameStorm 159 1 6 · Answer 2 · 2018-12-28T23:48:38

In my case, the target link looked like this:

 http://some.domain.org//Uploads/images/408/А,Б%20секция%203%20этаж%204%20квcrop.jpg

i.e. a real hodgepodge: a superfluous double slash, non-Latin characters, and spaces already encoded as %20.

The following solution worked for me. It is based on the answer by Andr'U Sender on Toster (as it happened, I got an answer there first), and, following a tip from @Visman, I extended the working example to handle Cyrillic (and other national) domains:

Code tested on PHP 7.2:

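 if (preg_match('#^([\w\d]+://)([^/]+)(.*)$#iu', $filenameSrc, $m)) {
     // Convert a national (IDN) host name to its ASCII form
     $filenameSrc = $m[1] . idn_to_ascii($m[2], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46) . $m[3];
 }
 // Decode existing escapes such as %20 so they are not encoded twice
 $filenameSrc = urldecode($filenameSrc);
 // Encode everything, then restore the ":" and "/" separators
 $filenameSrc = rawurlencode($filenameSrc);
 $filenameSrc = str_replace(array('%3A', '%2F'), array(':', '/'), $filenameSrc);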
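For reuse, the steps above can be wrapped in a function. This is a sketch: the name normalize_url() is hypothetical, and unlike the original code it skips the IDN step when the intl extension is missing instead of failing. Applied to the link above, it keeps the double slash, decodes the existing %20 escapes and re-encodes the whole path consistently.

```php
<?php
// Hypothetical wrapper around the steps above.
function normalize_url(string $url): string
{
    // 1. Convert a national (IDN) host to ASCII. The function_exists() guard
    //    is an addition for this sketch; the original calls idn_to_ascii() directly.
    if (preg_match('#^([\w\d]+://)([^/]+)(.*)$#iu', $url, $m) && function_exists('idn_to_ascii')) {
        $url = $m[1] . idn_to_ascii($m[2], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46) . $m[3];
    }
    // 2. Undo existing escapes (e.g. %20) so nothing gets double-encoded.
    $url = urldecode($url);
    // 3. Encode everything, then put back the ":" and "/" separators.
    $url = rawurlencode($url);
    return str_replace(['%3A', '%2F'], [':', '/'], $url);
}

echo normalize_url('http://some.domain.org//Uploads/images/408/А,Б%20секция%203%20этаж%204%20квcrop.jpg'), PHP_EOL;
```

Keep in mind two side effects of this approach: urldecode() also turns a literal "+" into a space, and rawurlencode() escapes "?", "&" and "=" as well, so it suits plain file paths like the one above rather than links with query strings.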

Note that if you use urlencode instead of rawurlencode, spaces are encoded as "+", and the server refused to open the link in that form. With %20, which is what rawurlencode produces, it works.
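The difference is easy to see side by side (a small sketch, assuming a UTF-8 source file). urlencode() follows the application/x-www-form-urlencoded convention and writes a space as "+", while rawurlencode() follows RFC 3986 and writes it as "%20". In a URL path, "+" is a literal plus sign, so a server looking for a file name that contains a space will typically not find it.

```php
<?php
// Same input, two encoders: only the representation of the space differs here.
$segment = 'секция 3';
$plus    = urlencode($segment);    // form-style encoding: space becomes "+"
$percent = rawurlencode($segment); // RFC 3986 style: space becomes "%20"

echo $plus, PHP_EOL;
echo $percent, PHP_EOL;
```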

I hope this saves someone here a bit of hair too :)
