preg_replace mail.ru redirect

Question

I have a text chock full of links like this one. I need to extract the direct URL from each of them and write it in place of the redirect. Please help with the regular expression!

<a href="http://win.mail.ru/cgi-bin/link?check=1&amp;url=http%3A%2F%2Fredirect.subscribe.ru%2Flaw.russia.review.consdailyrus%2C2215%2F20110128105418%2Fn%2Fm17587277%2F-%2Fmy.consultant.ru%2Fcabinet%2F%3Fmode%3Dstat%3Bclick%3Bd%3D2011-01-27%3Br%3Dfd%3Bs%3Dsubscribe%3Bdst%3Dhttp%253A%252F%252Fwww.consultant.ru%252Flaw%252Freview%252Flink%252F%253Fid%253D961912">О федеральной целевой программе</a>

Update:

Thanks! I managed to get the final link without any redirects by adding this:

    foreach ($res[1] as $n => $link) {
        if (empty($res[2][$n])) {
            continue;
        }
        $url = urldecode($res[2][$n]);
        if (preg_match('/redirect\.subscribe\.ru/i', $url)) {
            // parse_str() decodes the query, so "mode" holds "...;dst=http://..."
            parse_str(parse_url($url, PHP_URL_QUERY), $params);
            $mode = $params['mode'];
            // everything after "dst=" is the final link
            $result = substr($mode, strpos($mode, 'dst=') + 4);
        } else {
            $result = $url;
        }
        $text = str_replace($link, $result, $text);
    }

Accepted Answer · 2011-07-12T16:26:51

    <?php
    $text = '<a href="http://win.mail.ru/cgi-bin/link?check=1&amp;url=http%3A%2F%2Fredirect.subscribe.ru%2Flaw.russia.review.consdailyrus%2C2215%2F20110128105418%2Fn%2Fm17587277%2F-%2Fmy.consultant.ru%2Fcabinet%2F%3Fmode%3Dstat%3Bclick%3Bd%3D2011-01-27%3Br%3Dfd%3Bs%3Dsubscribe%3Bdst%3Dhttp%253A%252F%252Fwww.consultant.ru%252Flaw%252Freview%252Flink%252F%253Fid%253D961912">О федеральной целевой программе</a> <a href="http://win.mail.ru/cgi-bin/link?check=1&amp;url=http%3A%2F%2Fredirect.subscribe.ru%2Flaw.russia.review.consdailyrus%2C2215%2F20110128105418%2Fn%2Fm17587277%2F-%2Fmy.consultant.ru%2Fcabinet%2F%3Fmode%3Dstat%3Bclick%3Bd%3D2011-01-27%3Br%3Dfd%3Bs%3Dsubscribe%3Bdst%3Dhttp%253A%252F%252Fwww.consultant.ru%252Flaw%252Freview%252Flink%252F%253Fid%253D961912123">О федеральной целевой программе 123</a>';

    echo '<h3>Before</h3>';
    echo '<hr />';
    echo $text;
    echo '<hr />';

    // [^>]* and [^"]* keep each match inside a single tag, so several links
    // on the same line are all found (a greedy .* would swallow them)
    $exp = '/<a\b[^>]*href="(http[^"]*url=([^"]+))"[^>]*>/i';
    $res = array();
    preg_match_all($exp, $text, $res);

    if (!empty($res[1])) {
        $map = array();
        foreach ($res[1] as $n => $link) {
            if (!empty($res[2][$n])) {
                $map[$link] = urldecode($res[2][$n]);
            }
        }
        // strtr() tries longer keys first and never rewrites replaced text,
        // so one href being a prefix of another is harmless
        $text = strtr($text, $map);
    }

    echo '<h3>After</h3>';
    echo '<hr />';
    echo $text;
    echo '<hr />';
    ?>

The only catch is that the link goes through two redirects (mail.ru + subscribe.ru), and the second one is harder to extract. The code above does handle the first one, though.
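Both layers can be peeled off in one helper. A minimal sketch, assuming the outer mail.ru link keeps the target in its `url` parameter and redirect.subscribe.ru keeps the final address after `;dst=` inside its `mode` parameter, as in the sample link. The function name `unwrap_redirect()` is hypothetical, and the href must already be HTML-decoded (`&amp;` turned into `&`, e.g. with `htmlspecialchars_decode()`).

```php
<?php
// Hypothetical helper: strips the mail.ru and redirect.subscribe.ru layers
// from an already HTML-decoded href value. Other links are returned as-is.
function unwrap_redirect($href)
{
    // Layer 1: mail.ru keeps the target URL in the "url" query parameter.
    $query = parse_url($href, PHP_URL_QUERY);
    if (!$query) {
        return $href;
    }
    parse_str($query, $params); // parse_str() also percent-decodes the values
    if (!isset($params['url'])) {
        return $href;
    }
    $url = $params['url'];

    // Layer 2 (assumed format): redirect.subscribe.ru keeps the final URL
    // after ";dst=" inside its "mode" parameter.
    $parts = parse_url($url);
    if (!isset($parts['host'], $parts['query']) || $parts['host'] !== 'redirect.subscribe.ru') {
        return $url;
    }
    parse_str($parts['query'], $params);
    if (isset($params['mode']) && preg_match('/;dst=(.+)/', $params['mode'], $m)) {
        return $m[1];
    }
    return $url;
}

$href = 'http://win.mail.ru/cgi-bin/link?check=1&url=http%3A%2F%2Fredirect.subscribe.ru%2Flaw.russia.review.consdailyrus%2C2215%2F20110128105418%2Fn%2Fm17587277%2F-%2Fmy.consultant.ru%2Fcabinet%2F%3Fmode%3Dstat%3Bclick%3Bd%3D2011-01-27%3Br%3Dfd%3Bs%3Dsubscribe%3Bdst%3Dhttp%253A%252F%252Fwww.consultant.ru%252Flaw%252Freview%252Flink%252F%253Fid%253D961912';
echo unwrap_redirect($href), "\n"; // http://www.consultant.ru/law/review/link/?id=961912
```

Because `parse_str()` decodes each layer as it parses it, no explicit `urldecode()` calls are needed here.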

Answer 2 · 2011-07-12T17:34:01

In my opinion, it is easier to use an XML parser in this case:

    // wrap the fragment in a root element: SimpleXML needs exactly one root node
    $xml = new SimpleXMLElement('<root>' . $text . '</root>');
    // iterate over all links that have an href attribute
    foreach ($xml->xpath('//a[@href]') as $a) {
        // skip links without a query string
        if (!($query = parse_url((string) $a['href'], PHP_URL_QUERY))) {
            continue;
        }
        // parse the query; the values come out already decoded
        parse_str($query, $params);
        // replace the attribute with the corresponding parameter, if present
        if (isset($params['url'])) {
            $a['href'] = $params['url'];
        }
    }
    var_dump($xml->asXML());

UPD. If you really want to do the replacement with a regular expression, it is best to use preg_replace_callback():

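    $result = preg_replace_callback('/(<a\b[^>]*?\bhref=)(["\'])(.*?)\2/i', function ($m) {
        // $m[1] = "<a ... href=", $m[2] = the quote character, $m[3] = the attribute value
        $url = htmlspecialchars_decode($m[3]);
        // leave links without a query string untouched
        if (!($query = parse_url($url, PHP_URL_QUERY))) {
            return $m[0];
        }
        // first layer: mail.ru keeps the target in the "url" parameter
        parse_str($query, $params);
        if (!isset($params['url'])) {
            return $m[0];
        }
        $url = $params['url'];
        // second layer: redirect.subscribe.ru keeps the target after ";dst=" in "mode"
        $parts = parse_url($url);
        if (isset($parts['host'], $parts['query']) && $parts['host'] === 'redirect.subscribe.ru') {
            parse_str($parts['query'], $params);
            if (isset($params['mode']) && preg_match('/;dst=(.+)/', $params['mode'], $n)) {
                $url = $n[1];
            }
        }
        // re-escape the URL for the attribute and keep the original quote style
        return $m[1] . $m[2] . htmlspecialchars($url, ENT_QUOTES) . $m[2];
    }, $text);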

Most HTML documents are not well-formed XML documents, so an XML parser will simply fail on them.
Besides, you can always use DOM: DOMDocument::loadHTML (ru.php.net/manual/en/domdocument.loadhtml.php).
"Modern XML parsers can parse HTML": in that case they are not XML parsers but HTML parsers, since the difference between HTML and XML is very significant.
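The DOM approach suggested in the comments accepts ordinary, not necessarily well-formed, HTML. A minimal sketch with `DOMDocument::loadHTML()` that, like the SimpleXML answer, only replaces each href with its `url` parameter. It assumes the input is a UTF-8 fragment; the leading `<meta>` tag is a common workaround because `loadHTML()` otherwise assumes ISO-8859-1. The function name `replace_url_params()` is hypothetical.

```php
<?php
// Hypothetical helper: rewrite every <a href> that carries a "url" query parameter.
// Assumes $html is a UTF-8 HTML fragment.
function replace_url_params($html)
{
    $doc = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    $prev = libxml_use_internal_errors(true);
    // The meta tag tells libxml that the fragment is UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($doc->getElementsByTagName('a') as $a) {
        // getAttribute() returns the value with &amp; already decoded to &
        $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
        if (!$query) {
            continue;
        }
        parse_str($query, $params); // values come out percent-decoded
        if (isset($params['url'])) {
            $a->setAttribute('href', $params['url']);
        }
    }

    // Serialize only the children of <body>, not the wrapper that loadHTML() adds.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}
```

The same `foreach` body could call a two-layer unwrapping function instead of reading only `url`, if the subscribe.ru redirect should be removed as well.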

Community Spirit ♦ · Answer 3 · 2011-07-12T15:17:42

A regular expression can only find the link itself. The target link stored in the url parameter then has to be decoded with urldecode().
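In the sample link the final address is encoded twice: once as the value of subscribe.ru's `dst`, and again as part of the mail.ru `url` parameter, which is why it shows up as `%253A` rather than `%3A`. A small illustration: each `urldecode()` call removes exactly one layer.

```php
<?php
// The dst fragment from the sample link, as it appears inside the mail.ru href.
$raw = 'http%253A%252F%252Fwww.consultant.ru%252Flaw%252Freview%252Flink%252F%253Fid%253D961912';

$once  = urldecode($raw);  // mail.ru layer removed: every "%25" became "%"
$twice = urldecode($once); // subscribe.ru layer removed

echo $once, "\n";  // http%3A%2F%2Fwww.consultant.ru%2Flaw%2Freview%2Flink%2F%3Fid%3D961912
echo $twice, "\n"; // http://www.consultant.ru/law/review/link/?id=961912
```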

Build a suitable regular expression from ready-made ones, for example one from SO, or from any of the many others.
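One possible pattern of that kind, sketched here as an illustration (it is not the SO regex the answer refers to): capture each quoted href lazily, so that several links on the same line produce separate matches. It assumes quoted attribute values and does not handle unquoted ones.

```php
<?php
// Collect every quoted href value; the lazy (.*?) stops at the matching quote (\1),
// so two links on one line produce two matches.
$html = '<a href="http://a.example/?x=1">A</a> <a class="c" href=\'http://b.example/\'>B</a>';

preg_match_all('/<a\b[^>]*?\bhref=(["\'])(.*?)\1/i', $html, $matches);

// $matches[2] holds the raw attribute values; decode entities such as &amp;
// before handing them to parse_url()/parse_str().
$hrefs = array_map('htmlspecialchars_decode', $matches[2]);
print_r($hrefs);
```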
