Here is the parser code:
    ini_set('error_reporting', E_ALL);
    ini_set('display_errors', 1);
    ini_set('display_startup_errors', 1);
    ini_set('max_execution_time', 900000);

    $link = $_POST['page_link']; // page_link is the Google results page
    $i = 0; // index of the first search result to process

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $link);
    curl_setopt($ch, CURLOPT_USERAGENT, "");
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_REFERER, "http://www.google.ru/");
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_POST, 0);
    $data = curl_exec($ch);

    // Collect the result URLs shown in the <cite> tags of the results page
    preg_match_all("/<cite>(.+?)<\/cite>/is", $data, $matches);
    $result = $matches[1];
    $resultLength = count($result);

    for ($i; $i < $resultLength; $i++) {
        $str_out = strip_tags($result[$i]);
        $str = file_get_contents($str_out, false);
        preg_match_all('#(.+?)\@([a-z0-9-_]+)\.(ru|net|com|ua|in|by|tv|pl|biz)#i', $str, $matches);
        $urls[] = $matches[0];
        $urls_result = implode("", $urls[0]);
        echo $str_out."<br>";
        echo $urls_result."<br>";
    }

If

    $link = "https://www.google.com.ua/search?q=odessa+web+studio+contact&oq=odessa+web+studio+contact&gs_l=psy-ab.3..33i160k1.721.721.0.1337.1.1.0.0.0.0.108.108.0j1.1.0....0...1.1.64.psy-ab..0.1.107....0.hXfl1TDkaHc";

here is the result:
    https://skylogic.com.ua/contacts.html
    sup@skylogic.com
    https://lynx.od.ua/contacts/
    sup@skylogic.com
    https://www.trendline.in.ua/
    sup@skylogic.com
    https://sozdat-sayt.com.ua/contact/
    sup@skylogic.com

and so on.
Question: why is the same parsed email repeated, when a separate email should be parsed from each page? If I change the index of the result at which parsing starts, the email changes to the one found on that page, but it is then repeated for every following page in the same way.
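The symptom can be reproduced without any network access. The sketch below is only an illustration under stated assumptions: the `$pages` array with its example.com URLs and the simplified `$pattern` are hypothetical stand-ins for the fetched pages and the original regular expression. It suggests one likely cause: `$urls` grows on every pass of the loop, while `implode` always reads `$urls[0]`, which holds the matches from the first page processed.

```php
<?php
// Hypothetical stand-ins for the fetched pages; the real script calls file_get_contents().
$pages = [
    'https://example.com/a' => 'Contact: first@example.com',
    'https://example.com/b' => 'Contact: second@example.net',
];

// Simplified address pattern, assumed for illustration only.
$pattern = '#[a-z0-9._-]+@[a-z0-9-]+\.(?:ru|net|com|ua|in|by|tv|pl|biz)#i';

$urls  = [];
$buggy = [];
$fixed = [];

foreach ($pages as $url => $html) {
    preg_match_all($pattern, $html, $matches);

    // As in the question: a new entry is appended on every pass,
    // but index 0 always refers to the first page processed.
    $urls[] = $matches[0];
    $buggy[$url] = implode('', $urls[0]);

    // Possible fix: use only the matches of the current page.
    $fixed[$url] = implode(', ', array_unique($matches[0]));
}

print_r($buggy); // both entries carry the address from the first page
print_r($fixed); // each entry carries the address from its own page
```

If this is indeed the cause, it would also explain why changing the starting index changes the repeated address: whichever page is processed first ends up in `$urls[0]`.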