Here is the parser code:

 ini_set('error_reporting', E_ALL);
 ini_set('display_errors', 1);
 ini_set('display_startup_errors', 1);
 ini_set('max_execution_time', 900000);

 $link = $_POST['page_link']; // page_link is the Google results page
 $i = 0; // index of the first search result to start parsing from

 $ch = curl_init();
 curl_setopt($ch, CURLOPT_URL, $link);
 curl_setopt($ch, CURLOPT_USERAGENT, "");
 curl_setopt($ch, CURLOPT_FAILONERROR, 1);
 curl_setopt($ch, CURLOPT_HEADER, 0);
 curl_setopt($ch, CURLOPT_REFERER, "http://www.google.ru/");
 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
 curl_setopt($ch, CURLOPT_TIMEOUT, 30);
 curl_setopt($ch, CURLOPT_POST, 0);
 $data = curl_exec($ch);

 preg_match_all("/<cite>(.+?)<\/cite>/is", $data, $matches);
 $result = $matches[1];
 $resultLength = count($result);

 for ($i; $i < $resultLength; $i++) {
     $str_out = strip_tags($result[$i]);
     $str = file_get_contents($str_out, false);
     preg_match_all('#(.+?)\@([a-z0-9-_]+)\.(ru|net|com|ua|in|by|tv|pl|biz)#i', $str, $matches);
     $urls[] = $matches[0];
     $urls_result = implode("", $urls[0]);
     echo $str_out."<br>";
     echo $urls_result."<br>";
 }

For example, with this link:

 $link = "https://www.google.com.ua/search?q=odessa+web+studio+contact&oq=odessa+web+studio+contact&gs_l=psy-ab.3..33i160k1.721.721.0.1337.1.1.0.0.0.0.108.108.0j1.1.0....0...1.1.64.psy-ab..0.1.107....0.hXfl1TDkaHc"; 

Here is the result:

 https://skylogic.com.ua/contacts.html
 sup@skylogic.com
 https://lynx.od.ua/contacts/
 sup@skylogic.com
 https://www.trendline.in.ua/
 sup@skylogic.com
 https://sozdat-sayt.com.ua/contact/
 sup@skylogic.com

and so on.

Question: why is the same parsed email repeated for every page, when a separate email should be parsed from each page? If I change the index of the result that parsing starts from, the email changes to the one parsed from that other page, but it is duplicated in the same way.

  • Parsing Google is not allowed (joke) - Farkhod Daniyarov
  • You use it purely for practice, and not in any way to send spam, right? - Farkhod Daniyarov
  • God forbid, only for the sake of curiosity) - PC Tea
  • then everything is good)) - Farkhod Daniyarov

1 answer

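Inside the loop you keep appending every page's matches to the same array: $urls[] = $matches[0];
But then you always output only the first element: $urls_result = implode("",$urls[0]); so every iteration prints the emails found on the first page.
At the beginning of the loop body, add $urls = []; and everything will work.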
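
To illustrate the fix, here is a minimal sketch rather than a drop-in replacement for the script in the question. It runs on made-up in-memory pages instead of downloading anything, and the helper extractEmails(), the sample $pages data, the $report array and the simplified email pattern are all assumptions introduced for the example. The only point it demonstrates is that $urls is reset on every iteration, so $urls[0] always refers to the current page.

```php
<?php
// Hypothetical helper (not from the question): returns the e-mail-like
// strings found in one page. The pattern is a simplified stand-in.
function extractEmails(string $html): array
{
    preg_match_all('/[a-z0-9._-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+/i', $html, $matches);
    return array_values(array_unique($matches[0]));
}

// Made-up sample data standing in for the downloaded result pages.
$pages = [
    'https://example.com/a' => '<p>Write to info@example.com</p>',
    'https://example.org/b' => '<p>Contact: sales@example.org</p>',
];

$report = [];
foreach ($pages as $url => $html) {
    // Start with an empty array on every iteration, so index 0
    // holds the matches of the current page, not of the first one.
    $urls = [];
    $urls[] = extractEmails($html);
    $report[$url] = implode(', ', $urls[0]);
}

// Print each page address followed by the emails found on it.
foreach ($report as $url => $emails) {
    echo $url, "\n", $emails, "\n";
}
```

With the reset in place the intermediate $urls array is no longer really needed: the matches for the current page could be imploded directly.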