There is a site with an .xml file that contains the data I need.

I need to parse out all the URLs, then go through each URL and parse the product description.

This is how I extract all the URLs:

    include 'simple_html_dom.php';

    $products = simplexml_load_file('http://www.optnow.ru/opt.xml');
    foreach ($products->shop->offers->offer as $product) {
        // Cast to string, otherwise SimpleXMLElement objects are stored
        $productUrls[] = (string)$product->url;
    }

Now I need to go through each URL. I tried this:

    foreach ($productUrls as $productUrl) {
        $curl_handle = curl_init();
        curl_setopt($curl_handle, CURLOPT_URL, $productUrl);
        curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
        curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
        $productDesc = curl_exec($curl_handle);
        curl_close($curl_handle);

        echo '<pre>';
        var_dump(str_get_html($productDesc)->find('.description div div p'));
        echo '</pre>';
    }

But the output is full of incomprehensible objects.
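The "incomprehensible objects" are simple_html_dom element nodes: `find()` returns an array of node objects, not strings, so `var_dump()` prints their internal structure. A minimal sketch of extracting the actual text via the parser's `plaintext` property, assuming the `.description div div p` selector from the question actually matches the pages:

```php
<?php
include 'simple_html_dom.php';

foreach ($productUrls as $productUrl) {
    $curl_handle = curl_init();
    curl_setopt($curl_handle, CURLOPT_URL, $productUrl);
    curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($curl_handle);
    curl_close($curl_handle);

    if ($html === false) {
        continue; // request failed, skip this product
    }

    $dom = str_get_html($html);
    if (!$dom) {
        continue; // page could not be parsed
    }

    // find() returns an array of element nodes; take each node's text
    foreach ($dom->find('.description div div p') as $node) {
        echo trim($node->plaintext), "\n";
    }

    $dom->clear(); // free parser memory between iterations
}
```

The only change from the question's loop is that each matched node's `plaintext` is printed instead of dumping the node objects themselves.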

  • Something strange... if simple_html_dom.php is used, the page should be loaded through it, not through cURL; that is most likely the problem - Alexey Shatrov
  • The problem was that I failed to use file_get_html() - Frunky
  • In that case nothing will work for you, since str_get_html should work with an object created by the standard function, not with what cURL returns, so you could try parsing the cURL output itself. - Alexey Shatrov
  • And in your case you don't even need regular expressions - Alexey Shatrov
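Following the suggestion in the comments to load pages through simple_html_dom itself rather than through cURL, the loop can be sketched with `file_get_html()` (a sketch, not tested against the live site; it relies on `allow_url_fopen` being enabled):

```php
<?php
include 'simple_html_dom.php';

foreach ($productUrls as $productUrl) {
    // file_get_html() fetches the URL and parses it in one step
    $dom = file_get_html($productUrl);
    if (!$dom) {
        continue; // fetch or parse failed, skip this product
    }

    foreach ($dom->find('.description div div p') as $node) {
        echo trim($node->plaintext), "\n";
    }

    $dom->clear(); // free parser memory between iterations
}
```

This removes the cURL boilerplate entirely, at the cost of losing cURL's finer-grained timeout and error control.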
