Are you sure it saves exactly the first 7 photos, and not 10? The page has a filter for 10, 20 or 50 items. Perhaps your parser simply doesn't know how to click the 50 option in that filter.
And if it really is 7, then maybe it takes only the first picture on a page and moves on to the next page, takes the first one there as well, and so on. So perhaps you just overlooked something.
Why do I think so? Because the URL itself parses without any trouble. The pictures (and not only them) can be fetched without problems.
Here is an example using PHP and Simple HTML DOM Parser (to use Simple HTML DOM Parser you need, of course, to download it ... resource )
// Report all errors except E_WARNING
error_reporting(E_ALL & ~E_WARNING);

include './domParser/simple_html_dom.php';

class DomParser {
    public $url = '';
    public $imgHost = '';
    public $returnVal = 0;

    public function __construct($urlParse, $imgHostUrl) {
        $this->url = $urlParse;
        $this->imgHost = $imgHostUrl;
    }

    public function getImages($_data) {
        $i = 1;
        $data = $_data ? $_data : file_get_html($this->url);

        if ($data->innertext != '') {
            foreach ($data->find('div.catalog__displayedItem') as $a) {
                foreach ($a->find('.catalog__displayedItem__columnFoto img') as $img) {
                    echo '<img src="' . $this->imgHost . $img->src .'" />';
                    $imgExt = explode('.', $img->src);

                    // This saves the image to your own folder
                    // Commented out in the fiddle
                    if ($image = file_get_contents($this->imgHost . $img->src)) {
                        //file_put_contents('./images/' . $i . '.' . end($imgExt), $image);
                    }
                }
                $i++;
            }

            echo '<br /><br />';
            $this->getNextPage($data, 'getImages');
            $data->clear();
            unset($data);
        }
    }

    public function getNextPage($data, $repeatFunctionName) {
        // Limited for now, so that it does not run many loops and overload anything
        if ($this->returnVal >= 2) return;

        if ($data->innertext != '') {
            $this->returnVal++;
            foreach ($data->find('.catalogItemList__paginator a') as $a) {
                $str = iconv("windows-1251", "UTF-8", $a->title);
                // Look for the "next" link by its title
                if (mb_strpos(strtolower($str), 'ледующие', 0, 'UTF-8') !== false) {
                    $page = explode('?', $a->href);
                    $data_inner_link = file_get_html($this->url . '?' . end($page));
                    $this->$repeatFunctionName($data_inner_link);
                    break;
                }
            }
        }
    }
}

$url = 'http://www.onlinetrade.ru/catalogue/smartfoni-c13/';
$imgHost = 'http://www.onlinetrade.ru';

$parser = new DomParser($url, $imgHost);
$parser->getImages(null);

/*
$url = 'http://www.onlinetrade.ru/catalogue/smartfoni-c13/';
$imgHost = 'http://www.onlinetrade.ru';
$data = file_get_html($url);
$i = 1;

function getImages($data) {
    global $imgHost;
    global $i;

    if ($data->innertext != '') {
        foreach ($data->find('div.catalog__displayedItem') as $a) {
            foreach ($a->find('.catalog__displayedItem__columnFoto img') as $img) {
                echo '<img src="' . $imgHost . $img->src .'" />';
                $imgExt = explode('.', $img->src);

                // This saves the image to your own folder
                // Commented out in the fiddle
                if ($image = file_get_contents($imgHost . $img->src)) {
                    file_put_contents('./images/' . $i . '.' . end($imgExt), $image);
                }
            }
            $i++;
        }

        echo '<br /><br />';
        getNextPage($data);
        $data->clear();
        unset($data);
    }
}

$return = 0;

function getNextPage($data) {
    global $url;
    global $return;

    // Limited for now, so that it does not run many loops and overload anything
    if ($return >= 2) return;

    if ($data->innertext != '') {
        $return++;
        foreach ($data->find('.catalogItemList__paginator a') as $a) {
            $str = iconv("windows-1251", "UTF-8", $a->title);
            if (mb_strpos(strtolower($str), 'ледующие', 0, 'UTF-8') !== false) {
                $page = explode('?', $a->href);
                $data_inner_link = file_get_html($url . '?' . end($page));
                getImages($data_inner_link);
                break;
            }
        }
    }
}

//getImages($data);
*/
The code shows a class-based version and the usual version with plain functions (commented out at the bottom).
You can try it out here.
At the moment I have deliberately limited parsing to the first 3 pages (10 products per page), so as not to overload the fiddle, and commented out the line with file_put_contents, because the fiddle does not allow file_put_contents, which is logical.
Here is proof that the images are saved:
And there are many more further down ...
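The page limit can also be written as a plain loop instead of recursion. This is only a rough sketch: `nextPageUrl`, `collectImages` and the `$fetchPage` callable are hypothetical names that are not in the code above, and `$fetchPage` is assumed to return the image paths of one page plus the href of the "next" link (or null). It collects every image on a page, not just the first, and stops after `$maxPages` pages.

```php
<?php
// Hypothetical helper: build the URL of the next catalogue page by reusing
// only the query string of the paginator href, as getNextPage() does above.
function nextPageUrl(string $baseUrl, string $href): string
{
    $query = parse_url($href, PHP_URL_QUERY);
    return (is_string($query) && $query !== '') ? $baseUrl . '?' . $query : $baseUrl;
}

// Walk at most $maxPages pages and gather all image paths.
// $fetchPage is an assumed callable: it takes a URL and returns
// ['images' => string[], 'nextHref' => ?string] for that page.
function collectImages(string $baseUrl, callable $fetchPage, int $maxPages = 3): array
{
    $images = [];
    $url = $baseUrl;

    for ($page = 1; $page <= $maxPages && $url !== null; $page++) {
        $result = $fetchPage($url);

        // Take every image found on the page, not only the first one.
        foreach ($result['images'] as $src) {
            $images[] = $src;
        }

        // Stop early when the page has no "next" link.
        $url = $result['nextHref'] !== null
            ? nextPageUrl($baseUrl, $result['nextHref'])
            : null;
    }

    return $images;
}
```

In a real run `$fetchPage` would wrap file_get_html and the two find() calls from the class above; keeping it as a parameter makes the paging logic easy to check without hitting the site.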
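For the saving step itself, something along these lines might be a little more robust than explode('.') plus end(), since it ignores a query string after the file name. It is a sketch under assumptions: `localImageName` and `saveImage` are hypothetical helpers, and the fallback extension `jpg` is just a guess for images without one.

```php
<?php
// Hypothetical helper: build a local file name such as "3.jpg" from an image URL or path.
function localImageName(int $index, string $src, string $fallbackExt = 'jpg'): string
{
    // Use only the path part, so "?v=2" and the like do not end up in the extension.
    $path = parse_url($src, PHP_URL_PATH);
    $ext = is_string($path) ? strtolower(pathinfo($path, PATHINFO_EXTENSION)) : '';

    if ($ext === '' || !ctype_alnum($ext)) {
        $ext = $fallbackExt; // assumed default when no usable extension is found
    }

    return $index . '.' . $ext;
}

// Hypothetical helper: download one image into $dir. Returns true on success.
function saveImage(string $imageUrl, string $dir, int $index): bool
{
    // Create the target folder on first use.
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        return false;
    }

    $image = @file_get_contents($imageUrl);
    if ($image === false) {
        return false;
    }

    return file_put_contents($dir . '/' . localImageName($index, $imageUrl), $image) !== false;
}
```

With the class above, the call would be roughly saveImage($this->imgHost . $img->src, './images', $i) in place of the commented-out file_put_contents line.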
Alternatively, and probably better (as far as PHP is concerned), you can use cURL
cURL is a free command-line utility that allows you to interact with many different servers across many different protocols with the URL syntax.
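In PHP, cURL is available through the cURL extension. A minimal sketch of fetching a page with it might look like this; `fetchHtml` is a hypothetical helper, and the timeout and User-Agent values are assumptions, not something the site is known to require.

```php
<?php
// Hypothetical helper: fetch a page body with the PHP cURL extension.
// Returns null on a transport error or a non-2xx HTTP status.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleParser/1.0)', // assumed value
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released automatically when $ch goes out of scope.

    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }

    return $body;
}
```

The returned string can then be handed to Simple HTML DOM's str_get_html() instead of calling file_get_html() on the URL, which gives you control over timeouts and headers.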
This code is mainly meant to show that everything works, that the images can be downloaded, and that you most likely have an error somewhere in your own code.
Most likely my answer is not the answer you were looking for, but perhaps this code will be useful and you will want to adapt it to your needs. That may be better than saving mht, although it will use a lot of memory.
Does your jQuery parser really save to the hard disk? - Alexey Shimansky