I am learning to write a content parser (scraper) and settled on the DiDOM library.
I have managed to parse the information I need from a single page:
require_once('vendor/autoload.php');

use DiDom\Document;

$document = new Document('https://site.ru/catalog/tovar/', true);

// Find the heading
$main_heading = $document->find('.product-title h1')[0];
echo $main_heading->html();

// Find the price
$price = $document->find('.item_current_price')[0];
echo $price->text();

// Find the photo
$foto = $document->find('.bx_bigimages_imgcontainer img')[0];
echo $foto->attr('src');

With a single page everything is clear. What I cannot grasp is the logic of crawling and collecting content from a large number of pages. How is this done in principle?
Should the parser discover child pages by following links in the product catalog (for example), or take the URLs from the site's XML sitemap, or obtain the list of links in some other way and then visit each one, extracting the required information there?
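Here is roughly how I imagine the link-following (catalog crawl) variant: a minimal sketch in which the start URL, the a.product-item selector for product links, and the a.pagination-next selector for pagination are all assumptions, not taken from a real site.

require_once('vendor/autoload.php');

use DiDom\Document;

// Queue of catalog listing pages to visit; the start URL is hypothetical.
$queue = ['https://site.ru/catalog/'];
$visited = [];

while ($queue) {
    $url = array_shift($queue);
    if (isset($visited[$url])) {
        continue; // skip pages we have already crawled
    }
    $visited[$url] = true;

    $page = new Document($url, true);

    // Collect product links from the listing (hypothetical selector).
    // This assumes absolute hrefs; relative ones would need to be
    // resolved against the base URL first.
    foreach ($page->find('a.product-item') as $link) {
        $product = new Document($link->attr('href'), true);

        // The same per-page parsing as above.
        $heading = $product->find('.product-title h1');
        if ($heading) {
            echo $heading[0]->text(), PHP_EOL;
        }
    }

    // Follow pagination to the next listing page (hypothetical selector).
    foreach ($page->find('a.pagination-next') as $next) {
        $queue[] = $next->attr('href');
    }

    sleep(1); // throttle requests so as not to hammer the server
}

In other words: keep a queue of URLs, dedupe them, fetch each page, and apply the same per-page parsing I already have.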
Please suggest an approach.
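For reference, here is the sitemap variant as I imagine it: a minimal sketch assuming the site publishes a standard sitemap.xml (the sitemap URL and the /catalog/ substring used to filter product pages are assumptions).

require_once('vendor/autoload.php');

use DiDom\Document;

// A standard sitemap is <urlset><url><loc>...</loc></url></urlset>;
// property access works here because the <url> elements share the
// document's default namespace.
$sitemap = simplexml_load_file('https://site.ru/sitemap.xml');

foreach ($sitemap->url as $entry) {
    $url = (string) $entry->loc;

    // Keep only product pages (the /catalog/ pattern is hypothetical).
    if (strpos($url, '/catalog/') === false) {
        continue;
    }

    $product = new Document($url, true);
    $heading = $product->find('.product-title h1');
    if ($heading) {
        echo $heading[0]->text(), PHP_EOL;
    }

    sleep(1); // throttle requests
}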