Please help. The problem is that I get an array of links to the full news articles, but I don't understand how to make the parser follow them. Could someone knowledgeable please explain?

    function parser_simple_html($url, $i){
        if($i < 1) {
            $html = str_get_html(get_result($url));
            $blog = $html->find("#dle-content", 0);
            foreach ($blog->find(".post") as $root) {
                $film = $root->find(".post-title", 0)->find('a', 0)->href; // $film - got the array of links
                print $film; // how do I run through these links ??
            }
            // find the next page
            $page1 = $blog->find('.navigation', 0)->find('a', 10)->next_sibling()->href;
            $page2 = $blog->find('.navigation', 0)->find('a', 9)->next_sibling()->href;
            if ($page1 == true){
                $page = $blog->find('.navigation', 0)->find('a', 10)->next_sibling()->href;
            } else {
                $page = $blog->find('.navigation', 0)->find('a', 9)->next_sibling()->href;
            }
            // end of the next-page search
            // go to the next page
            if ((isset($page)) && !empty ($page)){
                $i++;
                parser_simple_html('' . $page . '', $i);
            }
        }
    }

    $i = 0;
    parser_simple_html('http://дле-сайт.ру/page/1/', $i);

Updated version of the program:

    // ----------------- Helper functions.
    // Extracted into a function because the way the file is fetched may change.
    function getHtmlDocument($url) {
        return file_get_html($url);
    }

    function getLinksFromDocument($htmlDoc) {
        // code that returns all the links in an array.
        // Change your code so that it returns only the links.
        $ssil = '';
        $html = getHtmlDocument('http://dle-site.ru/page/1/');
        $blog = $html->find("#dle-content", 0);
        foreach ($blog->find(".post") as $root) {
            $ssil .= $root->find(".post-title", 0)->find('a', 0)->href . ' ';
        }
        $s = $ssil;
        $ssilka = explode(" ", $s);
        return [$ssilka];
    }

    function getArticleInfo($htmlDoc) {
        $tittle = $articlesInfo->find("#dle-content", 0)->find(".post", 0)->find(".post-title", 0)->find('a', 0)->plaintext;
        return [
            "title" => $tittle, // fill this in yourself
            "content" => $tittle // fill this in yourself too
        ];
    }
    // ----------------- Helper functions.

    // ----------------- The program itself.
    // Get the document that contains the list of links
    $htmlDocument = getHtmlDocument('http://dle-site.ru/page/1/');
    // Parse the document to get just the list of links
    $linkList = getLinksFromDocument($htmlDocument);
    // Empty array for the article information
    $articlesInfo = [];
    // For each link, get the document.
    foreach ($linkList as $link) {
        $articleDocument = getHtmlDocument($link);
        // Parse the fetched documents.
        $articlesInfo[$link] = getArticleInfo($link);
    }
    // At this point $articlesInfo holds all the information about all the articles.
    // ----------------- The program itself.
  • What do you mean by "run through"? What do you want to do with the links you get? - Dmitry Zasypkin
  • @DmitryZasypkin I want the parser to follow each link in the array and grab the article description, or at least the article title from the post-title class - Kostya Korostelev
  • @E_p thanks, I read it, and not only that article ... but alas, I have no mentor to tell me what I am doing wrong and why. I read a book on PHP, and the examples in the book all seemed clear, but as soon as I do something that is not from the examples, everything goes haywire ...) That is why I am asking for help explaining what is going on - Kostya Korostelev
  • @KostyaKorostelev write a function that processes these links and apply it to the resulting array - Dmitry Zasypkin

1 answer

If you have little programming experience, start not by copying code but by describing what should happen.

The algorithm of your program is very simple:

  1. Get the document that contains the list of links.
  2. Parse the document to get just the list of links.
  3. For each link, get a document.
  4. Parse these documents.

You can put these steps into the code as comments: a meta-program that does nothing yet. Then look at it and think. Steps 1 and 3 are the same operation, so you can write one shared function for them.

Now you can start writing code.

    <?php
    // ----------------- Helper functions.
    // Extracted into a function because the way the file is fetched may change.
    function getHtmlDocument($url) {
        return file_get_html($url);
    }

    function getLinksFromDocument($htmlDoc) {
        // code that returns all the links in an array.
        // Change your code so that it returns only the links.
        return [];
    }

    function getArticleInfo($htmlDoc) {
        return [
            "title" => "", // fill this in yourself
            "content" => "" // fill this in yourself too
        ];
    }
    // ----------------- Helper functions.

    // ----------------- The program itself.
    // Get the document that contains the list of links
    $htmlDocument = getHtmlDocument('http://дле-сайт.ру/page/1/');
    // Parse the document to get just the list of links
    $linkList = getLinksFromDocument($htmlDocument);
    // Empty array for the article information
    $articlesInfo = [];
    // For each link, get the document.
    foreach ($linkList as $link) {
        $articleDocument = getHtmlDocument($link);
        // Parse the fetched document (pass the document, not the URL string).
        $articlesInfo[$link] = getArticleInfo($articleDocument);
    }
    // At this point $articlesInfo holds all the information about all the articles.
    // ----------------- The program itself.
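As a rough illustration of how the two placeholder functions could be filled in, here is a minimal sketch. It assumes the Simple HTML DOM library file `simple_html_dom.php` sits next to the script, and that the pages use the `#dle-content`, `.post` and `.post-title` selectors from the question. The inline HTML strings are made up so the demo runs without network access. The point is that links are appended to an array with `$links[] = ...` and the flat array is returned as is.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

// Collect the href of every post title into a flat array of strings.
function getLinksFromDocument($htmlDoc)
{
    $links = [];
    $blog = $htmlDoc->find('#dle-content', 0);
    if ($blog === null) {
        return $links; // container not found: nothing to collect
    }
    foreach ($blog->find('.post') as $post) {
        $title = $post->find('.post-title', 0);
        $anchor = $title !== null ? $title->find('a', 0) : null;
        if ($anchor !== null) {
            $links[] = $anchor->href; // append one link per post
        }
    }
    return $links;
}

// Read the title from an already parsed article document.
function getArticleInfo($htmlDoc)
{
    $titleNode = $htmlDoc->find('.post-title', 0);
    return [
        'title' => $titleNode !== null ? trim($titleNode->plaintext) : '',
    ];
}

// Offline demo on made-up HTML instead of a real site.
$listHtml = '<div id="dle-content">'
    . '<div class="post"><h2 class="post-title"><a href="/news/1.html">First</a></h2></div>'
    . '<div class="post"><h2 class="post-title"><a href="/news/2.html">Second</a></h2></div>'
    . '</div>';
$links = getLinksFromDocument(str_get_html($listHtml));
print_r($links);

$articleHtml = '<div id="dle-content"><div class="post">'
    . '<h1 class="post-title">First</h1></div></div>';
$info = getArticleInfo(str_get_html($articleHtml));
print_r($info);
```

In the real loop, each `$link` would first go through `getHtmlDocument($link)` and the resulting document, not the URL string, would be passed to `getArticleInfo()`. If the site emits relative hrefs, they would need the site address prepended before fetching.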
  • Thank you very much for your help)) You laid everything out very clearly) - Kostya Korostelev
  • Good afternoon. I have been racking my brains for an hour and still don't understand the function getArticleInfo($htmlDoc). How does it work? What should I add to it? How will it work inside foreach? After I collected the array of links, I started getting errors - Kostya Korostelev
  • @KostyaKorostelev Show the new code. - E_p
  • Sorry for my ignorance, but I probably didn't quite understand how to work with documents: how to get out of the document the array that is built up in the foreach - Kostya Korostelev