I am writing a parser in PHP that should copy all publications from a site and display them on my own site (this is not content theft; I have an agreement with the site owner).

I have already written code that copies the list of publications from the main page (title, photo and short text). Now I need to parse the contents of each publication, so I started by collecting the links to all publications from the main page.

Now I need to write a function that parses the contents of each publication behind these links.

Please show, with an example, how to parse the text of the page behind each link.

    <?php
    header('Content-type: text/html; charset=utf-8');
    require 'phpQuery.php';

    function print_arr($arr){
        echo '<pre>' . print_r($arr, true) . '</pre>';
    }

    $url = 'http://lifemomentt.blogspot.com/';
    $file = file_get_contents($url);
    $doc = phpQuery::newDocument($file);

    foreach($doc->find('.blog-posts .post-outer .post') as $article){
        $article = pq($article);

        // parse the title of each publication
        $text = $article->find('.entry-title a')->html();
        print_arr($text);

        // parse the link to each publication
        $texturl = $article->find('.entry-title a')->attr('href');
        echo $texturl;
    }
    ?>

  • And what is the problem with doing it by analogy with the list? - teran
  • I have the links to all publications, but I don't know how to parse the contents behind these links (by contents I mean the page that opens when you click a link). - R. P.

1 answer

You are doing everything right. Just create a function that accepts the URL of a post and runs the parser inside it. I have added it to your example for clarity:

    <?php
    header('Content-type: text/html; charset=utf-8');
    require 'phpQuery.php';

    // This function can be moved to a separate file if desired
    function parseArticle($url){
        $file = file_get_contents($url);
        $doc = phpQuery::newDocument($file);
        // Parse the article here the same way as the list
    }

    function print_arr($arr){
        echo '<pre>' . print_r($arr, true) . '</pre>';
    }

    $url = 'http://lifemomentt.blogspot.com/';
    $file = file_get_contents($url);
    $doc = phpQuery::newDocument($file);

    foreach($doc->find('.blog-posts .post-outer .post') as $article){
        $article = pq($article);

        // parse the title of each publication
        $text = $article->find('.entry-title a')->html();
        print_arr($text);

        // parse the link to each publication and hand it to the article parser
        $texturl = $article->find('.entry-title a')->attr('href');
        parseArticle($texturl);
    }
    ?>

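To make the body of `parseArticle` more concrete, here is a rough sketch of what it might do. It is not a tested solution for this particular site. To keep the example runnable without extra files, it uses PHP's built-in `DOMDocument` and `DOMXPath` instead of phpQuery. The helper `parseArticleHtml` is a hypothetical name, and the class `post-body` is only an assumption about where the article text lives (the class `entry-title` comes from the code above). Check the real markup in the browser and adjust the class names.

```php
<?php
// Sketch only: built-in DOM extension is used here in place of phpQuery.
// 'post-body' is an ASSUMED class name for the article text; verify it on the real page.

// Hypothetical helper: extracts the title and body text from one article's HTML.
function parseArticleHtml(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    // The XML prefix hints to libxml that the markup is UTF-8.
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // Returns the trimmed text of the first element carrying the given class.
    $textByClass = function (string $class) use ($xpath): string {
        $query = sprintf(
            '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
            $class
        );
        $node = $xpath->query($query)->item(0);
        return $node ? trim($node->textContent) : '';
    };

    return [
        'title' => $textByClass('entry-title'),
        'body'  => $textByClass('post-body'),
    ];
}

// Downloads one article page and parses it; returns null if the download fails.
function parseArticle(string $url): ?array
{
    $html = @file_get_contents($url);
    return $html === false ? null : parseArticleHtml($html);
}

// Offline demonstration on a small hand-written sample (not the real site markup).
$sample = '<div class="post"><h3 class="post-title entry-title">Hello</h3>'
        . '<div class="post-body entry-content"> First paragraph. </div></div>';
$result = parseArticleHtml($sample);
print_r($result);
```

Inside the `foreach` loop you would then call `parseArticle($texturl)` and, if the result is not `null`, print or store it. Splitting the download from the parsing keeps the parsing part easy to test on saved HTML without hitting the site on every run.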
  • Thank you for your help, but I still can't get it to work. Could you please show it using the site lifemomentt.blogspot.com as the example? I would be very grateful! - R. R.