The PHP parser does not work correctly.

Question

I have a very simple parser, but it only picks up the first item at the top of the page. From searching, I understood that I need to use $start and $finish, but when I add them, nothing gets parsed at all. The donor site has the following structure:

    <div class="firm-list-item firm-place-1">~content to extract~</div>
    <div class="firm-list-item firm-place-2">~content to extract~</div>
    <div class="firm-list-item firm-place-2">~content to extract~</div>

The content between the first and second divs is extracted, but nothing after that. Here is the code of the parser itself:

    $title = file_get_contents($url);
    $start = '<div class="firm-list-item firm-place-2">';
    $finish = '<div class="firm-list-item firm-place-3">';
    $pos = strpos($title, '<a class="firm-item-title" href=');
    $title = substr($title, $pos);
    $pos1 = strpos($title, '</a>');
    $title = substr($title, 0, $pos1);
    $title = preg_replace('<a class="firm-item-title" href="/firm/id/[0-9]+/">', '', $title);
    echo $title;
    echo '<br>';

Could you tell me what the problem is here, and where $start and $finish should be used?

Accepted Answer · 2017-01-21T05:43:01

It will be easier to use the phpQuery library (https://github.com/punkave/phpQuery).

    require 'phpQuery/phpQuery.php';

    // Fetch a page with cURL and return its body as a string (false on failure).
    function get_content_by_url($url_target)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url_target);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Google Bot');
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }

    $url_target = 'http://example.site.com/';
    $html_content = get_content_by_url($url_target);
    $document = phpQuery::newDocument($html_content);

    // The class attribute starts with "firm-list-item", so a prefix match (^=)
    // on "firm-place-" would find nothing; use a substring match (*=) instead.
    $found_items = $document->find('div.firm-list-item[class*="firm-place-"]');

    $print = '';
    foreach ($found_items as $item) {
        $pq = pq($item);
        $content_text = $pq->text(); // text only
        $content_html = $pq->html(); // full inner HTML (content)
        $print .= '<li class="my_item">' . $content_html . '</li>';
    }
    $final_content = '<ul class="my_list">' . $print . '</ul>';
    echo $final_content;

In theory, this should work.
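As for the hand-written approach from the question: strpos without an offset always returns the first match, which is probably why only the first item is extracted. Below is a minimal sketch of scanning the whole page by passing an offset to strpos. The helper name extract_all_between and the sample markup are made up for illustration, and the real page may need different start and finish markers.

```php
<?php
// Hypothetical helper: collect every substring that sits between $start and $finish.
function extract_all_between(string $html, string $start, string $finish): array
{
    $results = [];
    $offset = 0;
    // Passing $offset makes strpos continue after the previous match
    // instead of returning the first occurrence again.
    while (($from = strpos($html, $start, $offset)) !== false) {
        $from += strlen($start);              // skip past the opening marker
        $to = strpos($html, $finish, $from);  // closing marker after it
        if ($to === false) {
            break;                            // unterminated item: stop scanning
        }
        $results[] = substr($html, $from, $to - $from);
        $offset = $to + strlen($finish);      // resume after this item
    }
    return $results;
}

// Sample markup assumed for illustration only.
$html = '<div class="firm-list-item firm-place-1"><a class="firm-item-title" href="/firm/id/1/">Alpha</a></div>'
      . '<div class="firm-list-item firm-place-2"><a class="firm-item-title" href="/firm/id/2/">Beta</a></div>';

$titles = [];
foreach (extract_all_between($html, '<a class="firm-item-title" href=', '</a>') as $chunk) {
    // $chunk still holds the rest of the opening tag; keep what follows the first ">".
    $titles[] = substr($chunk, strpos($chunk, '>') + 1);
}

foreach ($titles as $title) {
    echo $title, '<br>';
}
```

Here $start and $finish are simply the markers handed to the loop. A fixed $start such as '<div class="firm-list-item firm-place-2">' matches only one item, so a marker shared by every item is the safer choice.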

Also, for future reference, get_content_by_url can be replaced with file_get_contents($url).
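A plain file_get_contents($url) call sends no custom User-Agent and uses the default timeout. If those cURL settings matter, a stream context can carry them. This is only a sketch: the function name get_content_with_context is hypothetical, and it assumes allow_url_fopen is enabled on the server.

```php
<?php
// Hypothetical replacement for the cURL helper, built on file_get_contents.
function get_content_with_context(string $url_target)
{
    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'timeout'    => 30,           // read timeout in seconds
            'user_agent' => 'Google Bot', // same User-Agent string as the cURL version
        ],
    ]);

    // Returns the response body as a string, or false on failure.
    return file_get_contents($url_target, false, $context);
}
```

It can then be called the same way as the cURL helper, for example $html_content = get_content_with_context($url_target); and the result should be checked against false before parsing.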
