How to parse a site [closed]

Question

How do I parse the title, picture, and content from a site?

Which tool is better to use, and which is the fastest? ...

Answer 1 · 2016-06-20T12:49:48

I have personally used the PHP Simple HTML DOM Parser library. It is a pretty powerful tool.

The built-in DOMDocument class may also be appropriate, but keep in mind that it treats HTML much like XML, so it is less forgiving of malformed markup.

As for speed, PHP Simple HTML DOM Parser consumes a lot of memory, especially when the page markup is large, but it is far better than DOMDocument in functionality and parsing quality. DOMDocument is suitable for HTML documents that contain few errors or deviations from XML.
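For comparison, here is a minimal sketch of the DOMDocument route for the three items asked about (title, picture, content). The sample markup, the `content` class name, and the variable names are assumptions made up for illustration; a real page would need its own XPath expressions.

```php
<?php
// Hypothetical sample markup; a real page would be fetched first.
$html = <<<HTML
<html><head><title>Example headline</title></head>
<body>
  <img src="/images/cover.png" alt="cover">
  <div class="content"><p>First paragraph.<p>Second paragraph.</div>
</body></html>
HTML;

// Collect libxml parse warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Title: text of the first <title> element, if any.
$titleNode = $xpath->query('//title')->item(0);
$title = $titleNode ? trim($titleNode->textContent) : null;

// Picture: src attribute of the first <img> element, if any.
$imgNode = $xpath->query('//img')->item(0);
$picture = $imgNode ? $imgNode->getAttribute('src') : null;

// Content: text of the first <div class="content">, if any.
$contentNode = $xpath->query('//div[@class="content"]')->item(0);
$content = $contentNode ? trim($contentNode->textContent) : null;

print_r(['title' => $title, 'picture' => $picture, 'content' => $content]);
```

Calling `libxml_use_internal_errors(true)` keeps the warnings caused by sloppy markup (such as the unclosed `<p>` above) out of the output while still letting the parser recover what it can.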

I'll add code.google.com/archive/p/phpquery to the answer. I have used this tool and parsed quite a lot of data with it successfully.
Before parsing, I advise running the input through htmlpurifier.org. Unfortunately, the incoming data is not always valid.
I read about simplehtmldom.sourceforge.net and even tried to install it, but for some reason I couldn't ...

Answer 2 · 2016-06-20T12:48:20

I remember our partners were too lazy to make an RSS feed for exchange rates, so I had to parse the HTML page, because the rates were updated every hour. Here is the link to the tool I used to solve the problem.

http://simplehtmldom.sourceforge.net/

<?php
include('simple_html_dom.php');

$html = file_get_html('http://google.kz');

$arr = [];
// The argument to find() is the selector for what you need to find
foreach ($html->find('what you need to find') as $e) {
    $arr[] = trim($e->innertext); // does not have to be an array
}

print_r($arr); // example of printing the array
?>

Add @ before the parsing function call; it helps to ignore errors on the HTML page.
As for installation, there is only one PHP file: copy it into the site folder and then include it first with include('simple_html_dom.php').
Put the file simple_html_dom.php in the same directory as index.php.
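
One caveat about the @ advice above: @ only hides the warning, and the call can still fail, so it is worth checking the result. Below is a small sketch using the built-in file_get_contents as a stand-in for the library's fetch function (an assumption made so the example runs without simple_html_dom.php); the file name is hypothetical and deliberately does not exist.

```php
<?php
// Hypothetical path that does not exist, used to trigger a failed read.
$source = __DIR__ . '/no-such-page.html';

$details = null;

// @ hides the warning, but the function still returns false on failure.
$raw = @file_get_contents($source);

if ($raw === false) {
    $status = 'fetch failed';
    // The suppressed warning is still recorded and can be inspected or logged.
    $details = error_get_last();
} else {
    $status = 'fetched ' . strlen($raw) . ' bytes';
}

echo $status, PHP_EOL;
```

Checking for false before calling find() avoids a fatal error when the page could not be loaded.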

Closed because the question needs to be reformulated so that an objectively correct answer can be given. Closed by participants Dmitriy Simushev, user194374, D-side, zRrr, Ipatiev on Jun 20 '16 at 15:09.
