There is a page with a block (for example, <div id="info">Information</div>). How can I extract this block with PHP?

  • @n130, please clarify your question. - Niki-Timofe 5:56 pm
  • You can fetch the entire page source (PHP: file_get_contents()) and then parse the received markup on your own site with JavaScript or jQuery. - AseN

4 answers

Libraries for parsing HTML with PHP:

Simple HTML Dom parser

phpQuery

That said, if you have a single small task, look for a simpler approach rather than pulling in a large library for it.
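
As one possible "simpler approach", here is a minimal sketch that uses PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of the libraries named above. The sample markup and variable names are assumptions made for illustration; in practice $html would come from something like file_get_contents().

```php
<?php
// Hypothetical sample markup standing in for a downloaded page.
$html = '<div class="wrapper"><div id="info">Information <b>here</b></div></div>';

$dom = new DOMDocument();
// Real-world HTML is rarely perfectly valid; collect parser warnings quietly.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Find the first div whose id attribute equals "info".
$xpath = new DOMXPath($dom);
$node = $xpath->query('//div[@id="info"]')->item(0);

// saveHTML($node) serializes the element itself, including its own tags.
$block = $node !== null ? $dom->saveHTML($node) : null;

echo $block, PHP_EOL;
```

Because the parser understands nesting, this should keep working when #info contains other divs, which is where hand-written patterns tend to break.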

UPD. After some googling, I found a thread on stackoverflow.com. It offers a working expression which, as one would expect, is very complicated. I will quote it without comment (follow the link; everything is explained in detail there).

 <div\b[^>]*?\bid\s*+=\s*+([\'"]?+)\bcontent\b(?(1)\1)[^>]*+>((?:[^<]++(?:<(?!/?div\b|!--)[^<]*+)*+|<!--.*?-->|<div\b[^>]*+>(?2)</div\s*>)*+)</div\s*> 

The expression correctly grabs div#content together with all of its contents. (Again, the original thread describes the edge cases.)
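
For anyone who wants to try the expression from PHP, here is a rough sketch of how it might be passed to preg_match(). The "~" delimiter, the "s" modifier, and the sample markup are my assumptions, not part of the quoted thread; the pattern itself is copied unchanged and still targets the id "content".

```php
<?php
// A nowdoc keeps the pattern verbatim, so no PHP-level escaping is needed.
// "~" is the delimiter because it does not occur in the pattern; the "s"
// modifier (an assumption) lets "." inside <!--.*?--> span line breaks.
$pattern = <<<'REGEX'
~<div\b[^>]*?\bid\s*+=\s*+([\'"]?+)\bcontent\b(?(1)\1)[^>]*+>((?:[^<]++(?:<(?!/?div\b|!--)[^<]*+)*+|<!--.*?-->|<div\b[^>]*+>(?2)</div\s*>)*+)</div\s*>~s
REGEX;

// Hypothetical sample markup with a nested div inside #content.
$page = '<div id="content">Hello <div class="inner">nested</div> world</div><div>other</div>';

$inner = null;
if (preg_match($pattern, $page, $match) === 1) {
    // Group 1 is the optional quote; group 2 is everything between the outer tags.
    $inner = $match[2];
}

echo $inner, PHP_EOL;
```

It is worth testing the pattern against your own markup before relying on it, since the thread it comes from lists cases it does not cover.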

In general, if the structure of the target block is known, you can use @Barton's solution; if not, it is easier to use a library, and the code will be clearer :)

    You can do this:

     $href = 'http://site.name';
     $page = file_get_contents($href);
     preg_match("/<div.*id=\"info\".*>(.*)<\/div>/", $page, $match);
     print_r($match);
    • <div class="wrapper"><div id="info">inner text</div></div> Is it just me, or will your code pull out the block together with the parent's closing tag? - gridsane
    • @gridsane I just checked; it pulls out what's needed. $page = '<div class="wrapper"><div id="info">Some text</div></div>'; preg_match("/<div.*id=\"info\".*>(.*)<\/div>/",$page,$match); print_r($match); - Barton
    • That only holds if (.*) is non-greedy by default. And what if #info contains nested divs? Also, as far as I know, . does not match line breaks. - gridsane
    • In this case nobody mentioned any inner blocks. When the OP asks about them, we will solve that problem. - Barton
    • Nobody said there are none, either :) Good luck with that approach to solving problems :) - gridsane
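
The concerns raised in the comments above (greediness and line breaks) can be addressed with a slightly stricter pattern. This is only a sketch under assumptions: the sample markup is made up, the id is written with double quotes, and #info contains no nested divs, which this pattern still does not handle.

```php
<?php
// Hypothetical sample markup; note the line break inside #info.
$page = "<div class=\"wrapper\"><div id=\"info\">Some\ntext</div></div>";

// [^>]* keeps the attribute match inside a single tag, (.*?) is lazy and
// stops at the first closing </div>, and the "s" modifier lets "." match
// newlines. "~" is used as the delimiter so "/" needs no escaping.
$found = preg_match('~<div[^>]*\bid="info"[^>]*>(.*?)</div>~s', $page, $match);

$inner = $found === 1 ? $match[1] : null;

echo $inner, PHP_EOL;
```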
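     // Fetch the page source.
     $page = file_get_contents('http://google.com');
     // Cut off everything before the block we need.
     $pos = strpos($page, '<div class="This block is needed">');
     $page = substr($page, $pos);
     // Cut off the block that follows the needed one, and everything after it.
     $pos = strpos($page, '<div class="This block comes right after the needed one; it and everything after it must be removed too">');
     $page = substr($page, 0, $pos);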

    If you need only the inner content, you can get it with this code plus a little JS; if it is unclear how, I can explain in detail.
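
As an alternative to the JS step, the inner content can probably be cut out in PHP with the same strpos()/substr() idea. The helper name extractBetween, the markers, and the sample markup below are hypothetical; the sketch assumes the block has no nested tags of the same kind.

```php
<?php
// Hypothetical helper: returns the text between two markers,
// or null if either marker is missing.
function extractBetween(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);           // skip past the opening marker
    $to = strpos($html, $end, $from);  // look for the end marker after it
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Hypothetical sample markup standing in for a downloaded page.
$page = '<div class="needed">inner text</div><div class="next">rest</div>';

echo extractBetween($page, '<div class="needed">', '</div>'), PHP_EOL;
```

Checking strpos() against false matters here, because a missing marker would otherwise be treated as position 0.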

      There are many ways to do this; here is an article on the subject, a very interesting one.