I need to extract the exact text of every <img> tag from a variable. There can be several dozen of them, and they vary: some have only src, others a full set of attributes; some are written as <img ... />, others as <img ...>. I would like to do this without regular expressions or third-party libraries. I tried this:

 $content = new DOMDocument();
 $content->loadHTML($htmlcontent);
 $imgTags = $content->getElementsByTagName('img');
 foreach ($imgTags as $tag) {
     echo $tag->nodeValue;
 }

But for some reason nothing is output, although $tag->nodeName in the same loop correctly prints img. Please tell me where my mistake is, or how to solve the problem differently.

I need the exact, complete text of each <img ...> tag, not one of its attributes, so that, depending on certain conditions, I can either cut that piece of text out of $htmlcontent or leave it in place.

  • The value really is empty: <img ... /> has no content. You need the attributes. Look into getAttribute for the tags. - Yevgeny Borisov
  • Ah, got it. Is this a function for XML that does not understand single, unclosed HTML tags? - federk
  • It's not about the function, but about a method of secure.php.net/manual/en/class.domelement.php: public string getAttribute(string $name) - Evgeny Borisov
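The point made in the comments above can be checked with a short sketch. The sample markup and the variable names $sources and $values are assumptions made for illustration; only DOMDocument, getElementsByTagName, nodeValue and getAttribute come from the thread.

```php
<?php
// Hypothetical sample input; in the question this comes from $htmlcontent.
$htmlcontent = '<p>Text <img src="a.png" alt="First"> more <img     src="b.png" /></p>';

$content = new DOMDocument();
// Collect libxml warnings internally instead of printing them for imperfect HTML.
libxml_use_internal_errors(true);
$content->loadHTML($htmlcontent);
libxml_clear_errors();

$values = [];
$sources = [];
foreach ($content->getElementsByTagName('img') as $tag) {
    // <img> has no child nodes, so its nodeValue (the text content) is empty.
    $values[] = (string) $tag->nodeValue;
    // The data of the tag lives in its attributes.
    $sources[] = $tag->getAttribute('src');
}

print_r($sources);
```

Both forms, <img ...> and <img ... />, are found by the parser here; it is only nodeValue that has nothing to show for them.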

3 answers

If you want to keep the structure, then the PHP Simple HTML DOM library may be useful. It doesn’t seem to change the structure. But, of course, it is better to test it yourself.

  <?php
  require_once('simple_html_dom.php');
  $html = file_get_html('test.html');
  // select all img tags on the page
  foreach ($html->find('img') as $element) {
      echo $element->outertext() . "\n";
  }
  • Thank you, this is very close to what I need, except that this method reformats the tag. For example, if the source had <img     src="img"> (with 5 spaces between img and src), the result will be <img src="img"/>. But I need the exact, unformatted text as it appears in the original. Is it really impossible to do this with the Document Object Model, so that regular expressions are the only way? - federk
  • You have to, since DOMDocument is a parser that also performs normalization. - Yevgeny Borisov
  • Added an option: use a third-party library. - Yevgeny Borisov
  • Thank you for the advice, it helped me find the optimal solution, which was suggested in the English section. - federk
  • @federk So share. And close the question. - Evgeny Borisov
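The normalization discussed in these comments can be observed with DOMDocument alone. This is a rough sketch with a made-up input string; $original and $serialized are names chosen for illustration. It serializes a single node with saveHTML($node) and compares the result with the source text.

```php
<?php
// Hypothetical input with irregular spacing and an XHTML-style closing slash.
$original = '<img     src="img" />';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<div>' . $original . '</div>');
libxml_clear_errors();

$tag = $doc->getElementsByTagName('img')->item(0);

// saveHTML($node) writes the node back out from the parsed tree,
// so the original spacing and the trailing slash are not preserved.
$serialized = $doc->saveHTML($tag);

echo $serialized, "\n";
var_dump($serialized === $original);
```

The attribute values survive the round trip, but the byte-for-byte source text does not, which is why the serialized tag cannot be used to search for the original fragment in the input string.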

Duplicate question in the English section: https://stackoverflow.com/questions/36716701/is-it-possible-to-extract-full-accurate-image-tag-from-a-html-code-using-dom-in

They replied that it is impossible to get the exact, unformatted tag text through the DOM, as I wanted. But they suggested using the removeChild method to remove the tag inside the loop. In my case:

 $content = new DOMDocument();
 $xmlEncodding = '<?xml version="1.0" encoding="UTF-8"?>';
 $content->loadHTML($xmlEncodding . $htmlcontent, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
 // Copy the live node list to an array so that removing nodes does not skip any
 $imgTags = iterator_to_array($content->getElementsByTagName('img'));
 foreach ($imgTags as $tag) {
     if ($tag->getAttribute("src") == "fordel") {
         $tag->parentNode->removeChild($tag);
     }
 }
 echo str_replace($xmlEncodding, "", $content->saveHTML());
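One thing to watch when removing nodes in a loop like this: the list returned by getElementsByTagName is live, so removing elements while iterating over it directly can cause neighbouring elements to be skipped. A minimal sketch of a safer variant, assuming a made-up input with two adjacent images to delete, copies the list into an array first:

```php
<?php
// Hypothetical input: two adjacent images to remove and one to keep.
$htmlcontent = '<div><img src="fordel"><img src="fordel"><img src="keep"></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($htmlcontent, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// Snapshot the live DOMNodeList before changing the tree.
$imgTags = iterator_to_array($doc->getElementsByTagName('img'));
foreach ($imgTags as $tag) {
    if ($tag->getAttribute('src') === 'fordel') {
        $tag->parentNode->removeChild($tag);
    }
}

// Count what is left after the removals.
$remaining = $doc->getElementsByTagName('img');
echo $doc->saveHTML();
```

The snapshot costs one extra array, but the loop then visits every element exactly once regardless of how many are removed.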

    What do you have against regular expressions? How about:

     preg_match_all('/<img[^>]+>/i', $html, $matches);
     // $matches[0] holds the full matches
     foreach ($matches[0] as $img) {
         echo $img;
     }
    • The program already uses the DOM to read tag attributes, and everything works fine. I would like to add to the existing loop a single check that needs the exact text of the whole img tag. Regular expressions, as far as I know, are slower than the DOM, and they would need a separate handler. - federk
    • @Kairat Jenishev <img src="wow!>" alt="But you can!"> - Evgeny Borisov
    • Try to write more detailed answers. Explain what your statement is based on. - Nicolas Chabanovsky
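The counterexample in the comments shows where the simple pattern breaks: a ">" inside a quoted attribute value ends the match too early. The sketch below compares it with a quote-aware pattern. The pattern in $pattern and the sample string are assumptions made for illustration, not something proposed in the thread, and this is still not a full HTML parser (comments and script contents, for instance, are not handled).

```php
<?php
// Hypothetical input built around the counterexample from the comments.
$html = '<p><img src="wow!>" alt="But you can!"> and <img src=plain></p>';

// Pattern from the answer: stops at the first ">" it meets.
preg_match_all('/<img[^>]+>/i', $html, $naive);

// Quote-aware pattern: "..." and '...' are consumed as whole units,
// so a ">" inside an attribute value does not end the tag.
$pattern = '/<img\b(?:"[^"]*"|\'[^\']*\'|[^>"\'])*>/i';
preg_match_all($pattern, $html, $aware);

// Index 0 of each result holds the full matches.
print_r($naive[0]);
print_r($aware[0]);
```

Because the matches are taken directly from the source string, they keep the original spacing and quoting, which is the property the DOM serialization cannot provide.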