regular expressions - extract images and css files

Question

Hello!

Suppose I have the following text:

 <link rel="stylesheet" href="/css/style.css" type="text/css" />
 <div class="custom">
 <p><a href="/" title="IP-телефония"><img src="/images/logo2-2-fixed-mini-3.png" alt="IP-телефония" width="175" height="44" /></a></p>
 <p><a href="/news" title="IP-телефония"><img src="/logo2-2-fixed-mini.png" alt="IP-телефония" width="175" height="43" title="IP-телефония" /></a></p>
 <img src="/images/24.jpg">
 </div>

I need to extract the path to each file, including the file name itself. The files may have different extensions.

That is, the output should be the following:

 href="/css/style.css"
 src="/images/logo2-2-fixed-mini-3.png"
 src="/logo2-2-fixed-mini.png"
 src="/images/24.jpg"

My regular expression is flawed: it captures everything from the first attribute up to the first file extension it finds.

 /(href|src)=".*(\.png|\.js|\.css|\.jpg)"/U

It produces a match like this:

 href="/" title="IP-телефония"><img src="/images/logo2-2-fixed-mini-3.png"

Thanks in advance for your help!

Crantisz · Accepted Answer · 2016-10-19T18:05:42

Here is the correct expression:

 /(href|src)=("|')([^"']*(\.png|\.js|\.css|\.jpg))("|')/

[^"'] - any character except a quote

("|') - the quotes may be single or double

plus parentheses around the file path, so the part you need ends up in its own capture group (group 3)
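
To show how the groups come out in practice, here is a minimal sketch that runs the expression above through preg_match_all on the sample from the question. The variable names ($html, $pattern, $found) are only for illustration and are not part of the original answer.

```php
<?php
// Sample markup from the question.
$html = <<<'HTML'
<link rel="stylesheet" href="/css/style.css" type="text/css" />
<div class="custom">
<p><a href="/" title="IP-телефония"><img src="/images/logo2-2-fixed-mini-3.png" alt="IP-телефония" width="175" height="44" /></a></p>
<p><a href="/news" title="IP-телефония"><img src="/logo2-2-fixed-mini.png" alt="IP-телефония" width="175" height="43" title="IP-телефония" /></a></p>
<img src="/images/24.jpg">
</div>
HTML;

// The expression from the answer, unchanged (quotes escaped for the PHP string).
$pattern = '/(href|src)=("|\')([^"\']*(\.png|\.js|\.css|\.jpg))("|\')/';

// PREG_SET_ORDER gives one array per match, with the groups inside it.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    // $m[1] is the attribute name, $m[3] is the path including the file name.
    $found[] = $m[1] . '="' . $m[3] . '"';
}

echo implode("\n", $found), "\n";
```

Because [^"'] cannot cross a quote, href="/" and href="/news" are skipped instead of swallowing the following attributes, and the four paths listed in the question are what remains.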

Community spirit ♦ · Answer 2 · 2016-10-19T22:22:14

Do not use regular expressions to parse HTML. This topic has already been discussed to death.

 // Create DOM from URL or file
 $html = file_get_html('http://www.google.com/');

 // Find all images
 foreach ($html->find('img') as $element)
     echo $element->src . '<br>';

 // Find all links
 foreach ($html->find('a') as $element)
     echo $element->href . '<br>';

This is the first example from the Simple HTML DOM library: http://simplehtmldom.sourceforge.net/

PHP also has built-in functions for parsing HTML; see http://php.net/manual/ru/domdocument.loadhtml.php if you want to go into more detail. There is also this answer on stackoverflow.com: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags and it is spectacular. My translation:

You cannot parse [X]HTML with regular expressions, because HTML cannot be parsed by them. A regular expression is not a tool that can properly split HTML into nodes, as I have already answered in many similar questions. Regular expressions are simple tools for parsing strings. They are not suitable for HTML, because HTML is not a regular language but a markup language with a more complex structure, so regular expressions cannot be used to parse it. /* here I omit part of the translation */ Every time you try to parse HTML with a regular expression, the unholy child weeps the blood of virgins and Russian hackers break into your web application. Parsing HTML with a regular expression is like summoning dead souls into the realm of the living. /* here the author describes how poorly such code can be maintained */
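
For the built-in route mentioned above, a minimal sketch with DOMDocument might look like the following. It assumes the dom extension is enabled. The tag-to-attribute map and the variable names are illustrative choices, not something taken from the PHP manual or from the question.

```php
<?php
// Sample markup from the question.
$html = <<<'HTML'
<link rel="stylesheet" href="/css/style.css" type="text/css" />
<div class="custom">
<p><a href="/" title="IP-телефония"><img src="/images/logo2-2-fixed-mini-3.png" alt="IP-телефония" width="175" height="44" /></a></p>
<p><a href="/news" title="IP-телефония"><img src="/logo2-2-fixed-mini.png" alt="IP-телефония" width="175" height="43" title="IP-телефония" /></a></p>
<img src="/images/24.jpg">
</div>
HTML;

// Collect libxml warnings internally instead of printing them:
// real-world markup (and this fragment) is rarely a complete, valid document.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Assumed mapping: which attribute holds the file path for each tag.
$attributeByTag = ['link' => 'href', 'img' => 'src', 'script' => 'src'];

$paths = [];
foreach ($attributeByTag as $tag => $attr) {
    foreach ($doc->getElementsByTagName($tag) as $node) {
        if ($node->hasAttribute($attr)) {
            $paths[] = $attr . '="' . $node->getAttribute($attr) . '"';
        }
    }
}

echo implode("\n", $paths), "\n";
```

Because only link, img and script elements are inspected, the href values of ordinary a links never enter the result, so no extension filter is needed. On the sample from the question this should yield the same four entries the asker listed.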
