Check whether the word is included in the link, and whether it is in the attributes

Question

There is a string:

<a href="/href/blabla">внедорожники 1</a> какие-то такие, но не всегда внедорожники 2 бывают внедорожники 3 <img alt="всегда свежие внедорожники 4" title="сила - это внедорожники 5" /><a href="/href/blabla">какие-то такие внедорожники 6 но не всегда но не всегдано не всегдано не всегдано не всегдано не всегдано не всегдано не всегдано не всегдано не всегда</a>

Task: I need to get all occurrences of the word except those that are inside a link or inside attributes (alt and title). That is, I need to get "внедорожники 2" and "внедорожники 3".

I have this regular expression:

 #внедорожники(?!.{0,1000}<\/a>)(?!.{0,1000}\/>)(?!.{0,1000}>)#i

But it does not work correctly, because it only checks for a closing /> or > after the word, not for an opening tag before it. If I set the number to 1000, it does start finding matches, but only because the closing tags are more than 1000 characters away.
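The effect can be reproduced on a much shorter string. The sketch below is only an illustration: `$sample` is a hypothetical string, not the one from the question, and the pattern is reduced to a single lookahead. The word is plain text, yet it is rejected because a `>` occurs somewhere after it.

```php
<?php
// Hypothetical short sample: the word is plain text, but a tag follows it.
$sample = 'внедорожники 2 <b>x</b>';

// Simplified form of the pattern above: the negative lookahead only
// inspects the text AFTER the word, so any later ">" rejects the match.
$count = preg_match_all('#внедорожники(?!.{0,1000}>)#iu', $sample, $m);

var_dump($count); // int(0)
```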

Question: how can I construct an expression to check for the presence of the opening tag?

Thanks!

UPD: The dialect is PHP (PCRE). I added {0,1000} in order to debug the process; experiments, so to speak :) For now the main thing for me is that it works :)

cheops · Answer 1 · 2016-05-08T06:41:04

Is it essential to solve the problem with a single regular expression, or can you use several? In the latter case, you can proceed as follows:

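 <?php
 // Drop every tag-text-tag run (e.g. a whole link), then strip the remaining tags.
 $text = strip_tags(preg_replace('/<[^>]+>[^<]+<[^>]+>/', '', $text));
 // The u modifier treats the pattern and the subject as UTF-8.
 preg_match_all('/внедорожники\s+\d+/isu', $text, $out);
 echo '<pre>';
 print_r($out);
 echo '</pre>';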
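If a single expression is required after all, one possible approach is the PCRE backtracking verbs `(*SKIP)(*F)`: the pattern first consumes whole links and whole tags and discards them, so only the last alternative can produce a match. This is a minimal sketch, not a general HTML parser. It assumes links are not nested, that attribute values never contain `>`, and it uses a shortened copy of the string from the question.

```php
<?php
// Shortened copy of the sample string from the question.
$text = '<a href="/href/blabla">внедорожники 1</a> какие-то такие, но не всегда '
      . 'внедорожники 2 бывают внедорожники 3 '
      . '<img alt="всегда свежие внедорожники 4" title="сила - это внедорожники 5" />'
      . '<a href="/href/blabla">какие-то такие внедорожники 6 но не всегда</a>';

// Alternative 1 consumes a whole <a>...</a> element, alternative 2 any single tag
// (attributes included). (*SKIP)(*F) then fails the match and resumes scanning
// after the consumed text. Only alternative 3 can actually match.
$pattern = '#<a\b[^>]*>.*?</a>(*SKIP)(*F)|<[^>]*>(*SKIP)(*F)|внедорожники\s+\d+#isu';

preg_match_all($pattern, $text, $out);
print_r($out[0]); // the two matches: "внедорожники 2" and "внедорожники 3"
```

For markup that breaks these assumptions, a DOM parser is likely a safer choice than any regular expression.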
