There is a string:

<a href="/href/blabla">внедорожники 1</a> какие-то такие, но не всегда внедорожники 2 бывают внедорожники 3 <img alt="всегда свежие внедорожники 4" title="сила - это внедорожники 5" /><a href="/href/blabla">какие-то такие внедорожники 6 но не всегда но не всегдано не всегдано не всегдано не всегдано не всегдано не всегдано не всегдано не всегдано не всегда</a> 

Task: find all occurrences of the word, but only those that are neither inside a link nor inside an attribute (alt or title). That is, I need to get "внедорожники 2" and "внедорожники 3" ("внедорожники" is Russian for "SUVs").

I have this regular expression:

    #внедорожники(?!.{0,1000}<\/a>)(?!.{0,1000}\/>)(?!.{0,1000}>)#i

But it does not work correctly: the lookaheads only check for a closing </a>, /> or > after the word, not for an opening tag before it. If I reduce the 1000, it starts finding matches, but only because the closing tags then lie beyond the lookahead limit.

Question: how can I build the expression so that it also checks for an opening tag before the word?

Thanks!

UPD: The dialect is PHP (PCRE). I used {0,1000} only to debug the process, an experiment, so to speak :) For now the main thing is to get it working :)
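Since the dialect is PCRE, one single-pattern option (an editorial sketch, not something proposed in the thread) is to stop looking backwards for an opening tag altogether: match each whole link and each whole tag first and discard it with the backtracking verbs `(*SKIP)(*FAIL)`, so the word can only match in plain text. It assumes links are not nested and no attribute value contains `>`; the function name `findOutsideTags` is hypothetical and the sample's repeated tail is shortened.

```php
<?php
// Hypothetical helper (sketch): returns occurrences of "внедорожники" that sit in
// plain text, i.e. not inside <a>...</a> and not inside any tag's attributes.
function findOutsideTags(string $html): array
{
    $pattern = '~
        <a\b[^>]*>.*?</a> (*SKIP)(*FAIL)   # a whole link: match it, then discard it and resume after it
      | <[^>]*>           (*SKIP)(*FAIL)   # any other tag with its attributes, e.g. <img alt="..." />
      | внедорожники(?:\s+\d+)?            # plain text; the optional number only labels the sample
    ~xisu';
    preg_match_all($pattern, $html, $m);
    return $m[0];
}

// The question's sample, with the repeated tail of the last link shortened.
$html = '<a href="/href/blabla">внедорожники 1</a> какие-то такие, но не всегда внедорожники 2 '
      . 'бывают внедорожники 3 <img alt="всегда свежие внедорожники 4" title="сила - это внедорожники 5" />'
      . '<a href="/href/blabla">какие-то такие внедорожники 6 но не всегда</a>';

print_r(findOutsideTags($html)); // two entries: "внедорожники 2" and "внедорожники 3"
```

The `x` flag allows the comments inside the pattern, `s` lets `.*?` cross line breaks inside a link, and `u` makes `i` work on the Cyrillic word.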

  • ".{0,1000}": oops, that was easy. UPD: Now I'm thinking it over... - Opalosolo
  • And which regex dialect is this? JS, PCRE? - zb'
  • Any ideas? - Aydar

1 answer

Does the problem have to be solved with a single regular expression, or can you use several? In the latter case, you can proceed as follows:

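    <?php
    // 1) Delete each "<tag>text<tag>" run (both links), 2) strip the remaining tags such as <img ... /> with their attributes
    $text = strip_tags(preg_replace('/<[^>]+>[^<]+<[^>]+>/', '', $text));
    // Search what is left; the u flag makes /i work on the Cyrillic pattern
    preg_match_all('/внедорожники\s+\d+/isu', $text, $out);
    echo '<pre>'; print_r($out); echo '</pre>';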
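To check the two-step approach end to end, here it is wrapped in a hypothetical function `findOutsideLinks` and run on the question's sample (repeated tail shortened). This is a sketch for verification only; the logic is the answer's.

```php
<?php
// Hypothetical wrapper around the answer's steps.
function findOutsideLinks(string $html): array
{
    // Step 1: remove "<tag>text<tag>" runs such as <a ...>text</a>;
    // step 2: strip the tags that remain (e.g. <img ... />) together with their attributes.
    $text = strip_tags(preg_replace('/<[^>]+>[^<]+<[^>]+>/', '', $html));
    // Step 3: collect the word plus its number from the remaining plain text.
    preg_match_all('/внедорожники\s+\d+/isu', $text, $out);
    return $out[0];
}

// The question's sample, with the repeated tail of the last link shortened.
$html = '<a href="/href/blabla">внедорожники 1</a> какие-то такие, но не всегда внедорожники 2 '
      . 'бывают внедорожники 3 <img alt="всегда свежие внедорожники 4" title="сила - это внедорожники 5" />'
      . '<a href="/href/blabla">какие-то такие внедорожники 6 но не всегда</a>';

print_r(findOutsideLinks($html)); // two entries: "внедорожники 2" and "внедорожники 3"
```

The first pattern simply pairs a tag with the next tag whenever there is text between them, so it relies on markup shaped like the sample: plain text between two standalone tags (for example `<br> внедорожники 7 <br>`) would be deleted too.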