Hello! I would appreciate your help. I have a string of HTML that looks like this:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru" lang="ru"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <title>***@conference.***.us - 25.06.2017</title> <!-- <link rel="stylesheet" type="text/css" href="../../../css/chatlogs.css"/> --> <style type="text/css"> <!-- ///.... </style> </head> <body> <div style="text-align: right;"> <a style="color: rgb(170, 170, 170); font-family: monospace;" href="../../../">Home</a> </div> <div class="roomtitle">tulpae-flood</div> <a class="roomjid" href="xmpp:***@conference.***.us?join">***@conference.***.us</a> <div class="logdate">25.06.2017<span class="w3c"><a class="nav" href="../../2017/06/24.html">&lt;</a> <a class="nav" href="./">^</a> <a class="nav" href="../../2017/06/26.html">&gt;</a></span> </div> <br/> ... <a name="00:00:00" href="#00:00:00" class="ts">[00:00:00]</a> <span class="mn">&lt;Nick&gt;</span> " Шаблон " <br> <a name="00:01:00" href="#00:01:00" class="ts">[00:01:00]</a> <span class="mn">&lt;Nic2&gt;</span> " Шаблон2 " <br> <a name="09:43:11" href="#09:43:11" class="ts">[09:43:11]</a> <span class="mn">&lt;Hankl&amp; Blr&gt;</span> О, Навка, Навка! <br/> 

I am trying to write a regular expression that finds the word (Н|М)(а|я)вка in the message text between <span> and <br>, capturing the <a> tags as well.

This attempt did not work:

 preg_match_all("/<a name=\"(.*?)\" href = \"(.*?)\" class=\"ts\"> (.*?) </a> <span class=\"mn\"> (.*?) </span> (Н|М)(а|я)вк.*?/i", $html, $search); 

I would be very grateful for any pointers!

  • Can I have examples of the words you need to find? - Alex
  • @KateGurman please add an example that contains the text you need and the result you want to see - Alex
  • Parsing HTML is best done with dedicated parsers (for example, simplehtml ) - mymedia
  • There is an English-language question on this: RegEx match open tags except XHTML self-contained tags - Nick Volynkin
  • That one is about .NET, but it explains the essence of the problem well: How to parse HTML in .NET? - Nick Volynkin

1 answer

I think I have found an answer; if there are mistakes, please go easy on me)))

Here is the regular expression:

#<a name((?!<\/span>).)*<\/span>((?!<br\/?>).)*(н|м|а|я)вк.*?<br\/?>$#mius
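
For readers who find the one-liner hard to follow, the same pattern can be spelled out with the x (extended) flag, which ignores whitespace and allows # comments inside the pattern. This is only a sketch: the ~ delimiter, the non-capturing groups, the unescaped slashes, and the $line sample are substitutions made for illustration (the original uses # as the delimiter, which would clash with the comments), and the file is assumed to be saved as UTF-8.

```php
<?php
// The answer's pattern in free-spacing form; matching behaviour is intended to be the same.
$reg = '~
    <a\ name               # start of a timestamp link (the space must be escaped in x mode)
    (?:(?!</span>).)*      # any characters, stopping before the first closing span
    </span>                # end of the nickname span
    (?:(?!<br/?>).)*       # message text that never crosses a br tag
    (?:н|м|а|я)вк          # one of the four letters followed by "вк"
    .*?                    # the rest of the message, lazily
    <br/?>$                # a br tag at the end of a line (m flag) or of the string
~xmius';

// Hypothetical single log line taken from the sample in the question.
$line = '<a name="09:43:11" href="#09:43:11" class="ts">[09:43:11]</a> '
      . '<span class="mn">&lt;Hankl&amp; Blr&gt;</span> О, Навка, Навка! <br/>';

var_dump(preg_match($reg, $line)); // int(1)
```

The (?!...). construct is what keeps each part of the match inside a single log record: it consumes one character at a time, but only while the forbidden tag does not start at that position.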

Here is a piece of PHP code that applies it to the sample text from the question:

 <?php
 // Sample HTML from the question (the file must be saved as UTF-8 for the u flag to work).
 $html = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru" lang="ru"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <title>***@conference.***.us - 25.06.2017</title> <!-- <link rel="stylesheet" type="text/css" href="../../../css/chatlogs.css"/> --> <style type="text/css"> <!-- ///.... </style> </head> <body> <div style="text-align: right;"> <a style="color: rgb(170, 170, 170); font-family: monospace;" href="../../../">Home</a> </div> <div class="roomtitle">tulpae-flood</div> <a class="roomjid" href="xmpp:***@conference.***.us?join">***@conference.***.us</a> <div class="logdate">25.06.2017<span class="w3c"><a class="nav" href="../../2017/06/24.html">&lt;</a> <a class="nav" href="./">^</a> <a class="nav" href="../../2017/06/26.html">&gt;</a></span> </div> <br/> ... <a name="00:00:00" href="#00:00:00" class="ts">[00:00:00]</a> <span class="mn">&lt;Nick&gt;</span> " Шаблон " <br> <a name="00:01:00" href="#00:01:00" class="ts">[00:01:00]</a> <span class="mn">&lt;Nic2&gt;</span> " Шаблон2 " <br> <a name="09:43:11" href="#09:43:11" class="ts">[09:43:11]</a> <span class="mn">&lt;Hankl&amp; Blr&gt;</span> О, Навка, Навка! <br/>';
 // Same pattern as above, including the u flag so that i also applies to Cyrillic letters.
 $reg = '#<a name((?!<\/span>).)*<\/span>((?!<br\/?>).)*(н|м|а|я)вк.*?<br\/?>$#mius';
 preg_match_all($reg, $html, $matches);
 // Print the first full match.
 echo '<pre>';
 echo $matches[0][0];
 echo '</pre>';
 ?>

And this is the output of echo '<pre>'; echo $matches[0][0]; :

 <pre> <a name="09:43:11" href="#09:43:11" class="ts">[09:43:11]</a> <span class="mn">&lt;Hankl&amp; Blr&gt;</span> О, Навка, Навка! <br> </pre> 
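
Since the question also asks to capture the <a> tags, one possible variation pulls the timestamp, nickname, and message text out with named groups. This is a hedged sketch, not a drop-in replacement: the group names time, nick, and text and the $sample string are invented here, and the word part follows the (Н|М)(а|я)вк form from the question rather than the looser (н|м|а|я)вк used above.

```php
<?php
// [^<]* and the (?!<br/?>) lookahead keep every group inside one log record.
$reg = '#<a name="(?<time>[^"]*)"[^>]*>[^<]*</a>\s*'
     . '<span class="mn">(?<nick>[^<]*)</span>'
     . '(?<text>(?:(?!<br/?>).)*?(?:н|м)(?:а|я)вк(?:(?!<br/?>).)*)<br/?>#ius';

// Two records from the question's sample; only the second contains the word.
$sample = '<a name="00:01:00" href="#00:01:00" class="ts">[00:01:00]</a> '
        . '<span class="mn">&lt;Nic2&gt;</span> " Шаблон2 " <br> '
        . '<a name="09:43:11" href="#09:43:11" class="ts">[09:43:11]</a> '
        . '<span class="mn">&lt;Hankl&amp; Blr&gt;</span> О, Навка, Навка! <br/>';

preg_match_all($reg, $sample, $found, PREG_SET_ORDER);

$rows = [];
foreach ($found as $m) {
    $rows[] = [
        'time' => $m['time'],
        'nick' => html_entity_decode($m['nick']), // turns &lt;...&gt; back into <...>
        'text' => trim($m['text']),
    ];
}
print_r($rows);
```

With PREG_SET_ORDER each element of $found is one complete match, so the named groups of a single record stay together.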
  • Works like clockwork) The only thing is that your pattern has unnecessary escaping backslashes. It works with them too, but they are simply not needed. - Edward
  • @Edward I would be grateful if you could say where exactly. - Raz Galstyan
  • @Edward If you mean \/ , I always escape them because people often write regexes with slash delimiters, like /regexp/ . - Raz Galstyan
  • Downvote for parsing HTML with a regex. There are specialized parsers for that. - Nick Volynkin
  • @RazmikGalstyan the asker does not need a regex, she needs a result. Asking for a regex here is a special case of the XY problem , and your task is to explain how to parse HTML correctly. - Nick Volynkin
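
The parser-based route recommended in these comments could look roughly like the sketch below. It uses PHP's bundled DOMDocument and DOMXPath rather than the simplehtml library mentioned under the question; the function name findMessages, the prepended meta tag that hints the UTF-8 encoding, and the $sample input are assumptions made for illustration.

```php
<?php
// Returns every log record whose message text matches $wordPattern.
function findMessages(string $html, string $wordPattern): array
{
    libxml_use_internal_errors(true); // chat logs are rarely valid markup
    $doc = new DOMDocument();
    // Assumption: the input is UTF-8; the meta tag tells libxml so.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//a[@class="ts"]') as $anchor) {
        $nick = '';
        $text = '';
        // Walk the siblings after the timestamp link until the record's <br>.
        for ($node = $anchor->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeName === 'br') {
                break;
            }
            if ($node->nodeName === 'span') {
                $nick = $node->textContent;
            } elseif ($node->nodeType === XML_TEXT_NODE) {
                $text .= $node->textContent;
            }
        }
        if (preg_match($wordPattern, $text) === 1) {
            $rows[] = [
                'time' => $anchor->getAttribute('name'),
                'nick' => $nick,
                'text' => trim($text),
            ];
        }
    }
    return $rows;
}

// Two records from the question's sample; only the second contains the word.
$sample = '<a name="00:01:00" href="#00:01:00" class="ts">[00:01:00]</a> '
        . '<span class="mn">&lt;Nic2&gt;</span> " Шаблон2 " <br> '
        . '<a name="09:43:11" href="#09:43:11" class="ts">[09:43:11]</a> '
        . '<span class="mn">&lt;Hankl&amp; Blr&gt;</span> О, Навка, Навка! <br/>';

print_r(findMessages($sample, '/(н|м)(а|я)вк/iu'));
```

Here the regular expression only has to recognise the word itself; finding the tags and decoding entities such as &lt; is left to the parser.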