A regular expression that searches for the word (Н|М)(а|я)вка between <span> and <br>, also capturing the <a> tags

Question

Hello! I'm asking for your help. I have a string of HTML code that looks like this:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru" lang="ru"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <title>***@conference.***.us - 25.06.2017</title> <!-- <link rel="stylesheet" type="text/css" href="../../../css/chatlogs.css"/> --> <style type="text/css"> <!-- ///.... </style> </head> <body> <div style="text-align: right;"> <a style="color: rgb(170, 170, 170); font-family: monospace;" href="../../../">Home</a> </div> <div class="roomtitle">tulpae-flood</div> <a class="roomjid" href="xmpp:***@conference.***.us?join">***@conference.***.us</a> <div class="logdate">25.06.2017<span class="w3c"><a class="nav" href="../../2017/06/24.html">&lt;</a> <a class="nav" href="./">^</a> <a class="nav" href="../../2017/06/26.html">&gt;</a></span> </div> <br/> ... <a name="00:00:00" href="#00:00:00" class="ts">[00:00:00]</a> <span class="mn">&lt;Nick&gt;</span> " Шаблон " <br> <a name="00:01:00" href="#00:01:00" class="ts">[00:01:00]</a> <span class="mn">&lt;Nic2&gt;</span> " Шаблон2 " <br> <a name="09:43:11" href="#09:43:11" class="ts">[09:43:11]</a> <span class="mn">&lt;Hankl&amp; Blr&gt;</span> О, Навка, Навка! <br/>

I am trying to write a regular expression that searches for the word (Н|М)(а|я)вка between <span> and <br>, capturing the <a> tags as well.

This attempt was unsuccessful:

 preg_match_all("/<a name=\"(.*?)\" href = \"(.*?)\" class=\"ts\"> (.*?) </a> <span class=\"mn\"> (.*?) </span> (Н|М)(а|я)вк.*?/i", $html, $search);

I would be very grateful if someone could respond and point me in the right direction!

@KateGurman please post an example containing the text you need and the result you want to see.
Parsing HTML is best done with dedicated parsers (for example, simplehtml).
This one is about .NET, but it explains the essence of the problem well: How to parse HTML in .NET?

Accepted Answer · 2017-06-27T20:39:35

I seem to have found an answer, but if there are any mistakes, don't throw stones)))

Here is the regular expression:

#<a name((?!<\/span>).)*<\/span>((?!<br\/?>).)*(н|м|а|я)вк.*?<br\/?>$#mius

Here is a piece of PHP code that uses the sample text from the question:

 <?php
 // Sample log from the question.
 $html = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru" lang="ru"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <title>***@conference.***.us - 25.06.2017</title> <!-- <link rel="stylesheet" type="text/css" href="../../../css/chatlogs.css"/> --> <style type="text/css"> <!-- ///.... </style> </head> <body> <div style="text-align: right;"> <a style="color: rgb(170, 170, 170); font-family: monospace;" href="../../../">Home</a> </div> <div class="roomtitle">tulpae-flood</div> <a class="roomjid" href="xmpp:***@conference.***.us?join">***@conference.***.us</a> <div class="logdate">25.06.2017<span class="w3c"><a class="nav" href="../../2017/06/24.html">&lt;</a> <a class="nav" href="./">^</a> <a class="nav" href="../../2017/06/26.html">&gt;</a></span> </div> <br/> ... <a name="00:00:00" href="#00:00:00" class="ts">[00:00:00]</a> <span class="mn">&lt;Nick&gt;</span> " Шаблон " <br> <a name="00:01:00" href="#00:01:00" class="ts">[00:01:00]</a> <span class="mn">&lt;Nic2&gt;</span> " Шаблон2 " <br> <a name="09:43:11" href="#09:43:11" class="ts">[09:43:11]</a> <span class="mn">&lt;Hankl&amp; Blr&gt;</span> О, Навка, Навка! <br/>';
 // Modifiers: m = $ matches at line ends, i + u = case-insensitive matching that also covers Cyrillic, s = dot matches newlines.
 $reg = '#<a name((?!<\/span>).)*<\/span>((?!<br\/?>).)*(н|м|а|я)вк.*?<br\/?>$#mius';
 preg_match_all($reg, $html, $matches);
 echo '<pre>';
 echo $matches[0][0]; // the first full match
 ?>

And this is the output of echo '<pre>'; echo $matches[0][0]; :

 <pre> <a name="09:43:11" href="#09:43:11" class="ts">[09:43:11]</a> <span class="mn">&lt;Hankl&amp; Blr&gt;</span> О, Навка, Навка! <br> </pre>
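
For readers who also need the individual pieces (the timestamp from the <a> tag, the nick, and the message), the same idea can be sketched with named groups. This is only an illustration, not the exact pattern from the answer: the group names ts, nick and msg are made up here, the shortened $html sample is an assumption, ~ is used as the delimiter so that # can start comments in extended mode, and the word test is tightened to [нм][ая]вк as in the question.

```php
<?php
// Shortened sample in the same shape as the log from the question (assumed to be UTF-8).
$html = '<a name="00:00:00" href="#00:00:00" class="ts">[00:00:00]</a> '
      . '<span class="mn">&lt;Nick&gt;</span> " Шаблон " <br> '
      . '<a name="09:43:11" href="#09:43:11" class="ts">[09:43:11]</a> '
      . '<span class="mn">&lt;Hankl&amp; Blr&gt;</span> О, Навка, Навка! <br/>';

// x = extended mode: whitespace and # comments inside the pattern are ignored
// i + u = case-insensitive matching that also works for Cyrillic
// s = dot matches newlines
$reg = '~
    <a\s+name="(?<ts>[^"]*)"[^>]*>[^<]*</a>      # timestamp link
    \s*<span\s+class="mn">(?<nick>[^<]*)</span>  # nick
    (?<msg>(?:(?!<br\s*/?>).)*?                  # message text, never crossing <br>
        [нм][ая]вк                               # the word being searched for
        (?:(?!<br\s*/?>).)*)
    <br\s*/?>
~xius';

// PREG_SET_ORDER groups the result by match instead of by capture group.
preg_match_all($reg, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m['ts'], ' | ', html_entity_decode($m['nick']), ' | ', trim($m['msg']), "\n";
}
// prints: 09:43:11 | <Hankl& Blr> | О, Навка, Навка!
```

The tempered dot (?:(?!<br\s*/?>).) plays the same role as ((?!<br\/?>).)* in the pattern above: it keeps a match from running past the end of one log entry.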

Works like clockwork) Only you have unnecessary escaping backslashes in your pattern.
It works with them too, but they are simply not needed in the pattern.
@Eduard If you mean the \/ , I always escape them because people often write regexes like this: /regexp/ .
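
To illustrate the point about escaping: in PHP's preg_* functions only the character chosen as the pattern delimiter has to be escaped inside the pattern, so with # as the delimiter a bare / is fine. A small sketch (the variable names and the sample strings are made up for the illustration):

```php
<?php
$sample = 'text <br/>';

// "#" is the delimiter, so "/" is an ordinary character here.
$withEscapes    = '#<br\/?>#';
$withoutEscapes = '#<br/?>#';

var_dump(preg_match($withEscapes, $sample));    // int(1)
var_dump(preg_match($withoutEscapes, $sample)); // int(1)

// With "/" as the delimiter, the slash inside the pattern must be escaped.
$slashDelimited = '/<br\/?>/';
var_dump(preg_match($slashDelimited, $sample)); // int(1)

// preg_quote() escapes regex metacharacters and, when a delimiter is passed
// as the second argument, that delimiter as well.
$needle = preg_quote('</span>', '/');
var_dump(preg_match('/' . $needle . '/', 'a</span>b')); // int(1)
```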
@RazmikGalstyan the person doesn't need a regex, but a result.
Her asking for a regex is a special case of the XY problem, and your task is to explain how to parse HTML correctly.
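
One way to follow this advice with PHP's bundled DOM extension is sketched below. It is not the accepted answer's approach: the function name findMessages is invented for the example, the markup is assumed to look exactly like the log in the question (an <a class="ts"> timestamp, a <span class="mn"> nick, then the message text up to the next <br>), and a regular expression is kept only for the final word test on plain text.

```php
<?php
// Hypothetical helper: returns [time, nick, message] triples whose message
// contains the word from the question.
function findMessages(string $html): array
{
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Prepending a charset <meta> is a common hint so that loadHTML() reads
    // the input as UTF-8 (assumption: the input really is UTF-8).
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $found = [];

    foreach ($xpath->query('//a[@class="ts"]') as $link) {
        $nick = null;
        $text = '';
        // Walk the siblings after the timestamp link until the next <br>.
        for ($node = $link->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeName === 'br') {
                break;
            }
            if ($node->nodeName === 'span' && $nick === null) {
                $nick = $node->textContent;
            } else {
                $text .= $node->textContent;
            }
        }
        // The word test now runs on decoded plain text, not on markup.
        if (preg_match('/[нм][ая]вк/iu', $text) === 1) {
            $found[] = [$link->getAttribute('name'), $nick, trim($text)];
        }
    }
    return $found;
}

// Usage with a shortened sample shaped like the log from the question.
$html = '<a name="00:00:00" href="#00:00:00" class="ts">[00:00:00]</a> '
      . '<span class="mn">&lt;Nick&gt;</span> " Шаблон " <br> '
      . '<a name="09:43:11" href="#09:43:11" class="ts">[09:43:11]</a> '
      . '<span class="mn">&lt;Hankl&amp; Blr&gt;</span> О, Навка, Навка! <br/>';

foreach (findMessages($html) as [$time, $nick, $message]) {
    echo $time, ' | ', $nick, ' | ', $message, "\n";
}
```

The parser decodes entities such as &lt; and &amp; and copes with both <br> and <br/>, which is what the pattern-based versions have to handle by hand.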
