How to determine the nesting of an element inside another element

Question

I need to determine whether a pre tag is inside a td tag: if it is, leave it alone; if it is not, do something with it. The code below refuses to work :(

$text = preg_replace_callback('#<pre.+?</pre(>)|(?:(?!<pre).[^<]*)+#s', 'pre_skip', $text);

function pre_skip($m) {
    echo ' start '.$m[0].' end <br>';
    if (preg_match('#<td><pre.+?</pre(>)|(?:(?!<pre).[^<]*)+<\/td>#s', $m[0]))
        return (isset($m[1])) ? $m[0] : $m[0];
    else
        return (isset($m[1])) ? $m[0] : nl2br($m[0]);
}

Accepted Answer · 2012-11-30T18:54:59

Can there be other pre elements nested inside?
Is pre always directly adjacent to td, or can there be other text between them?
If the answers are no and yes, the expression is:

preg_replace_callback("/(?<!\<td>)<pre>.*?<\/pre>(?!\<\/td>)/is", "callback", $text);
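A minimal sketch of how the expression above behaves on a small input. The callback name `mark_pre` and the sample string are made up for illustration; the pattern is the one from the answer, written in single quotes.

```php
<?php
// Hypothetical callback: wraps every matched <pre>...</pre> in square brackets
function mark_pre(array $m): string
{
    return '[' . $m[0] . ']';
}

// Made-up sample: one pre directly wrapped in td, one standalone pre
$sample = '<td><pre>a</pre></td> <pre>b</pre>';

$result = preg_replace_callback(
    '/(?<!<td>)<pre>.*?<\/pre>(?!<\/td>)/is',
    'mark_pre',
    $sample
);

echo $result, "\n"; // <td><pre>a</pre></td> [<pre>b</pre>]
```

Only the standalone pre reaches the callback, because the lookbehind rejects a pre that immediately follows `<td>`. Any whitespace or attribute between the two tags defeats that check, which is why this variant fits only the "always adjacent" case.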

If the answers are no and no, the expression is:

$RE1 = "(?:\\s[^>]*)?"; // nothing, or the tag's attributes
preg_replace_callback("/(<td$RE1>(?:(?!\\<\/td>).)*?)(<pre$RE1>.*?<\/pre>)/is", "callback", $text);

Group 1 captures everything from td up to pre; the callback should return this group unchanged. Group 2 captures the tag to be processed.
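A small sketch of a callback for the two-group expression, assuming the "processing" is simply upper-casing the pre block. The names `$attrs` and `keep_prefix` and the sample string are hypothetical.

```php
<?php
$attrs = "(?:\\s[^>]*)?"; // nothing, or the tag's attributes

// Hypothetical callback: return group 1 untouched, transform only group 2
function keep_prefix(array $m): string
{
    return $m[1] . strtoupper($m[2]);
}

// Made-up sample: text sits between td and pre, plus one pre outside any td
$sample = '<td class="x"> note <pre>a</pre></td> <pre>b</pre>';

$result = preg_replace_callback(
    "/(<td$attrs>(?:(?!<\/td>).)*?)(<pre$attrs>.*?<\/pre>)/is",
    'keep_prefix',
    $sample
);

echo $result, "\n"; // <td class="x"> note <PRE>A</PRE></td> <pre>b</pre>
```

The callback has to return group 1 itself, since the whole match (td prefix plus pre) is what gets replaced.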

Promised answer:

$text = <<<HEREDOC
<td> <td></td> <pre> txt <pre> <td> text </td> </pre> </pre> [] </td> <pre> all okey </pre> <pre></pre>
HEREDOC;

$RE0 = "(?:\\s[^>]*)?"; // nothing, or the tag's attributes
$RE1 = "(?P<PRE><pre$RE0>((?:(?!\\<pre$RE0>)(?!\\<\/pre\\s*>).)*+|(?P>PRE))+<\/pre\\s*>)";
$RE2 = str_replace("PRE", "TD", str_replace("pre", "td", $RE1));
$text = preg_replace_callback("/$RE1|$RE2/is", "clb", $text);

function clb($arr) {
    // A whole td block matched: do nothing, it is only a side-effect match
    if (!empty($arr["TD"])) return $arr["TD"];
    return "!!!";
}

echo htmlspecialchars($text);

Result:

 <td> <td></td> <pre> txt <pre> <td> text </td> </pre> </pre> [] </td> !!! !!!

@ReinRaus: well, between them there can be at least a <!-- comment -->
Will such an expression run quickly on a large input string?
In C++, large inputs caused stack overflow exceptions.
It is better to use a dedicated parser or write your own parsing code.
@ReinRaus: Will your code handle this: *  <pre> *  * <div class="<td>"> <pre> ?
(Using DOM on volumes like that, on that hardware, is masochism. Regular expressions had not occurred to me.)
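On the large-input concern raised in the comments: in PHP, `preg_replace_callback` returns `null` when PCRE itself fails (for instance on hitting `pcre.backtrack_limit`), so the failure is easy to miss. A hedged sketch of a wrapper that surfaces it; the function name `safe_replace` is hypothetical.

```php
<?php
// Hypothetical wrapper: run the replacement and turn a silent PCRE failure
// (null return value) into an exception carrying the error code
function safe_replace(string $pattern, callable $cb, string $text): string
{
    $out = preg_replace_callback($pattern, $cb, $text);
    if ($out === null) {
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }
    return $out;
}

// Usage on a trivial pattern
$r = safe_replace('/a/', function (array $m): string { return 'b'; }, 'banana');
echo $r, "\n";
```

This does not make a recursive pattern any faster; it only ensures that an input too large for the pattern fails loudly instead of yielding an empty result.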

Community spirit ♦ one · Answer 2 · 2012-11-30T18:08:42

There has just been another question similar to yours.

The fact is that correctly parsing HTML with regular expressions is impossible :( Unfortunately, that is how it is. Better to take any real HTML parser (I am sure PHP has one) and plug it in. It will work!

By the way, a random thought: could your problem be solved with CSS?

Update:
It turns out that modern "regular expressions" have become more powerful, and in principle an HTML parser can be described with them. But you still should not, because:

HTML is terribly complicated to handle correctly.
No, really, it is creepy: your code will most likely be unable to tell a closing tag inside a comment from a real closing tag, and as for the syntactic chaos that can be created with ENTITY, you and I are better off not knowing.
There are simple tools built into the language that let you do what you need more easily and more reliably.
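The comment problem mentioned above is easy to reproduce. A minimal illustration with a made-up input and a deliberately simple pattern (not one of the patterns from the other answer):

```php
<?php
// Made-up input: the first </pre> sits inside an HTML comment
$html = '<pre>a <!-- </pre> --> b</pre>';

// A naive lazy match stops at the first </pre>, i.e. the commented-out one
preg_match('#<pre>.*?</pre>#s', $html, $m);

echo $m[0], "\n"; // <pre>a <!-- </pre>
```

The match ends in the middle of the comment, so anything done with it afterwards operates on a truncated element.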

I would just use XPath. Apparently, you need

 //td[count(ancestor::td)=0]

or

 //td[count(parent::td)=0]

Looks easier, doesn't it?
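A sketch of running such a query from PHP with the bundled DOM extension. Two assumptions to flag: the expressions above select td elements, while this example adapts the same `count(ancestor::td)=0` test to pre, since the question is about pre; and the sample markup is made up.

```php
<?php
// Made-up sample: one pre inside a table cell, one outside
$html = '<table><tr><td><pre>inside</pre></td></tr></table><pre>outside</pre>';

$doc = new DOMDocument();
// Collect libxml warnings about imperfect markup instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Adapted expression: every pre that has no td ancestor
$outside = [];
foreach ($xpath->query('//pre[count(ancestor::td)=0]') as $pre) {
    $outside[] = $pre->textContent;
}

print_r($outside);
```

From here each selected node can be modified in place and the document serialized again with `$doc->saveHTML()`.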

Mostly the constructs look like <td><pre>(.*)</pre></td> or <pre>(.*)</pre>, so I do not really want to pull in anything beyond a couple of regexes and a function :( Is there a way out?
