How can I use regular expressions to find, among identical tags in a text, only the tags with specific content?

Question

Without a parser, using only regular expressions, how can I select from a set of identical tags only those tags that contain specific content?

XML:

<tag> ... </tag> <tag> ... content ... </tag> <tag> ... content ... </tag>

Result:

 <tag> ... content ... </tag> <tag> ... content ... </tag>

The naive solution doesn't work:

 .*?<tag>.*?content.*?<\/tag>
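A minimal sketch in Python's re module (the question names no language, so this is an assumption) reproduces the failure on the sample XML. The lazy .*? after <tag> is still allowed to run past a </tag>, so the first match starts at the first <tag>, the one that has no content.

```python
import re

# Sample input from the question.
xml = "<tag> ... </tag> <tag> ... content ... </tag> <tag> ... content ... </tag>"

# The naive pattern from the question.
naive = re.compile(r".*?<tag>.*?content.*?<\/tag>")

matches = naive.findall(xml)
for m in matches:
    print(repr(m))
# first match: '<tag> ... </tag> <tag> ... content ... </tag>'
```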

An idea with a negative lookahead didn't work either:

 .*?<tag>.*?(?!<\/tag>).*?content.*?<\/tag>
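A likely reason, shown in a small Python sketch (again assuming Python's re): (?!<\/tag>) is a zero-width assertion that is checked at only one position, between the two lazy quantifiers. It does nothing to stop the following .*? from consuming a </tag>, so the result is the same as with the naive pattern.

```python
import re

xml = "<tag> ... </tag> <tag> ... content ... </tag> <tag> ... content ... </tag>"

# The lookahead attempt from the question. The lookahead is tested once,
# right after the first .*?, and the next .*? can still cross a </tag>.
attempt = re.compile(r".*?<tag>.*?(?!<\/tag>).*?content.*?<\/tag>")

first = attempt.search(xml).group()
print(repr(first))
# '<tag> ... </tag> <tag> ... content ... </tag>'
```

For the lookahead to help, it has to be applied before every character the repetition consumes, not just once.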

What I want to know: is it possible to do this with a regex? If not, why not?

A similar task, with single parentheses in place of tags:

 (...)(..)(...ABC...)(..)(.,.ABC,.)

Solution:

 \([^)]*ABC[^)]*\)
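A quick check of the parenthesis version in Python (assumed language). It works because the negated class [^)] can never step over a closing parenthesis. A multi-character delimiter such as </tag> cannot be written as a negated character class, which is why the tag version needs a lookahead instead.

```python
import re

text = "(...)(..)(...ABC...)(..)(.,.ABC,.)"

# [^)]* cannot cross a ")", so each match stays inside one (...) group.
pattern = re.compile(r"\([^)]*ABC[^)]*\)")

groups = pattern.findall(text)
print(groups)  # ['(...ABC...)', '(.,.ABC,.)']
```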

Regular expressions are designed to parse regular grammars, which is why it is difficult to extract only the tags you need.
However, some modern regex engines (for example, .NET) have long supported so-called balancing groups, which make it possible to parse text that is not regular.
Most importantly, though, learning all this takes more time than writing code with a proper XML parser.
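For comparison, a minimal sketch of the parser route with Python's standard-library xml.etree.ElementTree. The sample has several top-level elements, so it is wrapped in a hypothetical <root> element (not part of the question) to make it a well-formed document; real input may need different handling.

```python
import xml.etree.ElementTree as ET

xml = "<tag> ... </tag> <tag> ... content ... </tag> <tag> ... content ... </tag>"

# Wrap the fragment in a hypothetical <root> so it parses as one document.
root = ET.fromstring(f"<root>{xml}</root>")

selected = []
for el in root.iter("tag"):
    # itertext() collects all text inside the element, including children.
    if "content" in "".join(el.itertext()):
        el.tail = None  # drop trailing text so tostring() returns just the element
        selected.append(ET.tostring(el, encoding="unicode"))

print(selected)
```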
@alexander-petrov Perhaps you meant "context-free grammar" rather than "irregular" (in terms of the Chomsky hierarchy)? And yes, you are right.
In my example all groups are properly balanced (every parenthesis is closed). So are recursion and subroutines in regular expressions taboo, or just bad form?
I'll honestly admit that I don't understand lookahead very well, but I suspected that a problem like this could be solved with it. That is why I asked the question here.

Answer by Serafim (serafim) · 2019-04-24T15:35:49

The solution in this case is:

 <tag>(?:[^<]|<(?!\/tag>))*content.*?<\/tag>
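A small Python sketch (assumed language) that checks the answer's pattern on the sample. Each repetition of (?:[^<]|<(?!\/tag>)) consumes either a character other than "<" or a "<" that does not start "</tag>", so the match cannot leave the element it started in. The same idea is often written as a tempered greedy token, (?:(?!<\/tag>).)*, where the lookahead is tested before every character; that variant is included for comparison.

```python
import re

xml = "<tag> ... </tag> <tag> ... content ... </tag> <tag> ... content ... </tag>"

# Pattern from the answer.
answer = re.compile(r"<tag>(?:[^<]|<(?!\/tag>))*content.*?<\/tag>")

# Tempered-greedy-token variant; re.DOTALL lets "." also match newlines.
tempered = re.compile(r"<tag>(?:(?!<\/tag>).)*content.*?<\/tag>", re.DOTALL)

print(answer.findall(xml))
print(tempered.findall(xml))
```

Both patterns still assume a bare <tag> with no attributes and no nested <tag> elements; once nesting is possible, a parser or an engine with recursion or balancing groups is the safer choice.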
