Regular expression. Multiple matches and anchors

Question

There is a string of the following form:

abcd [abc Str1 def Str2 ghi Str3 ] efg Str4 [Str5 abc Str6 d] hi

From it, I need to select the strings of the form Str\d+, but not all of them: only those that are inside the "anchors". In this case, the characters [ (left) and ] (right) act as the "anchors". That is, the regular expression must find the strings

 Str1, Str2, Str3, Str5, Str6

but skip Str4. Can this be done with regular expressions? And if so, how?

If my answer at the second link makes it clear how to solve the problem, I will simply close the question as a duplicate.
If it is not clear, write me a comment: I will adapt the answer to your case and then close it as a duplicate.
@ReinRaus The first one is solved without regex; the second is something close.
Viktor Stribizhev's answer to this question may also suit you, though, if the text inside the "anchors" does not have to be treated as a single entity.

Community spirit ♦ one · Accepted Answer · 2016-02-06T19:03:27

The easiest way to solve this problem is the "cascade descent" method.
Its essence is very simple:
With the first regular expression we isolate a large piece of text. In this case, it is any text from [ to ]. The first regular expression will be:

 /\[[^\]]*\]/s

that is, any text inside [...]

The second regular expression is applied to the text found by the first one and picks out the desired result. In this case it is

 /\bStr\d+\b/

I slightly changed the expression from the question by adding word boundaries, because I think this is the right move.

The final code is:

 $text = "abcd [abc Str1 def Str2 ghi Str3 ] efg Str4 [Str5 abc Str6 d] hi";
 $re1 = "/\\[[^\\]]*\\]/s";   // step 1: every [...] chunk
 $re2 = "/\\bStr\\d+\\b/";    // step 2: StrN inside a chunk
 preg_match_all( $re1, $text, $arr1 );
 foreach ( $arr1[0] as $k=>$v ) {
     preg_match_all( $re2, $v, $arr2 );
     // process the result, for example like this:
     var_dump( $arr2[0] );
 }

http://ideone.com/i1qqcr

There are other ways to solve the problem, but they are not as simple.
I highly recommend "cascade descent" if your knowledge of regular expressions is weak.
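
For comparison, the same two-step idea can be sketched in Python. This is only an illustration under assumptions: the helper name find_inside_brackets is hypothetical, and the brackets are assumed not to nest.

```python
import re

# Step 1: isolate every "[...]" chunk (brackets are assumed not to nest).
CHUNK_RE = re.compile(r"\[[^\]]*\]")
# Step 2: the pattern applied inside each chunk.
ITEM_RE = re.compile(r"\bStr\d+\b")


def find_inside_brackets(text):
    """Return one list of StrN matches per bracketed chunk (hypothetical helper)."""
    return [ITEM_RE.findall(chunk) for chunk in CHUNK_RE.findall(text)]


text = "abcd [abc Str1 def Str2 ghi Str3 ] efg Str4 [Str5 abc Str6 d] hi"
groups = find_inside_brackets(text)
print(groups)
```

Because the matches stay grouped per chunk, Str1-3 and Str5-6 can each be processed as a unit, which is the main practical advantage of this method.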

Option 2.
The result is obtained "detached from the context", that is, you cannot process Str1-3 and Str5-6 each as one coherent whole:
https://ru.stackoverflow.com/a/448588/481

Option 3.
Result via a callback. It uses replace, but a callback can do anything, for example process the matched data:
https://ru.stackoverflow.com/a/489561/481

Option 4.
Will not work in PHP, because it relies on unsupported functionality:

 (?<=\[[^\[\]]*\bStr\d+\b(?=[^\[\]]*\])
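
Python's standard re module also rejects a variable-length lookbehind, so the expression cannot be tried there as written either. A rough sketch of the lookahead half alone follows; it is sufficient only under the assumption that every ] has a matching [ and that brackets do not nest. The name INSIDE_RE is illustrative.

```python
import re

# Only the lookahead is kept: a match must be followed by a "]" with no
# other bracket in between. Assumes balanced, non-nested brackets.
INSIDE_RE = re.compile(r"\bStr\d+\b(?=[^\[\]]*\])")

text = "abcd [abc Str1 def Str2 ghi Str3 ] efg Str4 [Str5 abc Str6 d] hi"
matches = INSIDE_RE.findall(text)
print(matches)
```

The limitation is visible on input with a stray closing bracket: in "Str7 ] x" the lookahead alone would accept Str7, since nothing checks for an opening [.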

I have never seen an engine that allows a lookbehind like this (by the way, you are missing a closing parenthesis).
If every closing "anchor" is guaranteed to have an opening one, then by dropping the lookbehind you can get the same result in PHP as well (example)
@BOPOH, for now this is supported only in the languages of the .NET family; well, you can also build PCRE2 yourself.
@kff, there is nothing difficult there, but I do not recommend using it in real code, because you will not be able to maintain such code without having studied regular expressions thoroughly first.
@kff Or turn on /x mode and write a multi-line regex with comments, like
 /(?:
     \[ |       # opening bracket, or
     (?!^) \G   # the point where the previous match ended, NOT at the start of the string
and so on; then there is a chance of understanding it even a month later :)
@BOPOH Well, that is a minor point; a cosmetic edit solves the problem: (?:\[|(?!^)\G)[^]]*?(\bStr\d+\b)(?=[^\[\]]*\])
@kff When the opening and closing anchors are identical, you do not have to bother with complex regexes to get a practical result.
Your profile page mentions Python, so I will give an example in it for brevity: "|".join( text.split("|")[1::2] ) In the resulting text you can simply search for Str\d+
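
The split trick from the last comment can be expanded into a runnable sketch. The input string is a hypothetical variant of the one in the question, with | standing in for both the opening and the closing anchor.

```python
import re

# Hypothetical input where the same character "|" opens and closes a region.
text = "abcd |abc Str1 def Str2 ghi Str3 | efg Str4 |Str5 abc Str6 d| hi"

# Splitting on the anchor leaves the "inside" pieces at the odd indexes.
inside = "|".join(text.split("|")[1::2])

# A plain search over the joined text is now enough.
matches = re.findall(r"Str\d+", inside)
print(matches)
```

This relies on the anchors being properly paired; with an odd number of | characters the last piece after the unpaired anchor is also treated as "inside".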

Answer 2 · 2016-02-06T18:16:33

Something like this, it seems: \[[\w\s]+(Str[\d]{1,})[\w\s]+\]

Stanislav


Like this: regex101.com/r/fW4tZ6/1 ? - Stanislav
No. Only two matches. And you need five. - user194374
Yes, I just noticed that too =) I will fix it now - Stanislav
I apologize, it is already one o'clock in the morning ... I cannot understand why it takes only the last Str ... I will look at it in the morning with a fresh head - Stanislav
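
The behaviour described in the last comment can be reproduced in a few lines. This is a sketch in Python, chosen only for brevity; it is an assumption that the regex101 test behaved the same way. The pattern produces one overall match per [...] chunk, so its single capturing group yields one value per chunk, and the greedy [\w\s]+ in front of the group pushes that value to the last StrN in the chunk.

```python
import re

text = "abcd [abc Str1 def Str2 ghi Str3 ] efg Str4 [Str5 abc Str6 d] hi"

# The pattern from this answer: each "[...]" chunk is consumed by one match,
# so the capturing group can report only one StrN per chunk.
pattern = re.compile(r"\[[\w\s]+(Str[\d]{1,})[\w\s]+\]")
captured = pattern.findall(text)
print(captured)
```

Two chunks give two captures, which matches the "only two matches" remark above; getting all five requires either a second pass over each chunk or a pattern that matches each StrN separately.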
