Regular expression with word exclusion

Question

I'm working on URL processing for a site.

I have a URL of the form:

site.ru/novosti/page-2/

The first expression, #^/novosti/page-([0-9]+)/#, matches this URL and processes it.

But another rule with higher priority prevents the first one from ever firing.

The second expression: #^/novosti/#

In theory the task is simple (as far as I understand this business): add an exception to the second expression so that the first one can work undisturbed. Roughly speaking, if in the second expression the text after the last slash is "page-" followed by any number up to a thousand, the expression should return false or simply not match.

All of this is handled by Bitrix, so the rules are applied in a script.
I don't know how the Bitrix router works; could it be enough to just change the order of the rules?
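To see why the order can matter, here is a minimal sketch of a hypothetical first-match router: it tries the patterns in order and returns the first rule that matches. This router, the `route()` function and the rule names are assumptions made up for illustration; this is not how Bitrix's router is implemented.

```php
<?php
// Hypothetical first-match router (NOT Bitrix's router): returns the name
// of the first rule whose pattern matches $path, or null if none does.
function route(array $rules, string $path): ?string
{
    foreach ($rules as $name => $pattern) {
        if (preg_match($pattern, $path)) {
            return $name;
        }
    }
    return null;
}

// Order from the question: the broad rule is tried first and shadows the paging rule.
$broadFirst = [
    'news_section' => '#^/novosti/#',
    'news_page'    => '#^/novosti/page-([0-9]+)/#',
];

// Swapped order: the more specific paging rule is tried first.
$specificFirst = [
    'news_page'    => '#^/novosti/page-([0-9]+)/#',
    'news_section' => '#^/novosti/#',
];

echo route($broadFirst, '/novosti/page-2/'), "\n";          // news_section
echo route($specificFirst, '/novosti/page-2/'), "\n";       // news_page
echo route($specificFirst, '/novosti/some-article/'), "\n"; // news_section
```

So if the router really stops at the first match, putting the more specific rule first may be enough; whether Bitrix behaves this way is something this sketch cannot confirm.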

Accepted Answer · 2016-05-06T20:00:59

For this purpose, regular expressions have lookaround assertions.

There are two kinds of assertions:

lookbehind assertions constrain the text before them;
lookahead assertions constrain the text after them.

Each assertion can be positive or negative, and each variant has its own syntax:

Positive lookbehind: (?<=foo)bar matches bar only when it is preceded by foo
Negative lookbehind: (?<!foo)bar matches bar only when it is not preceded by foo
Positive lookahead: foo(?=bar) matches foo only when it is followed by bar
Negative lookahead: foo(?!bar) matches foo only when it is not followed by bar

For example, the regular expression foo(?!bar), which uses a negative lookahead, matches foo only when it is not followed by bar: it matches in foofoo, but not in foobar.
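The four lookaround forms and the foo(?!bar) example can be checked directly with PHP's preg_match. The test strings are illustrative, and `firstMatch()` is a small helper written for this sketch; `$m[0]` holds the text actually consumed, which shows that an assertion itself matches no characters.

```php
<?php
// Helper for this sketch: returns the matched text, or null on no match.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) ? $m[0] : null;
}

// Negative lookahead: "foo" not followed by "bar".
var_dump(firstMatch('/foo(?!bar)/', 'foofoo')); // string(3) "foo"
var_dump(firstMatch('/foo(?!bar)/', 'foobar')); // NULL

// Positive lookahead: "foo" followed by "bar" (only "foo" is consumed).
var_dump(firstMatch('/foo(?=bar)/', 'foobar')); // string(3) "foo"

// Positive and negative lookbehind: "bar" preceded / not preceded by "foo".
var_dump(firstMatch('/(?<=foo)bar/', 'foobar')); // string(3) "bar"
var_dump(firstMatch('/(?<!foo)bar/', 'foobar')); // NULL
var_dump(firstMatch('/(?<!foo)bar/', 'bazbar')); // string(3) "bar"
```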

In your particular case, the regular expression might look like:

 #^/novosti/(?!page-[0-9]+).*$#
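A quick way to check this rule outside Bitrix is to run it through preg_match against a few request paths. The paths are made up for illustration and are given without the host, on the assumption that the rule is applied to the path only (which the leading ^/ suggests).

```php
<?php
// The exclusion rule from the answer; '#' is the delimiter.
$pattern = '#^/novosti/(?!page-[0-9]+).*$#';

// Illustrative request paths (no host, since the pattern is anchored at ^/).
$paths = [
    '/novosti/page-2/',       // paging URL: excluded by the lookahead
    '/novosti/page-999/',     // excluded as well
    '/novosti/page-2abc/',    // also excluded: the lookahead only checks the prefix
    '/novosti/page-/',        // no digits after "page-": NOT excluded
    '/novosti/some-article/', // ordinary news URL: matches
    '/novosti/',              // section root: matches
];

foreach ($paths as $path) {
    printf("%-25s %s\n", $path, preg_match($pattern, $path) ? 'match' : 'no match');
}
```

Because the lookahead does not look past the digits, anything starting with page- and a digit is excluded. If only a whole page-N/ segment should be excluded, (?!page-[0-9]+/) is one possible tightening; that variant is a suggestion, not part of the original answer.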

And here is the link to the working example on regex101.

If you don't need to match the entire string, this shorter expression is enough:

 #^/novosti/(?!page-[0-9]+)#
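The two versions give the same yes/no answer; they differ only in how much text is matched. A small sketch comparing `$m[0]` for an illustrative path:

```php
<?php
$full  = '#^/novosti/(?!page-[0-9]+).*$#'; // consumes the rest of the path
$short = '#^/novosti/(?!page-[0-9]+)#';    // stops right after "/novosti/"

$path = '/novosti/some-article/';

preg_match($full, $path, $m);
echo $m[0], "\n"; // /novosti/some-article/

preg_match($short, $path, $m);
echo $m[0], "\n"; // /novosti/

// For a plain match / no-match check both patterns agree.
var_dump(preg_match($full, '/novosti/page-2/') === preg_match($short, '/novosti/page-2/')); // bool(true)
```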

@peter, is this for an .htaccess rule, or can it be used in a script?
@peter if (preg_match("#^/novosti/(?!page-[0-9]+).*$#", "site.ru/novosti/page-2/")) { echo "true"; } else { echo "false"; } This exact construction returns false.
@peter, but that is exactly what you asked for: "Roughly speaking, if in the second expression after the last slash there are characters <...>, then such an expression should return false".
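One detail about the test in the comments: its subject, "site.ru/novosti/page-2/", starts with the host, so ^/novosti/ fails before the lookahead is even reached, and any URL with a host in front returns false. Assuming the rules are applied to the path only, a fairer check strips the host first; parse_url with PHP_URL_PATH is one way to do that, given a full URL with a scheme (the URLs below are illustrative).

```php
<?php
$pattern = '#^/novosti/(?!page-[0-9]+).*$#';

// With the host in front, ^/novosti/ cannot match at all, so even an
// ordinary news URL returns 0: the result says nothing about the lookahead.
var_dump(preg_match($pattern, 'site.ru/novosti/page-2/'));       // int(0)
var_dump(preg_match($pattern, 'site.ru/novosti/some-article/')); // int(0)

// Testing the path alone shows the lookahead doing its job.
$paging  = parse_url('http://site.ru/novosti/page-2/', PHP_URL_PATH);       // "/novosti/page-2/"
$article = parse_url('http://site.ru/novosti/some-article/', PHP_URL_PATH); // "/novosti/some-article/"
var_dump(preg_match($pattern, $paging));  // int(0)
var_dump(preg_match($pattern, $article)); // int(1)
```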

Nikola Tesla · Answer 2 · 2016-05-06T20:02:41

If the regex engine supports lookahead, the second regular expression should be replaced with:

 #^/novosti/(?!page-[0-9]+)#
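With this replacement the original priority order can stay as it is. The sketch below reuses a hypothetical first-match router (an assumption for illustration, not Bitrix's actual router; `route()` and the rule names are made up): the broad rule is still tried first, but it now refuses paging URLs, so they fall through to the paging rule.

```php
<?php
// Hypothetical first-match router (NOT Bitrix's router).
function route(array $rules, string $path): ?string
{
    foreach ($rules as $name => $pattern) {
        if (preg_match($pattern, $path)) {
            return $name;
        }
    }
    return null;
}

// Same priority order as in the question; only the broad rule changed.
$rules = [
    'news_section' => '#^/novosti/(?!page-[0-9]+)#',
    'news_page'    => '#^/novosti/page-([0-9]+)/#',
];

echo route($rules, '/novosti/page-2/'), "\n";       // news_page
echo route($rules, '/novosti/some-article/'), "\n"; // news_section
```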
