Text:

cat catter scat scatter nine-digit - nine - digit 

Regular expressions:

  1. \bcat\b - matches only the standalone word cat
  2. \Bcat\B - matches the cat inside scatter

Everything is as the description of the metacharacters says: \b matches a word boundary, and \B matches any position that is not a word boundary.
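To reproduce these results outside PHP, here is a minimal sketch that uses Python's re module in place of PHP's preg_* functions (an assumption; the question does not show the calling code). The re.ASCII flag limits \w, and therefore \b and \B, to ASCII letters, digits and underscore, which agrees with PCRE for this all-ASCII text. The helper matches is hypothetical.

```python
import re

TEXT = "cat catter scat scatter nine-digit - nine - digit"


def matches(pattern: str, text: str = TEXT) -> list[tuple[int, str]]:
    """Return (offset, matched text) for every match of pattern in text."""
    # re.ASCII: \w, \b and \B treat only [A-Za-z0-9_] as word characters
    return [(m.start(), m.group()) for m in re.finditer(pattern, text, re.ASCII)]


# \b on both sides: "cat" must be a whole word
print(matches(r"\bcat\b"))  # [(0, 'cat')]
# \B on both sides: "cat" must have word characters on both sides
print(matches(r"\Bcat\B"))  # [(17, 'cat')], the "cat" inside "scatter"
```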

However, if I use the literal - instead of cat:

  1. \b-\b - matches the hyphen inside nine-digit
  2. \B-\B - matches the two hyphens surrounded by spaces

This is the opposite of what I expected: \b-\b matches the hyphen inside a word, while \B-\B matches the hyphens surrounded by spaces. Why is that?
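The same kind of Python sketch (again with re standing in for PHP's PCRE functions; show_matches is a hypothetical helper) prints each hyphen match together with one neighbouring character on each side, which makes it visible which hyphen each pattern picks up.

```python
import re

TEXT = "cat catter scat scatter nine-digit - nine - digit"


def show_matches(pattern: str, text: str = TEXT) -> list[tuple[int, str]]:
    """Return (offset, match plus one neighbouring character on each side)."""
    result = []
    for m in re.finditer(pattern, text, re.ASCII):
        # Clamp the context window to the bounds of the string
        start, end = max(m.start() - 1, 0), min(m.end() + 1, len(text))
        result.append((m.start(), text[start:end]))
    return result


print(show_matches(r"\b-\b"))  # [(28, 'e-d')]
print(show_matches(r"\B-\B"))  # [(35, ' - '), (42, ' - ')]
```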


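The point is that the - character is not a “word” character in PCRE terminology, and \b only matches where a word character meets a non-word character (or the start or end of the string).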

The PHP documentation for the character class \w contains the following lines:

A “word” character is any letter or digit or the underscore character, that is, any character which can be part of a Perl “word”.

And the PCRE documentation defines a “word” character like this:

A "word" character is an underscore or any character that is a letter or digit.
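A quick check of these definitions, sketched with Python's re and the re.ASCII flag (an assumption standing in for PCRE's default character tables): letters, digits and the underscore count as word characters, while the hyphen and the space do not. The helper is_word_char is hypothetical.

```python
import re


def is_word_char(ch: str) -> bool:
    """True if the single character ch matches \\w (ASCII word character)."""
    return re.fullmatch(r"\w", ch, re.ASCII) is not None


# Characters from the sample text, plus "_" for comparison
for ch in ["n", "9", "_", "-", " "]:
    print(repr(ch), "word" if is_word_char(ch) else "non-word")
```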

So "nine-digit" is not one word but two, "nine" and "digit", and \b matches at their edges, including on both sides of the hyphen.
The "-" character is not one of the characters that words are made of.
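The boundary rule can be made explicit with a short Python sketch (the names are hypothetical, and the ASCII word-character set mirrors \w under re.ASCII): a position is a word boundary exactly when a word character sits on one side and a non-word character, or the edge of the string, sits on the other. The hand-computed offsets are cross-checked against the empty matches that Python's own \b produces.

```python
import re


def is_word_char(ch: str) -> bool:
    # ASCII letter, digit or underscore: the same set as \w under re.ASCII
    return ch.isascii() and (ch.isalnum() or ch == "_")


def boundaries(text: str) -> list[int]:
    """Offsets 0..len(text) that have a word character on exactly one side."""
    found = []
    for i in range(len(text) + 1):
        left = i > 0 and is_word_char(text[i - 1])
        right = i < len(text) and is_word_char(text[i])
        if left != right:
            found.append(i)
    return found


sample = "nine-digit - nine"
print(boundaries(sample))  # [0, 4, 5, 10, 13, 17]
# The regex engine's empty \b matches land on the same offsets
print([m.start() for m in re.finditer(r"\b", sample, re.ASCII)])
```

Offsets 4 and 5 sit on either side of the hyphen in nine-digit, so \b-\b matches there. Offsets 11 and 12, around the free-standing hyphen, are not boundaries, so it is \B-\B that matches that hyphen.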