How to isolate certain words between title tags?

Question

Sample text:

<title>Машины волосы мягкие и шелковистые потому что Маша пользуется Head & Shoulders</title> Маша уже 10 лет пользуется Head & Shoulders и она очень довольна. <!-- А это еще один title в конце документа --> <title>Машины волосы мягкие и шелковистые потому что Маша пользуется Head & Shoulders</title>

I need a regular expression that finds the words Машины and Маша between the title tags, without searching the rest of the text.

This is needed in order to then replace the words:
Машины → Олены
Маша → Оля

I use PHP (PCRE).

Here is what I came up with:

/(?<=<title>)(?:.*)(Машины|Маша)(?:.*)(?=<\/title>)/ui

but this regular expression finds only the word Маша:

https://regex101.com/r/rN7pE3/2
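
A likely reason for this behaviour, sketched below: both `(?:.*)` groups are greedy, so the first one swallows as much of the line as it can, the capture group lands on the last word that still has a `</title>` ahead of it, and the whole title counts as a single match. The shortened sample string and the variable names are assumptions made for illustration; the pattern is the one from the question.

```php
<?php
// Shortened, hypothetical sample; the pattern is taken from the question.
$sample  = '<title>Машины волосы, Маша</title> Маша';
$pattern = '/(?<=<title>)(?:.*)(Машины|Маша)(?:.*)(?=<\/title>)/ui';

// The greedy leading (?:.*) consumes "Машины волосы, " before the capture
// group gets a chance, so the entire title becomes one match.
$count = preg_match_all($pattern, $sample, $matches);

echo $count, "\n";          // number of matches found
echo $matches[1][0], "\n";  // the word captured by group 1
```

Because the words are captured one per match, a pattern of this shape cannot report several words from the same title.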

I edited the question: the words Маша and Машины in the title need to be replaced later, without touching the rest of the text.
Wouldn't a simple replacement of one substring with another, without any regular expressions, do?
@andreymal How do you replace one substring with another, given that the replacement must be made only between the title tags?
Even so, you can find a way to limit the scope of the replacement (at least I always do this in Python, and it is probably not very difficult in PHP either). The code will be bulkier than a regular expression, but not as scary as the working regular expression itself :)

yevgeniyche yevgeniyche 76 7 · Answer 1 · 2016-02-05T20:06:05

I couldn't come up with anything smarter; if anyone has a better option, please post it.

<?php
// Match each <title>...</title> (U = lazy), then replace only inside it.
$text = preg_replace_callback('/<title>([\s\S]*)<\/title>/iuU', function ($matches) {
    return preg_replace(
        array('/\bМашины\b/iu', '/\bМаша\b/iu'),
        array('Олены', 'Оля'),
        $matches[0]
    );
}, $text);
?>

Answer 2 · 2016-02-05T23:36:32

$text = preg_replace_callback('/(<title>)|(<\/title>)|(Маша)|(Машины)/ui', function ($match) {
    static $title_state = 0;                    // nesting depth of <title> tags
    if ($match[1]) { $title_state++; return $match[0]; }
    if ($match[2]) { $title_state--; return $match[0]; }
    if ($title_state <= 0) return $match[0];    // outside any title: leave as is
    if ($match[3]) return "Оля";
    if ($match[4]) return "Олины";
}, $text);

It's a bit complicated, to be honest, but using increment and decrement instead of a 0/1 flag lets it handle nested tags such as <title><title></title>Маша</title>

Mike mike 38.7k one 25 62 · Answer 3 · 2016-02-06T20:19:05

This regular expression can find every Маша/Машины inside all the titles at once:

 /(?<=<title>|(?!^)\G).*?\K(Машины|Маша)(?=(?:.(?!<title>))*?<\/title>)/gius

You can do the replacement right away, because \K ensures that all characters before the word "Маша" are excluded even from match 0 (i.e. the match of the entire pattern).
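
A minimal sketch of how this pattern might be used from PHP, assuming a short made-up sample string. The `preg_*` functions have no `g` modifier (they already process every match), so it is dropped here. As a change from the pattern above, the alternation is split into two capture groups so that the callback can tell which word was matched. The replacement words are the ones from the question.

```php
<?php
// Hypothetical sample: two titles with ordinary text between them.
$text = '<title>Маша и Машины волосы</title> Маша в тексте <title>Маша</title>';

// Same pattern without "g"; groups 1 and 2 distinguish the two words.
$pattern = '/(?<=<title>|(?!^)\G).*?\K(?:(Машины)|(Маша))(?=(?:.(?!<title>))*?<\/title>)/ius';

// Thanks to \K, the overall match is only the word, so only the word is replaced.
$result = preg_replace_callback($pattern, function (array $m): string {
    // Group 1 is non-empty only when "Машины" matched; otherwise it was "Маша".
    return ($m[1] ?? '') !== '' ? 'Олены' : 'Оля';
}, $text);

echo $result, "\n";
```

The occurrence of Маша between the two titles is skipped because the lookahead refuses to cross an opening `<title>` on its way to `</title>`.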


Sergiks Sergiks 28.2k 3 38 78 · Answer 4 · 2016-02-05T19:17:09

Usually <title> occurs only once in an HTML document. So you can look for the first occurrence of the substring "<title>" and of the closing tag using mb_stripos(), extract the substring between them, and then search the resulting <title> content for the desired words with a regular expression, or with the same kind of substring search.
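
The idea could be sketched roughly as follows. Note two assumptions: the sketch uses the byte-oriented `stripos()`/`substr()` instead of `mb_stripos()`, which should be enough here because the tag names are ASCII and byte offsets are used consistently, and it loops so that more than one title is handled. The helper name `replaceInTitles` and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: replaces words only inside <title>...</title> blocks
// using plain substring search, without regular expressions.
function replaceInTitles(string $html, array $map): string
{
    $open   = '<title>';
    $close  = '</title>';
    $result = '';
    $pos    = 0;

    // stripos() is case-insensitive for the ASCII tag names; offsets are bytes.
    while (($start = stripos($html, $open, $pos)) !== false) {
        $contentStart = $start + strlen($open);
        $end = stripos($html, $close, $contentStart);
        if ($end === false) {
            break; // unclosed tag: leave the remainder untouched
        }
        // Copy everything up to and including the opening tag unchanged.
        $result .= substr($html, $pos, $contentStart - $pos);
        // Replace the words only in the title content (case-sensitive).
        $content = substr($html, $contentStart, $end - $contentStart);
        $result .= str_replace(array_keys($map), array_values($map), $content);
        $pos = $end;
    }

    // Append whatever follows the last processed title.
    return $result . substr($html, $pos);
}

$text = '<title>Машины волосы, Маша</title> Маша в тексте';
$out  = replaceInTitles($text, ['Машины' => 'Олены', 'Маша' => 'Оля']);
echo $out, "\n";
```

Unlike the regular expressions in the other answers, `str_replace()` is case-sensitive, so differently cased spellings of the words would need their own entries in the map.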

In theory there can be several title tags on a page, for example one at the beginning and another at the end, and I am going to apply the regular expression not only to title, but also to a and so on.
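
For several occurrences and for other tags such as a, one possible generalization is a helper that takes the tag name as a parameter. This is only a sketch: the function name `replaceInsideTag` and the sample markup are hypothetical, nested tags of the same name are not handled, and the replacement is case-sensitive.

```php
<?php
// Hypothetical helper: applies a word map only inside <tag ...>...</tag> pairs.
function replaceInsideTag(string $html, string $tag, array $map): string
{
    $name = preg_quote($tag, '~');
    // Group 1: opening tag with optional attributes, 2: content, 3: closing tag.
    $pattern = '~(<' . $name . '\b[^>]*>)(.*?)(</' . $name . '>)~isu';

    $result = preg_replace_callback($pattern, function (array $m) use ($map): string {
        // strtr() with an array replaces case-sensitively, longest keys first.
        return $m[1] . strtr($m[2], $map) . $m[3];
    }, $html);

    return $result ?? $html; // preg_replace_callback() returns null on error
}

$html = '<a href="#">Маша</a> Маша <title>Машины волосы</title>';
$map  = ['Машины' => 'Олены', 'Маша' => 'Оля'];

$inLinks  = replaceInsideTag($html, 'a', $map);
$inTitles = replaceInsideTag($html, 'title', $map);
echo $inLinks, "\n", $inTitles, "\n";
```

The `\b` after the tag name keeps a call for a from also matching tags such as `<abbr>`.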
