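$text = 'Тестик 1.test.ru 2.<a href="http://www.test.ru">test.ru ok</a> 3.https://test.ru';

$patt = array(
    // 1) explicit http(s):// URLs that are not already inside an A tag
    '%\b(?<!href=[\'"])https?://([^\s\[\]<]+)(?![^<>]*</a>)%i',
    // 2) bare domains (.ru, .com, .net) that the first pattern did not handle
    '%\b(?<!http://)(?<!https://)[a-z\d]+\.(ru|com|net)(?!["\'][^<>]*>)(?![^<>]*</a>)%i'
);
$repl = array(
    '<a href="$0">$1</a>',
    '<a href="http://$0">$0</a>'
);

// The patterns are applied in order: the second one runs on the output of the first.
$text = preg_replace($patt, $repl, $text);
echo $text;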
Here:
\b - a word boundary; it is needed to capture the whole word (for example, test) and not just part of it (est);
(?<!href=[\'"]) - keeps a link from being captured when it is the href value of an A tag (for example, from <a href="http://test.ru">);
(?<!http://)(?<!https://) - keeps the second pattern from capturing links that the first regular expression already handles;
(?![^<>]*</a>) - keeps a link from being captured from the text of an A tag (for example, from test.ru ok</a>);
(?!["\'][^<>]*>) - keeps a link from being captured from inside the opening A tag, right before the closing quote of an attribute (for example, from <a href="http://www.test.ru">).
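To see what these guards actually do, here is a small sketch. The sample strings are made up for illustration, the TLD group is written as non-capturing, and the character class is assumed to be [a-z\d]. The first half shows why \b matters next to the lookbehind, the second half compares the bare domain pattern with and without the guards.

```php
<?php
// Hypothetical sample strings, chosen only for illustration.
$url = 'http://test.ru';

// Without \b the lookbehind is bypassed one character later: "est.ru" matches.
preg_match('%(?<!http://)[a-z\d]+\.ru%i', $url, $noBoundary);

// With \b a match may only start at a word boundary, so nothing is found.
$withBoundary = preg_match('%\b(?<!http://)[a-z\d]+\.ru%i', $url);

$sample = 'see test.ru and <a href="http://test.ru">test.ru</a>';

// No guards: all three occurrences of the domain are captured.
preg_match_all('%[a-z\d]+\.(?:ru|com|net)%i', $sample, $plain);

// Guards from the second pattern: only the occurrence outside the A tag is left.
preg_match_all(
    '%\b(?<!http://)(?<!https://)[a-z\d]+\.(?:ru|com|net)(?!["\'][^<>]*>)(?![^<>]*</a>)%i',
    $sample,
    $guarded
);

echo $noBoundary[0], "\n";      // est.ru
echo $withBoundary, "\n";       // 0
echo count($plain[0]), "\n";    // 3
echo count($guarded[0]), "\n";  // 1
```

So the lookbehind alone is not enough: the engine simply starts the match one character to the right, and \b is what closes that gap.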
P.S. This solution still does not cover every edge case ;)
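One such edge case, as an illustration only (the input string is made up): the first pattern takes everything up to the next whitespace, so punctuation at the end of a sentence ends up inside the link.

```php
<?php
// First pattern from the answer, unchanged.
$pattern = '%\b(?<!href=[\'"])https?://([^\s\[\]<]+)(?![^<>]*</a>)%i';

// Hypothetical input: a URL that ends a sentence.
$sentence = 'See http://test.ru.';

$linked = preg_replace($pattern, '<a href="$0">$1</a>', $sentence);
echo $linked; // See <a href="http://test.ru.">test.ru.</a>
```

The trailing dot becomes part of both the href and the link text, which is probably not what you want.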
UPD: A variant with more complex regular expressions, closer to RFC 1738
$text = 'Тестик 1. abc.test.ru 2. <a href="http://www.test.ru">http://test.ru ok</a> 3. https://test.ru/search?search_id=975080714';

$patt = array(
    // 1) URLs starting with http://, https:// or www. (host, optional port, path, query, fragment)
    '%\b(?<!href=[\'"])(?>https?://|www\.)([\p{L}\p{N}]+[\p{L}\p{N}\-]*\.(?:[\p{L}\p{N}\-]+\.)*[\p{L}\p{N}]{2,})(?::\d+)?(?:(?:(?:/[\p{L}\p{N}$_\.\+!\*\'\(\),\%;:@&=-]+)+|/)(?:\?[\p{L}\p{N}$_\.\+!\*\'\(\),\%;:@&=-]+)?(?:#[^\s\<\>]+)?)?(?![^<]*+</a>)%u',
    // 2) bare hosts ending in .ru, .com or .net, with the same optional tail
    '%\b(?<!http://)(?<!https://)([\p{L}\p{N}]+[\p{L}\p{N}\-]*\.(?:[\p{L}\p{N}\-]+\.)*(?:ru|com|net))(?::\d+)?(?:(?:(?:/[\p{L}\p{N}$_\.\+!\*\'\(\),\%;:@&=-]+)+|/)(?:\?[\p{L}\p{N}$_\.\+!\*\'\(\),\%;:@&=-]+)?(?:#[^\s\<\>]+)?|\b)(?![^<]*+</a>)%u'
);
$repl = array(
    '<a href="$0">$1</a>',
    '<a href="http://$0">$1</a>'
);

// $0 is the whole match, $1 is the host part only, so the link text shows just the host.
$text = preg_replace($patt, $repl, $text);
echo $text;
target="_blank" is better not used anywhere at all. It is a vulnerability for your site ;) - Visman
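If you really do need links to open in a new tab, the usual mitigation is to add rel="noopener noreferrer", so that the opened page does not get access to window.opener. A minimal sketch, assuming the first pattern from the answer; the rel attribute and the sample string are my additions, not part of the original code:

```php
<?php
// First pattern from the answer, unchanged.
$pattern = '%\b(?<!href=[\'"])https?://([^\s\[\]<]+)(?![^<>]*</a>)%i';

// Assumption: a new tab is required, so rel="noopener noreferrer" is added next to target.
$replBlank = '<a href="$0" target="_blank" rel="noopener noreferrer">$1</a>';

// Hypothetical input string.
$out = preg_replace($pattern, $replBlank, 'go to https://test.ru now');
echo $out;
// go to <a href="https://test.ru" target="_blank" rel="noopener noreferrer">test.ru</a> now
```

Leaving target out entirely, as in the answer above, remains the simpler option.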