The simplest regular expression successfully handles links in plain text:
r'(https?://[\S]+)'
This works fine for me, but sometimes the input is HTML, and the link has to be extracted from an a tag. Given something like some text <a href="http://ya.ru">some text, it returns: http://ya.ru">some
This expression:
r'(https?://[\S]+[>$])'
returns an acceptable result for HTML (the link with a > symbol at the end, which can then simply be trimmed), but it no longer matches links in plain text.
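The behavior of both patterns can be reproduced with a minimal sketch like the one below. The plain-text sample string is an assumption made for illustration. One likely reason the second pattern misses plain-text links: inside a character class, $ is a literal dollar sign rather than an end-of-string anchor, so [>$] requires an actual > or $ character after the URL.

```python
import re

first = re.compile(r'(https?://[\S]+)')
second = re.compile(r'(https?://[\S]+[>$])')

plain = 'some text http://ya.ru some text'            # assumed plain-text sample
html = 'some text <a href="http://ya.ru">some text'   # sample from the question

first_plain = first.findall(plain)    # the bare link
first_html = first.findall(html)      # link plus trailing '">some'
second_plain = second.findall(plain)  # empty: no '>' or '$' follows the link
second_html = second.findall(html)    # link plus trailing '">'

print(first_plain, first_html, second_plain, second_html)
```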
How can I combine these two expressions into one in Python, using OR (alternation), so that all matches are returned one by one, in order?
I tried ()|(), but it does not work that way. Third-party libraries handle this task very well, but I need to get the desired result with plain regular expressions.
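For illustration, here is a hedged sketch of two ways the combination might be done with the standard re module only. One possible reason ()|() seems not to work is that re.findall returns a tuple per match when the pattern has more than one group, with an empty string for the branch that did not match. The sample text and the second pattern (a single group with the alternation moved into a lookahead) are assumptions, not something taken from the question.

```python
import re

# Assumed sample mixing a plain-text link and an HTML link.
text = 'plain http://example.com and <a href="http://ya.ru">some text'

# Two groups joined by |: findall() yields one tuple per match,
# with '' in the slot of the branch that did not participate.
two_groups = re.compile(r'(https?://\S+[>$])|(https?://\S+)')
tuples = two_groups.findall(text)

# Collapsing each tuple keeps the matches in order of appearance.
flattened = [a or b for a, b in tuples]

# Alternative: one group, so findall() returns plain strings. The lazy
# match stops before a quote, an angle bracket, whitespace, or the end.
one_group = re.compile(r'(https?://\S+?)(?=["\'<>]|\s|$)')
links = one_group.findall(text)

print(flattened)
print(links)
```

With the first approach the HTML match still carries the trailing "> and has to be trimmed afterwards (for example with rstrip); the second approach avoids the trimming step, at the cost of a less literal combination of the two original patterns.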