I have a string containing HTML that needs to be parsed with regular expressions (std::regex). I need to collect into a std::vector every URL on the page that appears in an href="" attribute. My C++ code does not work properly:

    #include <regex>
    #include <iostream>
    #include <string>

    using std::string;
    using std::regex;
    using std::cout;
    using std::endl;
    using std::sregex_iterator;
    using std::smatch;

    int main()
    {
        string subject("<head><title>Search engines</title></head><body><a href=\"https://yandex.ru\">Yandex</a><a href=\"https://google.com\"></a></body>");
        try {
            regex re("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"");
            sregex_iterator next(subject.begin(), subject.end(), re);
            sregex_iterator end;
            if (next == end)
                cout << "Oops" << endl;
            while (next != end) {
                smatch match = *next;
                cout << match.str() << endl;
                next++;
            }
        } catch (std::regex_error& e) {
            ; // Syntax error in the regular expression
        }
        return 0;
    }

Only the Python version works:

    #!/usr/bin/python3
    import re

    html = '<head><title>Search engines</title></head><body><a href="https://yandex.ru">Yandex</a><a href="https://google.com"></a></body>'
    title = re.findall(r'<title>(.*?)</title>', html)[0]
    links = [x[1] for x in re.findall(r'<a\s+(?:[^>]*?\s+)?href=(["\'])(.*?)\1', html)]
    print(title)
    print(links)

I suppose I could spend a week leafing through Jeffrey Friedl's book on regular expressions and the documentation for the regex library and eventually get the desired result, but Stack Overflow is not meant for advice like "go read Friedl and don't ask to be spoon-fed." Besides, although this seems like a useful question, I could not find a working answer to it anywhere on Stack Overflow.
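
For comparison, here is a minimal sketch of what the C++ side might look like. It rests on two assumptions about why the original prints "Oops": std::regex matching is case-sensitive unless std::regex::icase is passed, so the uppercase A in the pattern never matches the lowercase <a tags, and match.str() returns the whole match, whereas the URL itself is capture group 1. The helper name extract_hrefs is hypothetical, and the pattern only handles double-quoted href values, as the original pattern did.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the value of every double-quoted href
// attribute that appears inside an <a ...> tag of `html`.
std::vector<std::string> extract_hrefs(const std::string& html)
{
    // Same pattern as in the question, but compiled with icase so that
    // <a href=...> and <A HREF=...> both match. Group 1 is the URL.
    static const std::regex re(
        "<\\s*a\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
        std::regex::icase);

    std::vector<std::string> urls;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(html.begin(), html.end(), re); it != end; ++it) {
        // (*it)[0] would be the whole "<a ... href="..."" match;
        // (*it)[1] is only the text between the quotes.
        urls.push_back((*it)[1].str());
    }
    return urls;
}
```

Calling extract_hrefs(subject) with the subject string from the C++ code above should return the two entries https://yandex.ru and https://google.com. As with any regex-based approach to HTML, this is only a rough fit: it will also pick up attributes whose names merely end in href, and it skips single-quoted or unquoted values.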