Does a parser for invalid HTML in C++/Qt exist at all?

Requirements:

  • no dependencies, especially binary ones tied to a specific architecture;
  • no coupling to the UI or the application event loop (hello QWebEngine), so that it can be used, for example, in an Android application (Qt Quick);
  • XPath and CSS selectors, search along the descendant/ancestor axes (hello Gumbo).

Alternatives in other languages: Java: Jsoup, Python: Grab / BeautifulSoup


    It is not quite clear what "invalid HTML" means and how far it is allowed to deviate from the standard. Look, for example, at http://xmlsoft.org/html/libxml-HTMLparser.html#htmlReadFile (with the HTML_PARSE_RECOVER option).

    • Meaning that it cannot be parsed with the standard Qt tools as ordinary XML. - norgen

    If HTML4 is enough for you, you can try libxml2:
    http://xmlsoft.org/
    But for Android you will need to build iconv; I vaguely remember that it took some fiddling ("dancing with a tambourine").
    http://xmlsoft.org/FAQ.html#Compilatio
    Dependencies: libz, iconv.
    License: MIT.