Hello. Until now I have parsed HTML only with regular expressions. I have now been told that it can be done differently. Are there any advantages to parsing through the DOM rather than with regular expressions? Thanks.

  • Thanks for the answer, but could I get the information in Russian? The translator does not make the idea clear. Or at least a short summary? :) - Sarkis Allahverdian
  • The main idea is laid out in the answer. In general, this is a canonical answer that you will come across more than once, because people who want to parse HTML with regular expressions turn up regularly. - Alexey Ukolov

1 answer

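Arbitrary HTML simply cannot be described with regular expressions: it belongs to a different class of grammar (context-free rather than regular). A regular expression cannot keep track of how deeply tags are nested, so it cannot reliably match an opening tag with its own closing tag.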
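To make the nesting problem concrete, here is a small sketch in Python (the thread does not name a language, so this choice and the sample strings are assumptions for illustration). Neither the lazy nor the greedy version of the pattern finds the real boundaries of an element:

```python
import re

# Hypothetical sample input: a <div> nested inside another <div>.
nested = "<div>outer <div>inner</div> tail</div>"

# Lazy match: stops at the FIRST closing tag, so the outer element is cut short.
lazy = re.search(r"<div>(.*?)</div>", nested).group(1)

# Hypothetical sample input: two sibling <div> elements.
siblings = "<div>a</div><div>b</div>"

# Greedy match: runs to the LAST closing tag, so it swallows the sibling too.
greedy = re.search(r"<div>(.*)</div>", siblings).group(1)

print(repr(lazy))    # 'outer <div>inner'
print(repr(greedy))  # 'a</div><div>b'
```

Whichever quantifier you pick, one of the two inputs breaks, and real pages contain both kinds of structure.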

Therefore, you should use a standard parser: all the nuances are already taken into account, the code becomes readable and, most importantly, it works as it should.
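As a rough illustration of the parser approach, the sketch below uses Python's standard `html.parser` module. Note the assumptions: the language is my choice, `DivText` and `div_text` are hypothetical names, and `html.parser` is an event-based parser rather than a full DOM, although the idea of leaving tag matching to the parser is the same.

```python
from html.parser import HTMLParser


class DivText(HTMLParser):
    """Collects all text found inside <div> elements, at any nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <div> elements are currently open
        self.chunks = []    # text fragments seen while inside a <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def div_text(html):
    """Hypothetical helper: return the text inside all <div> elements."""
    parser = DivText()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


print(div_text('<div class="a">outer <div>inner</div> tail</div>'))
# prints: outer inner tail
```

The parser handles attributes, nesting, and tag boundaries itself, so the code only has to say what it wants from the document.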

Regular expressions are acceptable only if you have a small piece of HTML with a clearly defined structure that never changes. But why bother, when a parser is available?