I'm writing a parser for HTML pages. I load the page's DOM and read all of its links. To decide whether a link points to an article, I first need to remove every tag nested inside the a tag, together with its contents, and then take the remaining text.
For this I use regular expressions. The nested tags I run into most often are div, span, b, i, p, and strong, and I currently strip them with six separate regular expressions.
$clean_title = preg_replace("'<span[^>]*?>.*?</span>'si","", $title);
$clean_title = preg_replace("'<p[^>]*?>.*?</p>'si","", $clean_title);
$clean_title = preg_replace("'<div[^>]*?>.*?</div>'si","", $clean_title);
$clean_title = preg_replace("'<strong[^>]*?>.*?</strong>'si","", $clean_title);
$clean_title = preg_replace("'<i[^>]*?>.*?</i>'si","", $clean_title);
$clean_title = preg_replace("'<b[^>]*?>.*?</b>'si","", $clean_title);
How can I combine them into a single regular expression instead of six?
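For reference, here is a minimal sketch of what a combined pattern might look like. It is not a definitive answer. It assumes the six tag names can be put in one alternation group, with a backreference (\1) so the closing tag matches whichever name was opened. The function name strip_inner_tags is hypothetical and used only for illustration. One deliberate difference from the six patterns above: the \b after the tag name keeps `<b` from also matching `<br>`, `<i` from matching `<img>`, and `<p` from matching `<pre>`.

```php
<?php
// Hypothetical helper: removes span, p, div, strong, i and b elements,
// together with their contents, from a fragment of link markup.
function strip_inner_tags(string $title): string
{
    // (span|p|div|strong|i|b)  the tag names, captured as group 1
    // \b                       word boundary so "b" does not match "br"
    // [^>]*                    any attributes on the opening tag
    // .*?                      the contents, non-greedy
    // </\1\s*>                 a closing tag with the same name as group 1
    // Flags: s lets "." match newlines, i makes the match case-insensitive.
    $pattern = '~<(span|p|div|strong|i|b)\b[^>]*>.*?</\1\s*>~si';

    // preg_replace() returns null on failure; fall back to the input.
    return preg_replace($pattern, '', $title) ?? $title;
}
```

Like the original six expressions, this sketch does not handle a tag nested inside another tag of the same name (for example a div inside a div), because the non-greedy match stops at the first closing tag. If that case matters, removing the child nodes through the DOM that has already been loaded is probably more reliable than a regular expression.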