I've run into a problem: I need to parse HTML and pull out all the words from it. Here is a partially working call:
preg_match_all("/<.+[^\/]>(.+[^<>])<\/.+>*/ix", $content, $var);
But it does not account for whitespace before the next tag (<.+>). It also fails when the HTML is nested like this:
<div>First Text <span>Last text</span></div>
Please help me put together the right pattern.
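
For context, nested markup like the example above is hard to match with a single regular expression, so one alternative worth considering is to walk the text nodes with a DOM parser and split those into words. The following is only a minimal sketch of that idea, assuming PHP's DOM extension is available and the input is UTF-8; extractWords() is a hypothetical helper name, not something from my existing code.

```php
<?php
// Sketch: collect words from HTML text nodes using DOMDocument instead of one regex.
// extractWords() is a hypothetical helper name used for illustration only.
function extractWords(string $content): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally so imperfect markup does not emit notices.
    $previous = libxml_use_internal_errors(true);
    // Assumption: input is UTF-8; the prefix hints the encoding to the HTML parser.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $words = [];

    // Visit every text node in document order, skipping script and style contents.
    $query = '//text()[not(ancestor::script) and not(ancestor::style)]';
    foreach ($xpath->query($query) as $node) {
        // Split on any run of characters that are not letters or digits (Unicode-aware).
        $parts = preg_split('/[^\p{L}\p{N}]+/u', $node->nodeValue, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $part) {
            $words[] = $part;
        }
    }

    return $words;
}

$words = extractWords('<div>First Text <span>Last text</span></div>');
print_r($words); // First, Text, Last, text
```

Because the text is taken node by node, whitespace before a following tag and nested elements such as the span inside the div need no special handling in the pattern itself.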