I ran into a problem: I need to parse HTML and pull out all the words from it. Here is a pattern that half works:

preg_match_all("/<.+[^\/]>(.+[^<>])<\/.+>*/ix", $content, $var); 

But it does not account for whitespace before the following tag (the next <.+>). It also fails when the HTML looks like this:

 <div>First Text <span>Last text</span></div> 

Please help me put together the right pattern.

    3 answers

     $words = preg_split('/\W+/', strip_tags($str), -1, PREG_SPLIT_NO_EMPTY);

    And the funny thing is that this solution can be found on Google in 2 minutes.

    • Thanks for the answer. But the problem is that your version also parses what is inside the scripts. - Alex3327
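A minimal sketch of the strip_tags plus preg_split idea, extended for the concern raised in the comment: strip_tags() removes the tags themselves but keeps the text inside <script> and <style>, so those blocks are cut out first. The function name extract_words, the pre-cleaning steps, and the Unicode-aware split pattern are assumptions for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: returns the list of words found in an HTML fragment.
function extract_words(string $html): array
{
    // Remove <script> and <style> blocks together with their contents,
    // because strip_tags() would otherwise leave their inner text behind.
    $clean = preg_replace('~<(script|style)\b[^>]*>.*?</\1>~is', ' ', $html);
    if ($clean === null) {
        $clean = $html; // fall back to the raw input if the regex engine failed
    }

    // Put a space in front of every tag so that text from adjacent
    // elements (e.g. "<p>a</p><p>b</p>") is not fused into one word.
    $clean = str_replace('<', ' <', $clean);

    // Drop the tags, then turn entities such as &amp; into real characters.
    $text = html_entity_decode(strip_tags($clean), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Split on runs of anything that is not a letter or a digit.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    return $words === false ? [] : $words;
}

$html = '<div>First Text <span>Last text</span></div><script>var x = 1;</script>';
print_r(extract_words($html)); // First, Text, Last, text
```

The split pattern treats any non-letter, non-digit character as a separator; if words with apostrophes or hyphens should stay intact, the character class would need to be adjusted.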

    For your example:

     <div>First Text <span>Last text</span></div> 

    Solution:

     PATH = 'div'
     div = g.cssselect(PATH)[0].text_content()

    Output

    First Text Last text
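The snippet in this answer is Python and relies on an object g that the answer does not define. For a PHP project, roughly the same "parse the document, then read the text content" approach can be sketched with the bundled DOM extension. The helper name element_text is hypothetical, and the XML encoding prefix is a commonly used workaround to make loadHTML() read the input as UTF-8, included here as an assumption.

```php
<?php
// Hypothetical helper: text content of the first element with the given tag name,
// or null when no such element exists. Assumes ext-dom is enabled.
function element_text(string $html, string $tag): ?string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them;
    // real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementsByTagName($tag)->item(0);

    // textContent concatenates the text of the element and all its descendants.
    return $node === null ? null : $node->textContent;
}

echo element_text('<div>First Text <span>Last text</span></div>', 'div');
// prints: First Text Last text
```

Unlike a regular expression, the parser handles nested tags by itself, so the span inside the div needs no special treatment.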

       preg_match_all('/<\S+[^\/]>(.*?)<\/\S+>/ims', $html, $matches); 

      Then apply strip_tags to each element of the resulting array.
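Put together, the two steps from this answer might look like the sketch below. The variable names $html, $matches and $texts are illustrative. With the sample markup from the question, the pattern captures the inner HTML of the outer element, and strip_tags() then removes the nested <span>.

```php
<?php
$html = '<div>First Text <span>Last text</span></div>';

// Step 1: capture whatever sits between an opening tag and a closing tag.
preg_match_all('/<\S+[^\/]>(.*?)<\/\S+>/ims', $html, $matches);

// Step 2: $matches[1] holds the captured fragments, which may still
// contain nested tags, so strip_tags() is applied to each of them.
$texts = array_map('strip_tags', $matches[1]);

print_r($texts); // one element: "First Text Last text"
```

Each cleaned string can then be split into words, for example with preg_split on non-word characters.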