I fetched a page from a site. It contains the following line:

<div class="about"> <h3>Алиса</h3> 

I need to extract "Алиса" from it. How can I do this with a regular expression?

I tried the following: <div class="about"><h3>(.*)</h3>

Where:

(...) Grouping (group)

. Any character (dot)

* Zero or more times

x Ignore whitespace

\s Whitespace character (including tab)

But uvsoftium.ru does not find a match ... Why?

Next, I want to do this with preg_match_all:

 preg_match_all('#<div class="about"><h3>(.*)</h3>#x', $content[$i], $matches[$i][1], PREG_PATTERN_ORDER); 

The result is an empty array.

  • This is not your first question about extracting data from a page. Use a tool suited to the job, for example phpQuery, rather than regular expressions - teran
  • Please describe this in more detail. How do you picture it working? - doox911
  • @doox911 I've added to the question - Kyper
  • You missed the space between the tags in <div class="about"> <h3> - Bert
  • @Spartacus I tried <div class="about">(\s*)<h3>(.*)</h3>. I still get an empty array, and the service doesn't find it either - Kyper

1 answer

In your case, the problem is the / character. It is a special character (the pattern delimiter), so it must be escaped: \/ .

Also, in the source line there is a space between <div class="about"> and <h3> that your regular expression does not account for. The whitespace can vary, so there is a special construct for it: \s .

For your group, I would advise using a lazy (non-greedy) quantifier: <h3>(.*?)</h3> . Otherwise, with the greedy (.*) and source text such as <div class="about"> <h3>Алиса</h3> <h3>Лена</h3> </div> , the match will be: Алиса</h3> <h3>Лена .
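To make the difference concrete, here is a minimal sketch that runs both quantifiers against the two-heading sample from the paragraph above. The variable names are only for illustration, and # is used as the delimiter so that / needs no escaping.

```php
<?php
// Sample markup with two <h3> elements, as in the example above.
$html = '<div class="about"> <h3>Алиса</h3> <h3>Лена</h3> </div>';

// Greedy: .* extends to the LAST </h3> in the string.
preg_match('#<h3>(.*)</h3>#', $html, $greedy);

// Lazy: .*? stops at the FIRST </h3> it can reach.
preg_match('#<h3>(.*?)</h3>#', $html, $lazy);

echo $greedy[1], PHP_EOL; // Алиса</h3> <h3>Лена
echo $lazy[1], PHP_EOL;   // Алиса
```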

The final expression: /<div class="about">\s*<h3>(.*?)<\/h3>/ .
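As a rough check, this expression can be passed to preg_match_all like so. This is only a sketch: $content here is a hypothetical variable holding the single line from the question, not the asker's real page data.

```php
<?php
// Hypothetical input: the line quoted in the question.
$content = '<div class="about"> <h3>Алиса</h3>';

// / is the delimiter here, so the / inside </h3> is escaped as \/.
$pattern = '/<div class="about">\s*<h3>(.*?)<\/h3>/';

// With PREG_PATTERN_ORDER, $matches[0] holds the full matches
// and $matches[1] holds the contents of the first group.
$count = preg_match_all($pattern, $content, $matches, PREG_PATTERN_ORDER);

echo $count, PHP_EOL;         // 1
echo $matches[1][0], PHP_EOL; // Алиса
```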

Most likely I have not covered every problem, but I hope I answered your question.

  • / does not need to be escaped, since # is used as the delimiter - Grundy
  • But the site uvsoftium.ru uses / . The question arose because this expression does not work on this site. - Nikita A. Slutsky pm
  • Meanwhile, the example in the question uses # - Grundy
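The point about delimiters raised in these comments can be checked with a short sketch. It assumes the same single-line input as the question; the variable names are hypothetical. The last check is an extra observation not made in the thread: under the x modifier, PCRE ignores unescaped whitespace in the pattern, so the literal space in `<div class` would no longer be matched. That may be worth ruling out for the original preg_match_all call.

```php
<?php
// Hypothetical input: the line quoted in the question.
$content = '<div class="about"> <h3>Алиса</h3>';

// With # as the delimiter, / is an ordinary character and needs no escaping.
$withHash = '#<div class="about">\s*<h3>(.*?)</h3>#';

// With / as the delimiter, each literal / in the pattern must be escaped.
$withSlash = '/<div class="about">\s*<h3>(.*?)<\/h3>/';

preg_match($withHash, $content, $a);
preg_match($withSlash, $content, $b);
var_dump($a[1] === $b[1]); // bool(true)

// Same pattern with the x modifier: the space between "div" and "class"
// in the pattern is ignored, so the pattern no longer matches this input.
$extended = '#<div class="about">\s*<h3>(.*?)</h3>#x';
$extendedResult = preg_match($extended, $content);
var_dump($extendedResult); // int(0)
```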