I am trying to parse an HTML document and learn regular expressions at the same time. I have read articles, but every time I think I have understood everything, it turns out I have not.

Specifically, there is this line: <meta property="og:image" content="http://drugoigorod.ru/wp-content/uploads/2016/06/krov_1.jpg" /><link rel="icon"

From it I need to extract 2016/06/krov_1.jpg.

Here is the regular expression I wrote, based on the guides I found and the syntax as I understood it:

 private static Pattern headerImagePattern = Pattern.compile("<meta property=\"og:image\" content=\"http://drugoigorod.ru/wp-content/uploads/(.*)\\S/><link rel=\"icon\""); 

What is my mistake, and why does it not match the way I expect? What subtlety am I missing?

P.S. I can't debug this pattern directly: it is checked in tests, but for some reason logging does not work there.

P.P.S. Online checkers like regex101 will not help: their syntax is different, they target JavaScript and PHP, and what works there does not work in Java and vice versa.

Closed as off-topic by YuriySPb, aleksandr barakin, user194374, Denis, zRrr on Jul 15 '16 at 9:47.

This question appears to be off-topic for the site. Those who voted to close it gave the following reason:

  • "Questions asking for debugging help ("why does this code not work?") must include the desired behavior, a specific problem or error, and the minimal code needed to reproduce it in the question itself. Questions without a clear problem statement are not useful to other visitors. See How to create a minimal, self-sufficient, and reproducible example." - aleksandr barakin, Community Spirit, Denis
If the question can be reworded to meet the rules described in the help center, edit it.

  • Personally, I think it would be easier for your task to extract the information with ordinary String methods - Vladyslav Matviienko
  • Haven't you heard? Never parse XML with regular expressions. Take a look at htmlparser.sourceforge.net, for example - rjhdby
  • Well, in my opinion this is not a very suitable task for learning regular expressions (I just don't like them) - Vladyslav Matviienko
  • @YuriySPb great post, with a cool ending =) - Android Android
  • Specifically, the problem here is \S: it means "any character except whitespace (\s)", while the input has a space at exactly that position. - zRrr
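
The point about \S can be illustrated with a minimal sketch, assuming the input is exactly the line quoted in the question. The class name `OgImageFix` and the method `extractImagePath` are hypothetical. Compared with the original pattern, only the tail changes (a lazy group up to the closing quote, then a literal space before `/>`), and the dot in the host name is escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OgImageFix {
    // Pattern from the question, unchanged: \S demands a non-whitespace
    // character right before "/>", but the input has a space there.
    static final Pattern ORIGINAL = Pattern.compile(
            "<meta property=\"og:image\" content=\"http://drugoigorod.ru/wp-content/uploads/(.*)\\S/><link rel=\"icon\"");

    // Hypothetical fix: capture lazily up to the closing quote,
    // then require the literal space that precedes "/>".
    static final Pattern FIXED = Pattern.compile(
            "<meta property=\"og:image\" content=\"http://drugoigorod\\.ru/wp-content/uploads/(.*?)\" /><link rel=\"icon\"");

    // Returns the captured path, or null when the pattern is not found.
    static String extractImagePath(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        // find() searches anywhere in the input; matches() would
        // require the whole string to match the pattern.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "<meta property=\"og:image\" content=\"http://drugoigorod.ru/wp-content/uploads/2016/06/krov_1.jpg\" /><link rel=\"icon\"";
        System.out.println(extractImagePath(ORIGINAL, line)); // prints null
        System.out.println(extractImagePath(FIXED, line));    // prints 2016/06/krov_1.jpg
    }
}
```

The lazy `(.*?)` stops at the first closing quote, so the group stays correct even if more attributes follow on the same line.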

1 answer

 \d{4}/\d{2} 

\d - a digit, equivalent to [0-9]

{4} - repeated exactly 4 times

then a literal / character

\d - a digit, equivalent to [0-9]

{2} - repeated exactly 2 times

  • "From it I need to extract 2016/06/krov_1.jpg" - an exact match :D - Visman
  • \d{4}/\d{2}/\S+.jpg - Senior Pomidor
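
A rough sketch of how the two patterns above behave in Java, assuming the line from the question as input. The class name `UploadPathFinder` and the helper `firstMatch` are hypothetical. The second pattern escapes the dot before `jpg`, which is an assumption on top of the comment: unescaped, `.` would match any character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UploadPathFinder {
    // Pattern from the answer: four digits, a slash, two digits.
    static final Pattern YEAR_MONTH = Pattern.compile("\\d{4}/\\d{2}");

    // Pattern from the comment, with the dot escaped (assumption).
    static final Pattern FULL_PATH = Pattern.compile("\\d{4}/\\d{2}/\\S+\\.jpg");

    // Returns the first substring matched by the pattern, or null if none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "<meta property=\"og:image\" content=\"http://drugoigorod.ru/wp-content/uploads/2016/06/krov_1.jpg\" /><link rel=\"icon\"";
        System.out.println(firstMatch(YEAR_MONTH, line)); // prints 2016/06
        System.out.println(firstMatch(FULL_PATH, line));  // prints 2016/06/krov_1.jpg
    }
}
```

Here `\S+` first runs up to the space after the closing quote and then backtracks until `.jpg` can match, so the quote is not included in the result.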