Please help: I can't extract the words I need from a string using regular expressions. I tried this:

Pattern d = Pattern.compile("(?<=(<th class=\"plainlist\")(.>)) ([\\s\\S]*)(?=<\\/th>)"); 

The line is this:

 <th class="plainlist" style="min-width:9em; background:#eaecf0; vertical-align:top; padding-left:.5em; padding-right:.5em;">Род деятельности</th><th class="plainlist" style="min-width:9em; background:#eaecf0; vertical-align:top; padding-left:.5em; padding-right:.5em;">Язык произведений</th> <td class="plainlist"> <span class="no-wikidata" data-wikidata-property-id="P1412">русский</span></td> </tr> 

I need to extract just two words: Род деятельности

My regular expression excludes the tags themselves, but not the attributes inside their angle brackets. Please help me figure out what I am doing wrong. The expression should match only the text inside the th tags.

  • (?<=>)[^<]+(?=<) - JavaJunior
  • Besides the th tags, the text contains other tags. I probably described the task incorrectly: I need to find only the th tags and extract the text from them. I will update the description now. - Kira
  • Then?
  • Regular expressions are not suitable for parsing HTML. Use a dedicated library for this. - YurySPb
  • @YuriySPb, for dotnet I found an answer with recommendations that can be used to close such questions as duplicates, but I don't see a similar answer for java. One needs to be written, or we should put out a call for it, since there are more than enough java experts. - aleksandr barakin

1 answer

Try the following regular expression:

 '<th[^>]+\\>([^<]+)\\<\\/th>' 

See an example using this expression:

 Pattern p = Pattern.compile("<th[^>]+\\>([^<]+)\\<\\/th>");
 Matcher m = p.matcher(html); // html is your HTML string
 // Print the text of the first th cell, or "no match" if there is none
 System.out.println(m.find() ? m.group(1) : "no match");
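The snippet above prints only the first match, while the line in the question contains two th cells. If every cell is needed, one option is to call find() in a loop. Below is a minimal sketch of that idea, not a definitive solution: the class name ThExtractor and the method extractThTexts are hypothetical, the sample HTML is a shortened English stand-in for the line in the question, and the pattern is a slight variation on the one above (\b after th so that a tag such as thead is not matched, and [^>]* so that a th without attributes is accepted). It still assumes the cell contains plain text with no nested tags; for anything more complex, an HTML parsing library is the safer choice, as noted in the comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThExtractor {
    // <th\b[^>]*>  opening th tag with any attributes (\b keeps <thead> from matching)
    // ([^<]+)      group 1: the cell text, up to the next tag
    // </th>        closing tag
    private static final Pattern TH_TEXT =
            Pattern.compile("<th\\b[^>]*>([^<]+)</th>");

    // Collects the text of every th cell that contains plain text only.
    public static List<String> extractThTexts(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = TH_TEXT.matcher(html);
        while (m.find()) {              // each call advances to the next match
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the HTML line in the question (assumption).
        String html = "<th class=\"plainlist\" style=\"min-width:9em;\">Occupation</th>"
                + "<th class=\"plainlist\" style=\"min-width:9em;\">Language of works</th>"
                + " <td class=\"plainlist\"><span class=\"no-wikidata\">Russian</span></td> </tr>";
        for (String text : extractThTexts(html)) {
            System.out.println(text);
        }
    }
}
```

To get only the first cell, as in the question, take the first element of the returned list after checking that it is not empty.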