Java regular expression for html tags

Question

Please help: I can't extract the words I need from a string using a regular expression. I tried this:

Pattern d = Pattern.compile("(?<=(<th class=\"plainlist\")(.>)) ([\\s\\S]*)(?=<\\/th>)");

The input string is:

 <th class="plainlist" style="min-width:9em; background:#eaecf0; vertical-align:top; padding-left:.5em; padding-right:.5em;">Род деятельности</th><th class="plainlist" style="min-width:9em; background:#eaecf0; vertical-align:top; padding-left:.5em; padding-right:.5em;">Язык произведений</th> <td class="plainlist"> <span class="no-wikidata" data-wikidata-property-id="P1412">русский</span></td> </tr>

I need to get just two words: Род деятельности

My regex excludes the tags themselves, but not the attributes inside their angle brackets. Please help me figure out what I am doing wrong. The regex needs to find only the text inside the th tags.

Besides the th tags, the text contains other tags.
I may have described the task poorly, but I need to find only the th tags and extract the text from them.
@YuriySPb, I found an answer for .NET with recommendations that can be used to close questions like this as duplicates.
Or just put out a call, since there are more than enough Java experts here.

Let's say Pie · Accepted Answer · 2018-12-24T15:06:22

Try the following regular expression:

 '<th[^>]+\\>([^<]+)\\<\\/th>'

See an example using this expression:

 Pattern p = Pattern.compile("<th[^>]+\\>([^<]+)\\<\\/th>");
 Matcher m = p.matcher(html); // html is your HTML string
 System.out.println(m.find() ? m.group(1) : "no match");
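
The snippet above prints only the first match, while the string in the question contains two th cells. A minimal sketch of collecting all of them by calling find() in a loop could look like the following. The class name ThTextExtractor, the method extractThText, and the shortened English sample string are assumptions made for illustration, not part of the original answer. The pattern is also slightly relaxed: [^>]* instead of [^>]+, so that a bare <th> without attributes matches too, and the unnecessary backslashes are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
class ThTextExtractor {
    // <th\b[^>]*>  opening th tag with any (or no) attributes
    // ([^<]+)      group 1: the text up to the next '<'
    // </th>        closing tag
    private static final Pattern TH_TEXT =
            Pattern.compile("<th\\b[^>]*>([^<]+)</th>");

    static List<String> extractThText(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = TH_TEXT.matcher(html);
        // Each call to find() moves on to the next match in the input.
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the string from the question.
        String html = "<th class=\"plainlist\" style=\"min-width:9em;\">Occupation</th>"
                + "<th class=\"plainlist\">Language of works</th> "
                + "<td class=\"plainlist\"><span class=\"no-wikidata\">Russian</span></td>";
        System.out.println(extractThText(html)); // prints [Occupation, Language of works]
    }
}
```

Note that [^<]+ stops at the first '<', so a th cell that contains nested tags (for example a span or a link) will not match this pattern at all; it only suits cells that hold plain text, as in the question.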
