I have an HTML document:

<span xml:lang="en" lang="en"><b><span>Test Test </span></b></span><span>Test</span><span>Test</span> 

I need to print the <span></span> tags so that the result looks like this:

 <span xml:lang="en" lang="en"><b><span>Test Test </span></b></span> <span>Test</span> <span>Test</span> 

As I understand it, the best way to solve this is with regular expressions, but I have only just started learning them and could not write an expression for this task on my own. The full problem statement:

The tag is passed as the first parameter of the main method, for example "span". Print to the console all tags that match the given tag. Each tag goes on a new line, and the order must match the order in the file. The number of spaces and \n, \r characters does not affect the result. Every tag has a separate closing tag; there are no single (self-closing) tags. A tag may contain nested tags.

  • The best way to solve this problem is with an HTML parsing library: jsoup.org - Vartlok
  • If you are dealing with any substantial amount of HTML or XML, it is better to use a proper parser. Regular expressions are a crude workaround here. For HTML parsing there is the jsoup library, for example. - Regent
  • I agree with Vartlok, but it depends on the task. Do you need just the regular expression, or the whole program? - LEQADA
  • I have added the full problem statement. - Evgeniy
  • I still favor a parser. A regexp would have to account for line breaks and so on; it would be hard to read and practically impossible to modify, especially if you do not know regular expressions well. It is no accident that people say: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Vartlok

1 answer

The best way to solve this problem is with an HTML parsing library, for example jsoup.org.

A regexp would have to account for line breaks and similar details; it would be hard to read and practically impossible to modify, especially if you do not know regular expressions well.
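
To make the nesting and line-break problem concrete, here is a minimal Java sketch that uses only the standard library. It is not jsoup and not a full HTML parser: a regular expression only locates opening and closing tags, and a stack pairs them up, which is the part a single regexp cannot do. The class name `TagScanner`, the method `findTags`, and the hard-coded sample string are hypothetical names chosen for illustration. The sketch assumes the tag is written in the same case as the argument, that attribute values contain no `>` character, and that the input has no comments or scripts with tag-like text; a real parser handles those cases.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagScanner {

    /** Returns every <tag>...</tag> element in document order, nested ones included. */
    static List<String> findTags(String html, String tag) {
        // Matches "<tag>", "<tag attr=...>" or "</tag>"; group 1 is "/" for a closing tag.
        // The lookahead keeps "span" from matching "spanner"; [^>]* also spans line breaks.
        Pattern p = Pattern.compile("<(/?)" + Pattern.quote(tag) + "(?=[\\s>])[^>]*>");
        Matcher m = p.matcher(html);

        Deque<Integer> openStarts = new ArrayDeque<>();      // start offsets of unclosed tags
        TreeMap<Integer, String> found = new TreeMap<>();    // sorted by start offset

        while (m.find()) {
            if (m.group(1).isEmpty()) {
                openStarts.push(m.start());                  // opening tag: remember where it began
            } else if (!openStarts.isEmpty()) {
                int start = openStarts.pop();                // closing tag: pair with latest opener
                found.put(start, html.substring(start, m.end()));
            }
        }
        return new ArrayList<>(found.values());
    }

    public static void main(String[] args) {
        String tag = args.length > 0 ? args[0] : "span";
        // Sample input from the question, hard-coded here as an assumption.
        String html = "<span xml:lang=\"en\" lang=\"en\"><b><span>Test Test </span></b></span>"
                + "<span>Test</span><span>Test</span>";
        for (String element : findTags(html, tag)) {
            System.out.println(element);
        }
    }
}
```

Run with `span` as the first argument, this prints the outer `<span xml:lang="en" lang="en">` element first, then the nested `<span>Test Test </span>`, then the two `<span>Test</span>` elements, one per line. Reporting the nested element on its own line is one reading of the "may contain nested tags" condition. Even this small version needs a lookahead, a stack, and several stated assumptions, which is the maintenance cost the paragraph above warns about.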

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

There is a similar answer on the English Stack Overflow.

Another answer on the English Stack Overflow explains why parsing HTML with regular expressions is a bad idea.