I have a string, parts of which are enclosed in tags. For example:

String str = "This text is not highlighted<hlTag>but this is</hlTag>" + "this isn't again<hlTag>and this is</hlTag>"; 

I need to split it into parts: the text inside the tags should be stored in objects of one kind, and the text outside the tags in objects of another kind. The result should be a list of these objects, in the same order as the parts appear in the original string.

Something like this: if the part is highlighted, list.add(new HighlPart(text)); otherwise, list.add(new NonHighlPart(text)).

I wrote two regular expressions, one for the parts inside the tags and one for the parts outside them, and both work:

    Matcher insideTagsMatcher = Pattern.compile(preTag + "(.+?)" + postTag).matcher(str);
    Matcher outsideTagsMatcher = Pattern.compile("^(.*?)" + preTag + "|" + postTag + "(.*?)" + preTag + "|" + "</hlTag>(.*?)$").matcher(str);

However, I do not know how to preserve the order of the parts from the original string when parsing with these expressions. Please help.

  • str.split(preTag + "|" + postTag). But that only breaks the string into an array, i.e. it is not clear which parts are highlighted and which are not. - I. Perevoz
  • That's clear. But I need to know which parts are highlighted. - Oleg Shankovskyi

1 answer

Use Map.Entry to pair each piece of text with a flag. Alternatively, you can simply create your own wrapper class.

    import java.util.AbstractMap;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    public class Main {
        // Tag names without angle brackets, because the string is split on '<' and '>'
        static final String preTag = "hlTag";
        static final String postTag = "/hlTag";

        public static void main(String[] args) {
            String str = "This text is not highlighted<hlTag>but this is</hlTag>"
                    + "this isn't again<hlTag>and this is</hlTag>";
            String[] strings = str.split("<|>");
            // Entry value: true if the text was inside the tags, false otherwise
            List<Map.Entry<String, Boolean>> result = new ArrayList<>();
            boolean inTag = false;
            for (String s : strings) {
                switch (s) {
                    case preTag:
                        inTag = true;
                        break;
                    case postTag:
                        inTag = false;
                        break;
                    default:
                        // Skip the empty strings that split() yields next to a tag
                        if (!s.isEmpty()) {
                            result.add(new AbstractMap.SimpleEntry<>(s, inTag));
                        }
                        break;
                }
            }
        }
    }

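Result (the entries of result for the sample string, in order):

    This text is not highlighted=false
    but this is=true
    this isn't again=false
    and this is=true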
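
The other option mentioned above, your own wrapper, could look roughly like the sketch below. It is only an illustration under some assumptions: the class names Part and HighlightParser and the method parse are made up for this example, the tags are taken to be the literal strings <hlTag> and </hlTag>, and the tags are assumed not to be nested. Instead of two separate regular expressions, it makes a single pass with one Matcher and remembers where the previous match ended, so the text between matches is emitted in its original position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper: a piece of text plus a flag saying whether it was inside the tags.
final class Part {
    final String text;
    final boolean highlighted;

    Part(String text, boolean highlighted) {
        this.text = text;
        this.highlighted = highlighted;
    }

    @Override
    public String toString() {
        return (highlighted ? "[H] " : "[N] ") + text;
    }
}

final class HighlightParser {
    // Assumed tag literals; Pattern.quote keeps them from being read as regex syntax.
    private static final Pattern TAGGED = Pattern.compile(
            Pattern.quote("<hlTag>") + "(.*?)" + Pattern.quote("</hlTag>"),
            Pattern.DOTALL);

    static List<Part> parse(String str) {
        List<Part> parts = new ArrayList<>();
        Matcher m = TAGGED.matcher(str);
        int last = 0; // index just past the previous tagged section
        while (m.find()) {
            // Text between the previous match and this one is not highlighted.
            if (m.start() > last) {
                parts.add(new Part(str.substring(last, m.start()), false));
            }
            // Group 1 is the text between the opening and closing tag.
            parts.add(new Part(m.group(1), true));
            last = m.end();
        }
        // Whatever follows the last closing tag is not highlighted either.
        if (last < str.length()) {
            parts.add(new Part(str.substring(last), false));
        }
        return parts;
    }
}
```

Calling HighlightParser.parse(str) on the string from the question should give four Part objects in source order, alternating non-highlighted and highlighted. If separate HighlPart and NonHighlPart classes are really needed, the two parts.add(...) calls are the places to construct them.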