When working with this XML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
        <state>200</state>
        <error></error>
        <result>
            <user name="parent" title="NONE">
                <roles>
                    <item>parent</item>
                </roles>
            </user>
        </result>
    </response>

I have trouble parsing it. Here is how I pull information from it:

(I use the DocumentBuilderFactory -> DocumentBuilder -> Document classes to work with the XML.)

    ...
    DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dbuilder = dfactory.newDocumentBuilder();
    byte[] bytes = xmltext.getBytes(); // xmltext is the XML text
    InputStream is = new ByteArrayInputStream(bytes);
    Document xmldoc = dbuilder.parse(is);
    xmldoc.getDocumentElement().normalize();
    ...
    NodeList d = xmldoc.getDocumentElement().getElementsByTagName("roles");
    String ut = d.item(0).getFirstChild().getNodeValue();
    ...

As a result, the variable "ut" contains what looks like an empty string, but it should contain the text "parent".

Why does this happen? Thanks.

Formatting characters between tags (tabs, spaces, line breaks) are also children of the node: the parser stores them as text nodes. In this case, the first child of d.item(0), the roles element, is a text node containing the newline character and a few indentation spaces. The next child is the item element itself. To avoid this situation, you can check, while iterating over the children, whether each node implements the Element interface; if it does, it is an element rather than formatting text.
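The check described above could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the class RoleReader and the helpers firstElementChild and firstRole are hypothetical names, and the charset passed to getBytes is an assumption that matches the encoding declared in the document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class RoleReader {

    // Returns the first child of parent that implements Element,
    // skipping whitespace text nodes; null if there is no such child.
    static Element firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }

    // Parses xmltext and returns the text of the first element inside
    // the first <roles> element, or null if nothing is found.
    static String firstRole(String xmltext) {
        try {
            DocumentBuilder dbuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Assumption: the text is UTF-8, as declared in the XML prolog.
            byte[] bytes = xmltext.getBytes(StandardCharsets.UTF_8);
            Document xmldoc = dbuilder.parse(new ByteArrayInputStream(bytes));
            NodeList d = xmldoc.getDocumentElement().getElementsByTagName("roles");
            if (d.getLength() == 0) {
                return null;
            }
            Element item = firstElementChild(d.item(0));
            // getNodeValue() is null for element nodes, so read the text content instead.
            return item == null ? null : item.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xmltext = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<response>\n"
                + "  <state>200</state>\n"
                + "  <error></error>\n"
                + "  <result>\n"
                + "    <user name=\"parent\" title=\"NONE\">\n"
                + "      <roles>\n"
                + "        <item>parent</item>\n"
                + "      </roles>\n"
                + "    </user>\n"
                + "  </result>\n"
                + "</response>\n";
        System.out.println(firstRole(xmltext)); // prints: parent
    }
}
```

Note that skipping the whitespace is not quite enough on its own: once the item element is found, calling getNodeValue() on it returns null, because element nodes have no node value. The text lives in the element's own child text node, which is why the sketch uses getTextContent().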

  • Thank you very much! I did not know about this nasty quirk of XML. I just wonder who came up with the idea of treating whitespace as a child of a node!? After all, it is useless. - AseN
  • The authors of the XML specification, probably. By the way, if memory serves, you can configure the parser so that it skips whitespace. I do not remember exactly how; it may depend on the specific parser implementation. In any case, Google is at your fingertips, go for it! - fori1ton
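On the last comment: DocumentBuilderFactory does have a setIgnoringElementContentWhitespace(true) setting, but according to its Javadoc it requires the parser to be in validating mode, so it is unlikely to help with a document like this one that has no DTD. A different technique, removing whitespace-only text nodes by hand after parsing, might look like the sketch below. The class WhitespaceStripper and its methods are hypothetical names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class WhitespaceStripper {

    // Recursively removes every text node that contains only whitespace.
    static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember the sibling before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    // Parses xmltext, strips formatting whitespace, and returns the text
    // of the first child of the first <roles> element, or null if absent.
    static String firstRole(String xmltext) {
        try {
            DocumentBuilder dbuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Assumption: the text is UTF-8, as declared in the XML prolog.
            Document xmldoc = dbuilder.parse(
                    new ByteArrayInputStream(xmltext.getBytes(StandardCharsets.UTF_8)));
            stripWhitespace(xmldoc.getDocumentElement());
            NodeList d = xmldoc.getDocumentElement().getElementsByTagName("roles");
            if (d.getLength() == 0 || d.item(0).getFirstChild() == null) {
                return null;
            }
            // After stripping, the first child of <roles> is the <item> element.
            return d.item(0).getFirstChild().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

One caveat with this approach: it also drops whitespace-only text inside mixed content, where the whitespace may be meaningful, so it is best suited to data-style documents like the one in the question.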