When I try to parse an XML document, I get an error:
lxml.etree.XMLSyntaxError: Input is not proper UTF-8, indicate encoding!

Code:

    #-*- coding cp1251 -*-
    import sys
    from lxml import etree

    reload(sys)
    sys.setdefaultencoding("cp1251")

    inputFile = a.ED
    tree = etree.parse(inputFile)
    nodes = tree.xpath('/')
    print nodes.decode('cp1251')

Environment: Windows 7, Python 2.7, lxml 2.3
In the document:
    <ED101 sysCode ="04">
      <dsig:SigValue xmlns:dsig="urn">AAAA</dsig:SigValue>
      <Name>Сергей Николаевич</Name>
    </ED101>
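
A likely cause, assuming a.ED is saved as cp1251 (windows-1251) and carries no XML declaration naming that encoding: the parser then falls back to UTF-8, and the Cyrillic bytes in Name are not valid UTF-8. The sketch below reproduces this with the standard library only, using xml.etree.ElementTree as a stand-in for lxml, and works around it by decoding the bytes explicitly before parsing. It is written for Python 3, the sample bytes are built in memory rather than read from the real file, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample document from the question; the escapes spell the Cyrillic name.
XML_TEXT = (
    u'<ED101 sysCode ="04">'
    u'<dsig:SigValue xmlns:dsig="urn">AAAA</dsig:SigValue>'
    u'<Name>\u0421\u0435\u0440\u0433\u0435\u0439 '
    u'\u041d\u0438\u043a\u043e\u043b\u0430\u0435\u0432\u0438\u0447</Name>'
    u'</ED101>'
)

# Assumption: this is how the file content looks on disk (cp1251 bytes).
raw = XML_TEXT.encode("cp1251")


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def parse_cp1251(data):
    """Decode cp1251 bytes to text first, then parse the text."""
    return ET.fromstring(data.decode("cp1251"))


root = parse_cp1251(raw)
name = root.find("Name").text  # already text, no further .decode() needed
```

With lxml itself, two other routes are commonly used: adding an XML declaration that names the encoding to the file, or passing a parser created with etree.XMLParser(encoding=...) to etree.parse. Which one fits depends on whether the file can be changed.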
1- The #-*- coding cp1251 -*- line has no effect in your Python code, since the source contains no non-ASCII characters. 2- Do not use reload(sys); sys.setdefaultencoding("cp1251"). It is simply a way to corrupt data silently (without explicit errors that would indicate a problem) or to get mojibake in the output. 3- .decode('cp1251') looks wrong. lxml should return the unicode type for non-ASCII content. Just print the unicode value directly. - jfs

Add a colon after coding so that the line is recognized as an encoding declaration. Example: # -*- coding: utf-8 -*- (without the colon, a SyntaxError appears as soon as the source code contains non-ASCII characters, for example in string constants or comments). - jfs
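
To illustrate the point about the colon, here is a minimal sketch that checks a comment line against the pattern PEP 263 gives for encoding declarations. The function name declared_encoding is hypothetical, and the check only mirrors the documented pattern; it does not call the interpreter's own detection.

```python
import re

# Pattern from PEP 263: "coding" must be followed directly by ":" or "=".
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line):
    """Return the encoding named in a declaration comment, or None."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


# The line from the question: no colon, so it is just an ordinary comment.
without_colon = declared_encoding("#-*- coding cp1251 -*-")

# The corrected form is recognized as a declaration.
with_colon = declared_encoding("# -*- coding: cp1251 -*-")
```

Here without_colon is None and with_colon is "cp1251", which matches the comment above: the original line is ignored, so it changes nothing until non-ASCII characters show up in the source.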