When I try to parse an XML document, I get an error:

lxml.etree.XMLSyntaxError: Input is not proper UTF-8, indicate encoding! 

Code:

    #-*- coding cp1251 -*-
    import sys
    from lxml import etree

    reload(sys)
    sys.setdefaultencoding("cp1251")

    inputFile = "a.ED"
    tree = etree.parse(inputFile)
    nodes = tree.xpath('/')
    print nodes.decode('cp1251')

Environment: Windows 7, Python 2.7, lxml 2.3

In the document:

    <ED101 sysCode ="04">
      <dsig:SigValue xmlns:dsig="urn">AAAA</dsig:SigValue>
      <Name>Сергей Николаевич</Name>
    </ED101>

  • Specify the encoding of the XML document, or transcode its contents to UTF-8. - Sergey Gornostaev
  • Not directly related to the issue, but worth mentioning: 1) the #-*- coding cp1251 -*- line has no effect in your Python code, since the code contains no non-ASCII characters; 2) do not use reload(sys); sys.setdefaultencoding("cp1251"): it is simply a way to corrupt data silently (without explicit errors that would point to the problem) or to get mojibake in the output; 3) .decode('cp1251') looks wrong: lxml should return the unicode type for non-ASCII content, so just print the unicode value directly. - jfs
  • A colon or an equals sign must follow "coding" for the line to be recognized as an encoding declaration, for example: # -*- coding: utf-8 -*-. Without the colon, a SyntaxError appears as soon as the source contains non-ASCII characters (in string constants or in comments). - jfs
  • @jfs Please post your comments as an answer. - Nicolas Chabanovsky
  • @NicolasChabanovsky my comments are not related to the XML problem. To fix the XMLSyntaxError, follow Sergey Gornostaev's recommendation. - jfs
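
The "transcode its contents to UTF-8" suggestion from the comments could look roughly like the sketch below. It assumes the file really is cp1251-encoded and carries no XML declaration, as in the sample above. The helper name parse_cp1251 and the in-memory raw bytes are made up for illustration, and the fallback import is only there so the sketch also runs where lxml is not installed.

```python
# -*- coding: utf-8 -*-
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Fallback so the sketch runs without lxml; fromstring() is used the same way.
    import xml.etree.ElementTree as etree

# Stand-in for the bytes of the file: cp1251-encoded, no XML declaration.
raw = (u'<ED101 sysCode="04">'
       u'<dsig:SigValue xmlns:dsig="urn">AAAA</dsig:SigValue>'
       u'<Name>Сергей Николаевич</Name>'
       u'</ED101>').encode('cp1251')


def parse_cp1251(data):
    """Hypothetical helper: transcode cp1251 bytes to UTF-8, then parse them."""
    return etree.fromstring(data.decode('cp1251').encode('utf-8'))


root = parse_cp1251(raw)
# The element text comes back as unicode, so no .decode('cp1251') is needed.
name = root.findtext('Name')
```

For a real file, the bytes would come from something like open(path, 'rb').read(). lxml can also be told the encoding up front by passing etree.XMLParser(encoding='cp1251') as the parser argument of etree.parse, which avoids the manual transcoding step.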
