I need to parse an XML document in Python 3.x with a function that takes the XML file name and the names of tags that must be left untouched (that is, the entire contents of the tags passed as parameters should stay unchanged). The function returns the parsed result as a string.

I wrote a function, but for some reason it pulls the text out of all tags, ignoring the tags passed as parameters.

Could someone please help me write a correct version of the function, or point out where the mistake is? I've been looking for a very long time and just can't find it.

    import xml.etree.ElementTree as ET
    import json
    import re

    def specific_tags_not_parse(self, name_file, *args):
        f = open(name_file, 'r', encoding="UTF-8")
        string = f.read()
        tags_list = []
        for i in args:
            tags_list.append(i)
        tree = ET.parse(name_file)
        root = tree.getroot()
        my_list = []
        se_t = []  # collection of all self-closing tags (opened and closed in one tag)
        element = re.findall('<.*?[/]>', string)
        for x in element:
            se_t.append(x)
        for it in root.iter():
            for it_tag in tags_list:
                if it.tag != it_tag:
                    my_list.append(it.text)
                else:
                    for a in se_t:
                        if a.find(it.tag) != -1:
                            my_list.append('<{} {}/>'.format(it.tag, dict_to_str(it.attrib)))
                        else:
                            my_list.append('<{} {}>'.format(it.tag, dict_to_str(it.attrib)))
                            my_list.append(it.text)
                            my_list.append('</{} {}>'.format(it.tag, dict_to_str(it.attrib)))
        my_list_2 = []
        for e in my_list:
            if e != None:
                my_list_2.append(e)
        return ' '.join(my_list_2)
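
A few details of the loop above may account for part of the symptom. root.iter() yields every descendant in document order, so even when a listed tag matches, its children are visited afterwards and their text is appended too. Because the comparison runs once per entry in tags_list, an element's text is appended once for every listed tag it does not match (and a listed element still hits the "not equal" branch for the other listed tags). Also, the else branch only appends anything from inside the for a in se_t loop, so if the file has no self-closing tags (the sample below has none), a matched tag produces no output at all. The minimal demonstration below reproduces just the first two effects on a made-up document; the tag name keep is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<div><keep><mn>6</mn></keep><mi>a</mi></div>")

# iter() yields the root and every descendant, including the children of <keep>
visited = [el.tag for el in root.iter()]
print(visited)  # ['div', 'keep', 'mn', 'mi']

# Same structure as the question's loop, with two tag names listed
tags_list = ["keep", "other"]
texts = []
for it in root.iter():
    for it_tag in tags_list:
        if it.tag != it_tag:
            texts.append(it.text)

# Each non-matching element's text is collected once per listed tag
filtered = [t for t in texts if t is not None]
print(filtered)  # ['6', '6', 'a', 'a']
```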

Here is an example XML file:

    <?xml version="1.0"?>
    <div>
      <math xmlns="http://www.w3.org/1998,897/Math/MathML" display="inline">
        <semantics>
          <mrow>
            <mrow>
              <mfrac>
                <mn>6</mn>
                <mrow>
                  <mn>3.17</mn>
                </mrow>
              </mfrac>
              <mo>&#xB7;</mo>
              <mfrac>
                <mrow>
                  <mn>11.2</mn>
                </mrow>
                <mrow>
                  <mn>25</mn>
                </mrow>
              </mfrac>
              <mo>&#xB7;</mo>
              <mfrac>
                <mrow>
                  <mn>17</mn>
                </mrow>
                <mn>6</mn>
              </mfrac>
              <mo>.</mo>
            </mrow>
          </mrow>
          <annotation-xml encoding="MathML-Content">
            <mrow>
              <mfrac>
                <mn>6</mn>
                <mrow>
                  <mn>17</mn>
                </mrow>
              </mfrac>
              <mo>·</mo>
              <mfrac>
                <mrow>
                  <mn>11</mn>
                </mrow>
                <mrow>
                  <mn>25,564,98.9898</mn>
                </mrow>
              </mfrac>
              <mo>·</mo>
              <mfrac>
                <mrow>
                  <mn>17</mn>
                </mrow>
                <mn>6.98798</mn>
              </mfrac>
              <mo>.</mo>
            </mrow>
          </annotation-xml>
        </semantics>
      </math>
      <csymbol cd="nums1">pi</csymbol>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mrow>
          <msqrt>
            <mrow>
              <mi>a</mi>
            </mrow>
          </msqrt>
          <mo>&#xB7;</mo>
          <mroot>
            <mrow>
              <mi>b</mi>
            </mrow>
            <mrow>
              <mn>3</mn>
            </mrow>
          </mroot>
        </mrow>
      </math>
    </div>
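
This sample also suggests why semantics is never matched: both math elements declare a default namespace with xmlns, and ElementTree reports every tag inside such an element in {namespace-uri}localname form, so it.tag != 'semantics' is always true. The short illustration below uses a trimmed-down stand-in for the sample (with the standard MathML namespace URI), not the full file.

```python
import xml.etree.ElementTree as ET

doc = '<div><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics/></math></div>'
root = ET.fromstring(doc)

# Tags inside the default namespace carry a '{uri}' prefix
tags = [el.tag for el in root.iter()]
print(tags)
# ['div', '{http://www.w3.org/1998/Math/MathML}math',
#  '{http://www.w3.org/1998/Math/MathML}semantics']

# Comparing against the local name (the part after '}') makes the match work
local = [t.split("}")[-1] for t in tags]
print(local)  # ['div', 'math', 'semantics']
```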

For example, in main I call it like this:

    a = XLML()
    print(a.specific_tags_not_parse('note2.xml', 'semantics'))

The expected result is:

    <semantics>
      <mrow>
        <mrow>
          <mfrac>
            <mn>6</mn>
            <mrow>
              <mn>3.17</mn>
            </mrow>
          </mfrac>
          <mo>&#xB7;</mo>
          <mfrac>
            <mrow>
              <mn>11.2</mn>
            </mrow>
            <mrow>
              <mn>25</mn>
            </mrow>
          </mfrac>
          <mo>&#xB7;</mo>
          <mfrac>
            <mrow>
              <mn>17</mn>
            </mrow>
            <mn>6</mn>
          </mfrac>
          <mo>.</mo>
        </mrow>
      </mrow>
      <annotation-xml encoding="MathML-Content">
        <mrow>
          <mfrac>
            <mn>6</mn>
            <mrow>
              <mn>17</mn>
            </mrow>
          </mfrac>
          <mo>·</mo>
          <mfrac>
            <mrow>
              <mn>11</mn>
            </mrow>
            <mrow>
              <mn>25,564,98.9898</mn>
            </mrow>
          </mfrac>
          <mo>·</mo>
          <mfrac>
            <mrow>
              <mn>17</mn>
            </mrow>
            <mn>6.98798</mn>
          </mfrac>
          <mo>.</mo>
        </mrow>
      </annotation-xml>
    </semantics>
    pi a &#xB7; b 3

(The indentation here is arbitrary; what matters is that the text is extracted from all other tags.) In other words, every tag except semantics is reduced to its text, while the semantics tag is left completely untouched together with its contents. Importantly, several tag names may be passed in, and the contents of each of them must be left unchanged.
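
One possible approach, sketched under a few assumptions: it is written as a standalone function rather than a method of the XLML class, the helper local_name is hypothetical, and namespaces are simply stripped from all tags after parsing. Instead of root.iter(), it walks the tree recursively, so the walk stops at a kept element and serializes that whole subtree with ET.tostring, while every other element contributes only its text and tail.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    # Hypothetical helper: ElementTree reports namespaced tags as '{uri}name'
    return tag.split("}")[-1]


def specific_tags_not_parse(name_file, *keep_tags):
    """Return the text of all elements, except that elements whose local
    name is in keep_tags are serialized whole, markup included."""
    keep = set(keep_tags)
    root = ET.parse(name_file).getroot()

    # Strip namespaces so 'semantics' matches and kept subtrees serialize
    # without generated 'ns0:' prefixes.
    for el in root.iter():
        el.tag = local_name(el.tag)

    parts = []

    def walk(el):
        if el.tag in keep:
            # Serialize the subtree but not its tail: the tail is text that
            # follows the element and is collected by the parent below.
            tail, el.tail = el.tail, None
            parts.append(ET.tostring(el, encoding="unicode"))
            el.tail = tail
            return  # do not descend: the children are already included
        if el.text:
            parts.append(el.text)
        for child in el:
            walk(child)
            if child.tail:
                parts.append(child.tail)

    walk(root)
    # Trim each piece and join the non-empty pieces with single spaces
    return " ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    sample = (
        b'<div><math xmlns="http://www.w3.org/1998/Math/MathML">'
        b'<semantics><mn>6</mn></semantics></math>'
        b'<csymbol cd="nums1">pi</csymbol>'
        b'<math xmlns="http://www.w3.org/1998/Math/MathML">'
        b'<mi>a</mi><mo>+</mo><mn>3</mn></math></div>'
    )
    print(specific_tags_not_parse(io.BytesIO(sample), "semantics"))
    # <semantics><mn>6</mn></semantics> pi a + 3
```

With the file from the question it would be called as print(specific_tags_not_parse('note2.xml', 'semantics')), and any number of tag names can follow the file name. Because the kept subtree is re-serialized by ElementTree rather than copied byte for byte from the file, character references such as &#xB7; come out as the literal · character, both inside kept tags and in the extracted text.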

  • Please give an example XML file and an example of the output you want to get... - MaxU
  • I added a sample file to the question. - user10119078
  • The "expected output" contains both XML and plain text; is that intended? It looks odd: usually you need one or the other, either extract the desired tag or extract the text from the specified tag(s). - gil9red
  • The output should be a string, i.e. just return ans_str (something like that). - user10119078
