Suppose there is an XML file:

    <data>
        <items>
            <item name="item1"></item>
            <item name="item2"></item>
            <item name="item3"></item>
            <item name="item4"></item>
        </items>
    </data>
    AAA BBB CCC

How can I remove everything after </data> with regular expressions, i.e. the AAA BB ... part?

Closed as unclear by jfs, Alexey Shimansky, Bald, Denis Bubnov, user194374 on 21 Dec '16 at 10:58.


    1 answer

    This problem can be solved without regular expressions:

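        text = """ <data> <items> <item name="item1"></item> <item name="item2"></item> <item name="item3"></item> <item name="item4"></item> </items> </data> AAA BBB CCC """

        try:
            # Position of the last closing </data> tag
            i = text.rindex('</data>')
            # Keep everything up to and including that tag
            text = text[:i + len('</data>')]
        except ValueError:
            # No </data> in the text: leave it unchanged
            pass

        print(text)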
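A variant of the same slicing idea uses `str.rpartition`, which splits at the last occurrence of the separator and returns `('', '', text)` when the separator is absent, so no `try`/`except` is needed. This is a sketch; the helper name `cut_after_last` is hypothetical and not from the answer.

```python
def cut_after_last(text: str, marker: str) -> str:
    """Return text up to and including the last occurrence of marker.

    If marker does not occur at all, text is returned unchanged.
    """
    head, sep, _tail = text.rpartition(marker)
    if not sep:
        # Marker not found: rpartition returned ('', '', text).
        return text
    return head + sep


doc = '<data> <items> <item name="item1"></item> </items> </data> AAA BBB CCC '
print(cut_after_last(doc, '</data>'))
```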

    But if you really want to use a regular expression:

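        import re

        # Capture </data> and everything after it, then replace the whole
        # match with just the captured tag. re.DOTALL lets '.' match newlines too.
        text = re.sub(r'(</data>).+', r'\1', text, flags=re.DOTALL)
        print(text)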
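The two approaches differ in one detail: the `rindex` version keeps everything up to the *last* `</data>`, while the pattern `(</data>).+` cuts at the *first* `</data>` it finds. For the sample in the question, with a single `</data>`, both give the same result. The sketch below (helper names `regex_cut_first` and `regex_cut_last` are hypothetical) shows both behaviours with a regex, using `re.escape` so the tag can be passed in as a parameter.

```python
import re


def regex_cut_first(text: str, tag: str) -> str:
    """Remove everything after the FIRST occurrence of tag."""
    # Same idea as the answer's pattern; re.escape makes the tag safe to
    # interpolate, and count=1 stops after the first match.
    return re.sub('(' + re.escape(tag) + ').*', r'\1', text, count=1, flags=re.DOTALL)


def regex_cut_last(text: str, tag: str) -> str:
    """Remove everything after the LAST occurrence of tag."""
    # The greedy '.*' before the tag backtracks to the last occurrence,
    # which is the same position text.rindex(tag) finds.
    return re.sub('(.*' + re.escape(tag) + ').*', r'\1', text, count=1, flags=re.DOTALL)


sample = '<data></data> AAA\n<data></data> BBB'
print(regex_cut_first(sample, '</data>'))
print(regex_cut_last(sample, '</data>'))
```

If the tag is missing, neither pattern matches and the text is returned unchanged.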
    • Thanks, it works! One more question: the data comes in as a string that starts with b, like b'<?xml version="1.0". How can I remove the b with re? - Mike Yusko
    • b'? It looks like what you see in the console is not a string but the representation of a bytes object. Decoding it gives a string: b'<?xml version="'.decode(). You can pass the encoding to decode(); UTF-8 is used by default. But if you do want to use re: re.sub(r'^b(.+)', r'\1', '''b'<?xml version="1.0"'''). Still, using re just to remove the first character is overkill, because you can simply write '''b'<?xml version="1.0"'''[1:], i.e. take the whole string after the first character. - gil9red
    • @gil9red The XML encoding may be different (not UTF-8), so calling .decode() without an argument (which uses sys.getdefaultencoding() == 'utf-8') may produce mojibake, because the text is decoded with an incompatible encoding. ¶ If the author sees b'..' inside the string itself (not just in the printed output), then something is broken upstream, and that is what should be fixed; .decode() does not help in that case. In the worst case, if the input cannot be fixed upstream, ast.literal_eval() may be needed. - jfs
    • OK, here is what I have: the data already starts with b' when I print it to the console. I checked its type, and it is str. When I tried to parse it with etree.fromstring(xml), it failed saying the document does not start with the '<' character; that part is clear. But now the error is: ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration. - Mike Yusko
    • @Maic, as jfs wrote, the problem is probably somewhere further up in your code. If you can't solve it, ask a new question about it and include the code and the full error text, including the stack trace. - gil9red
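The thread ends without code, so here is a minimal sketch of the fallback jfs mentions, assuming `xml` is a `str` whose content is literally the printed form of a bytes object (`b'<?xml ...'`). `ast.literal_eval` turns that text back into a real `bytes` object, and bytes (unlike a `str` with an encoding declaration) can be handed to the parser, which then takes the encoding from the XML declaration. The thread's `etree` is probably lxml, whose error message matches the one quoted above; the sketch swaps in the standard library's `xml.etree.ElementTree` so it runs without extra packages, and the sample data is made up.

```python
import ast
import xml.etree.ElementTree as ET

# Made-up sample. Assumption: the XML exists as bytes somewhere upstream
# and is then converted with str(), which yields the text "b'<?xml ...'".
raw = b'<?xml version="1.0" encoding="UTF-8"?><data><items><item name="item1"></item></items></data>'
xml_text = str(raw)  # the broken value: a str that starts with b'

# Stripping the first character is not enough: a quote still precedes '<'.
print(xml_text[1:].startswith("'"))

# Fallback when the upstream code cannot be changed: evaluate the literal
# back into a bytes object (literal_eval only accepts Python literals).
data = ast.literal_eval(xml_text)

# Parse bytes, not str: the parser then reads the encoding from the
# XML declaration, which is what the lxml error message asks for.
root = ET.fromstring(data)
print([item.get('name') for item in root.iter('item')])

# Preferred fix: keep the original bytes and parse them directly.
assert ET.fromstring(raw).tag == root.tag
```

With lxml the call has the same shape: `etree.fromstring(data)` accepts the bytes object. As jfs points out, `literal_eval` only papers over the problem; the cleaner fix is to find where the bytes get turned into their text representation upstream and pass the bytes through unchanged.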