Delete text after </data> using regular expressions [closed]

Question

Suppose there is an XML file:

    <data>
      <items>
        <item name="item1"></item>
        <item name="item2"></item>
        <item name="item3"></item>
        <item name="item4"></item>
      </items>
    </data> AAA BBB CCC

Please tell me how to remove everything after </data> using regular expressions, i.e. AAA BBB ...

gil9red · Accepted Answer · 2016-12-15T06:17:56

This problem can be solved without regular expressions:

    text = """
    <data>
      <items>
        <item name="item1"></item>
        <item name="item2"></item>
        <item name="item3"></item>
        <item name="item4"></item>
      </items>
    </data> AAA BBB CCC
    """

    try:
        # Find the last closing tag and cut everything after it
        i = text.rindex('</data>')
        text = text[: i + len('</data>')]
    except ValueError:
        # No closing tag found: leave the text unchanged
        pass

But if you really want a regular expression, then:

    import re

    # re.DOTALL lets "." match newlines, so the tail is removed across lines
    text = re.sub(r'(</data>).+', r'\1', text, flags=re.DOTALL)
    print(text)
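As a rough sketch, the two snippets above can be wrapped in small functions and compared side by side. The function names `strip_after_index` and `strip_after_regex` and the shortened `sample` string are made up for illustration. One difference is worth noting: `rindex` cuts after the last `</data>`, while the regular expression matches from the first `</data>` it finds.

```python
import re

CLOSING_TAG = "</data>"


def strip_after_index(text):
    """Cut everything after the last closing tag, without regular expressions."""
    try:
        i = text.rindex(CLOSING_TAG)
    except ValueError:
        return text  # tag not found: leave the text unchanged
    return text[: i + len(CLOSING_TAG)]


def strip_after_regex(text):
    """Cut everything after the first closing tag, using re.sub."""
    # re.DOTALL makes "." match newlines too, so a multi-line tail is removed
    return re.sub(r"(</data>).+", r"\1", text, flags=re.DOTALL)


# Shortened, hypothetical version of the XML from the question
sample = '<data> <items> <item name="item1"></item> </items> </data> AAA BBB CCC'

print(strip_after_index(sample))
print(strip_after_regex(sample))
```

For input with a single `</data>` both functions give the same result; they only differ when the closing tag appears more than once.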


Thanks, it works! If you can, tell me: the data arrives as a string, so how can I remove the b character from b'<?xml version="1.0" using re? - Mike Yusko
b'? It looks like what you are printing in the console is not a string but the bytes representation of a string. If you decode it, you get a string: b'<?xml version="'.decode(). An encoding can be passed to decode; utf-8 is used by default. But if you want to do it with re: re.sub(r'^b(.+)', r'\1', '''b'<?xml version="1.0"'''). In general, using re to remove the first character is overkill, because you can simply write '''b'<?xml version="1.0"'''[1:], i.e. take the whole string after the first character. - gil9red
@gil9red the XML encoding may be different (not utf-8), so .decode() without an argument (which uses sys.getdefaultencoding() == 'utf-8') may produce mojibake because of an incompatible encoding. If the author sees b'..' inside the string itself (and not just in the output), then something is broken upstream, and that is what should be fixed; .decode() does not help in that case. In the worst case, if the input cannot be fixed upstream, ast.literal_eval() may be needed. - jfs
OK, this b' comes at the very beginning, and it is already there in the console. I check the type and it shows str. I try to parse it with etree.fromstring(xml), and it fails, saying the file cannot be parsed because it does not start with the '<' symbol; that part is fine now. The error now is ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration. - Mike Yusko
@Mike, as jfs wrote, the problem may be somewhere earlier in the code. If you cannot sort it out, create a new question about that problem. Include the code and the error text in the question, including the stack trace. - gil9red
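The situation discussed in these comments can be sketched as follows, assuming the input really is a `str` that contains the repr of a bytes object. The sample value is made up, and the sketch uses the standard library's `xml.etree.ElementTree` rather than whichever `etree` the commenter is using (the wording of the quoted `ValueError` suggests lxml, but that is an assumption). `ast.literal_eval` turns the text back into bytes, and passing bytes to the parser lets it honor the encoding named in the XML declaration.

```python
import ast
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the real document
xml_bytes = b'<?xml version="1.0" encoding="utf-8"?><data><item name="item1"></item></data>'

# Simulate the broken upstream: a str whose content starts with b'
raw = repr(xml_bytes)
assert isinstance(raw, str) and raw.startswith("b'")

# Recover the original bytes from their textual representation
recovered = ast.literal_eval(raw)

# Parse the bytes directly; the parser reads the declared encoding itself
root = ET.fromstring(recovered)
names = [item.get("name") for item in root.iter("item")]
print(names)
```

This is a workaround only; as jfs points out, the better fix is to find where the bytes were converted to their repr and correct it there.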


Closed because the essence of the question is unclear, by participants jfs, Alexey Shimansky, Bald, Denis Bubnov, user194374 on 21 Dec '16 at 10:58.
