How to find and delete a group in a regular expression?

Question

I have a file with HTML code. I need to find all tags with style="" and delete style="". I tried writing it myself and ended up with something like this:

    import re

    pattern = r'style="(?P<text>[(\w;:-)]*)'
    re_style = re.compile(pattern)

    with open('index.html', 'r') as open_file:
        for line in open_file:
            if re_style.search(line):
                print(re_style.search(line).groups())

But style="(?P<text>[(\w;:-)]*) only matches up to the first character that is not in the character class (a space, for example). I wanted to use a group to capture the text inside style="", but how do I then remove the matched group from the string?

What about style\s*=\s*"[^"]*", replaced with nothing? Doesn't that work?
style\s*=\s*"[^"]*" fits. But nothing needs to be replaced; I only need to delete everything inside style="".
Replacing with nothing, that is, with an empty string, is exactly deleting.
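To make the two readings in these comments concrete, here is a minimal sketch. The sample string and the names `html`, `emptied` and `removed` are made up for illustration and do not come from the asker's file. It contrasts emptying only the contents of the attribute (keeping the surrounding parts in groups) with dropping the whole attribute by substituting an empty string.

```python
import re

# Hypothetical sample line; not taken from the asker's file.
html = '<p style="color: red; margin:0">text</p>'

# Keep `style="` and the closing quote as groups 1 and 2,
# and drop whatever sits between them.
emptied = re.sub(r'(style\s*=\s*")[^"]*(")', r'\1\2', html)

# Replace the whole attribute with an empty string, which deletes it.
removed = re.sub(r'style\s*=\s*"[^"]*"', '', html)

print(emptied)  # <p style="">text</p>
print(removed)  # <p >text</p>
```

Both patterns assume double-quoted attribute values, as in the thread.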

Mikhail · Answer 1 · 2016-04-26T15:12:21

Maybe this will help someone.

    import re
    import sys


    def replaceLine(fileName, sourseText, replaceText):
        # Read the whole file, replace the text, and write it back.
        with open(fileName, 'r') as file:
            text = file.read()
        with open(fileName, 'w') as file:
            file.write(text.replace(sourseText, replaceText))


    pattern = r'style\s*=\s*"([^"]*)"'
    re_style = re.compile(pattern)

    # Read the lines up front so the file is not rewritten while it is being iterated.
    with open(sys.argv[1], 'r') as open_file:
        lines = open_file.readlines()

    for line in lines:
        match = re_style.search(line)
        if match:
            # Remove the captured contents of the style attribute.
            replaceLine(sys.argv[1], match.group(1), '')

You wrote that you need to find all tags with style="" and delete style="", yet your code removes the style along with its contents.
The code can be made simpler; I will show an example in an answer.
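One way the simplification mentioned in this comment might look is sketched below: read the file once, run a single substitution over the whole text, and write it back once. The function names `strip_styles` and `strip_styles_in_file` are hypothetical, and UTF-8 encoding is an assumption about the input file.

```python
import re

# Matches a whole style="..." attribute with a double-quoted value.
STYLE_RE = re.compile(r'style\s*=\s*"[^"]*"')


def strip_styles(text):
    """Return text with every style="..." attribute removed."""
    return STYLE_RE.sub('', text)


def strip_styles_in_file(path):
    """Rewrite the file at `path` with style attributes removed."""
    # Assumption: the file is UTF-8 encoded.
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()
    with open(path, 'w', encoding='utf-8') as f:
        f.write(strip_styles(text))
```

From a script it could be called as `strip_styles_in_file(sys.argv[1])`. Working on the whole text at once avoids reopening the file for every matching line.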

gil9red · Answer 2 · 2016-06-03T06:59:21

Using a regular expression, we find and delete the style:

    content = """
    <p style="">
    <p style=""></p>
    </p>
    <p style="">
    <b style=""></b>
    </p>
    <p style=" "></p>
    <p style="12"></p>
    """

    import re
    content = re.sub(r'style\s*=\s*"[^"]*"', '', content)
    print(content)

Result:

    <p >
    <p ></p>
    </p>
    <p >
    <b ></b>
    </p>
    <p ></p>
    <p ></p>

If we follow the wording of the question strictly and remove only the empty style="", then:

    content = re.sub(r'style\s*=\s*""', '', content)
    print(content)

Result:

    <p >
    <p ></p>
    </p>
    <p >
    <b ></b>
    </p>
    <p style=" "></p>
    <p style="12"></p>
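The results above leave a stray space inside each tag (`<p >`). A small variation, offered only as a sketch, is to let the pattern also consume the whitespace in front of the attribute. The sample string and the name `cleaned` are made up for illustration.

```python
import re

# Hypothetical sample; not the content string from the answer.
content = '<p style=""><b   style = "12"></b></p>'

# The leading \s* also consumes the whitespace that separated
# the attribute from the tag name or the previous attribute.
cleaned = re.sub(r'\s*style\s*=\s*"[^"]*"', '', content)

print(cleaned)  # <p><b></b></p>
```

Like the patterns in the answer, this only handles double-quoted values and treats the HTML as plain text, so it can also match attribute names that merely end in `style`.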
