There is a file with HTML code. I need to find all tags that have style="" and delete the style="". I tried writing it myself and ended up with something like this:

```python
import re

pattern = r'style="(?P<text>[(\w;:-)]*)'
re_style = re.compile(pattern)

with open('index.html', 'r') as open_file:
    for line in open_file:
        if re_style.search(line):
            print(re_style.search(line).groups())
```

But `style="(?P<text>[(\w;:-)]*)` stops matching at the first character that is not a letter (or one of the listed symbols). I wanted to use a group to capture the text inside style="", but how do I then remove the captured group from the string?
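To answer the last question directly: a match object records where each group was found, so the group can be sliced out of the string with `m.span('text')`. A minimal sketch, assuming the goal is to empty every style value; the helper name `remove_group` is hypothetical, and `[^"]*` is swapped in for the original character class so that values containing spaces, `#` or `.` are captured in full.

```python
import re

# Named group "text" captures everything between the quotes of style="...".
STYLE_RE = re.compile(r'style="(?P<text>[^"]*)"')


def remove_group(line: str) -> str:
    """Delete the text captured by the "text" group in every match, keeping style=""."""
    # Process matches from right to left so the offsets of earlier
    # matches are still valid after each slice.
    for m in reversed(list(STYLE_RE.finditer(line))):
        start, end = m.span('text')
        line = line[:start] + line[end:]
    return line


print(remove_group('<p style="color: red; margin: 0">hi</p><b style="x">'))
# -> <p style="">hi</p><b style="">
```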

  • What about matching `style\s*=\s*"[^"]*"` and replacing it with nothing? Doesn't that work? - splash58
  • `style\s*=\s*"[^"]*"` fits. But there is no need to replace anything; I just need to delete everything inside style="". - Mikhail
  • Replacing with nothing, that is, with an empty string, is deleting. Isn't it? - splash58
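As the comments point out, `re.sub` with an empty replacement is deletion. If only the value inside the quotes should go, the parts to keep can be captured and put back with backreferences. A minimal sketch; the sample string is made up for illustration.

```python
import re

# Group 1 keeps the attribute name and opening quote, group 2 the closing quote.
# The value in between ([^"]*) is not captured, so it is dropped.
STYLE_VALUE_RE = re.compile(r'(style\s*=\s*")[^"]*(")')

html = '<p style="color: red">a</p> <b style="">b</b>'

cleared = STYLE_VALUE_RE.sub(r'\1\2', html)
print(cleared)
# -> <p style="">a</p> <b style="">b</b>

# Replacing the whole match with '' removes the attribute entirely.
removed = re.sub(r'style\s*=\s*"[^"]*"', '', html)
print(removed)
# -> <p >a</p> <b >b</b>
```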

2 answers

Maybe this will help someone.

```python
import re
import sys

# Captures the value between the quotes of style="...".
STYLE_RE = re.compile(r'style\s*=\s*"([^"]*)"')


def clear_style_values(file_name):
    """Empty every style attribute in the file in place, leaving style=""."""
    with open(file_name, 'r') as f:
        text = f.read()
    # Substitute per match instead of calling str.replace() on the captured
    # value, which would also erase identical text elsewhere in the file.
    text = STYLE_RE.sub('style=""', text)
    with open(file_name, 'w') as f:
        f.write(text)


if __name__ == '__main__':
    clear_style_values(sys.argv[1])
```
  • So you wanted to remove every style, not just the empty ones? You wrote that you need to "find all tags with style="" and delete style=""", but your code also removes styles that have content. The code can be made simpler; I will show an example in an answer. - gil9red

Find and delete the style attributes with a regular expression:

```python
import re

content = """
<p style="">
<p style=""></p>
</p>
<p style="">
<b style=""></b>
</p>
<p style=" "></p>
<p style="12"></p>
"""

content = re.sub(r'style\s*=\s*"[^"]*"', '', content)
print(content)
```

Result:

```
<p >
<p ></p>
</p>
<p >
<b ></b>
</p>
<p ></p>
<p ></p>
```
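The result keeps the space that used to separate the tag name from the attribute (`<p >`). A small variant, offered as a sketch rather than a required change: putting `\s+` in front of `style` consumes that space too, and as a side effect the pattern no longer matches the tail of attribute names such as `data-style`. The sample input is made up for illustration.

```python
import re

# \s+ before "style" also consumes the whitespace that separated the attribute
# from the tag name, so <p style=""> becomes <p> rather than <p >.
STYLE_ATTR_RE = re.compile(r'\s+style\s*=\s*"[^"]*"')

content = '<p style=""><b style="12"></b></p>\n<p class="x" style=" "></p>'
cleaned = STYLE_ATTR_RE.sub('', content)
print(cleaned)
```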

If you stick to the question as stated (delete only empty style=""), then:

```python
content = re.sub(r'style\s*=\s*""', '', content)
print(content)
```

Result:

```
<p >
<p ></p>
</p>
<p >
<b ></b>
</p>
<p style=" "></p>
<p style="12"></p>
```
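HTML attribute names are case-insensitive and values may be single-quoted, so `STYLE=""` or `style=''` would slip past the pattern above. A hedged variant, assuming the input is simple hand-written markup (a regex is not a full HTML parser), that removes only empty style attributes, in either quote style and any letter case, together with the space before them:

```python
import re

# Matches an empty style attribute only: style="" or style='' (any case),
# plus the whitespace in front of it. Non-empty values are left alone.
EMPTY_STYLE_RE = re.compile(r'''\s+style\s*=\s*(?:""|'')''', re.IGNORECASE)

content = """<p style=""><b STYLE=''></b></p>
<p style=" "></p>
<p style="12"></p>"""

result = EMPTY_STYLE_RE.sub('', content)
print(result)
```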