How do I write a regular expression for the string <div class="class_1 class-2 class3"></div> that returns only the class names (class_1, class-2 and class3), and only when they are written inside the class attribute rather than in just any pair of quotes?

file text.txt:

    <div class="qwerty hel_lo tuy-iy">content</div>
    <div class="qwerty hel_lo tuy-iy">content</div>
    <div class="qwerty hel_lo tuy-iy">content</div>

  • Parsing HTML with regular expressions is masochism (IMHO). I'd suggest looking at ready-made parsers, for example BeautifulSoup. It already solves your problem. - Bogdan
  • @Bogdan yes, but I need to do it exactly this way - valeria
  • Then you need a function that extracts class="class_1 class-2 class3" from the given string and then strips class= and the quotes. Would that work for you? - Bogdan
  • @Bogdan yes! And can the classes be returned separately rather than as one whole string? - valeria
  • Try re.findall(r'class+[_-]*\d', '<div class="class_1 class-2 class3"></div>') - S. Nick
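
For comparison, here is a rough sketch of the parser-based route mentioned in the comments. Bogdan names BeautifulSoup; this sketch swaps in the standard library's html.parser instead so that nothing has to be installed. The names ClassCollector and collect_classes are made up for illustration and do not come from the thread.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects class names from every start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.classes = []  # one list of class names per tag

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes already removed
        for name, value in attrs:
            if name == "class" and value:
                self.classes.append(value.split())


def collect_classes(html):
    parser = ClassCollector()
    parser.feed(html)
    parser.close()
    return parser.classes


print(collect_classes('<div class="class_1 class-2 class3"></div>'))
# [['class_1', 'class-2', 'class3']]
```

Because the parser only reports real attributes of real tags, quoted text that merely sits next to a tag is not picked up.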

2 answers

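    with open('text.txt', 'r') as f:
        for line in f:
            # Only look at lines that contain a div with a class attribute
            if '<div class="' in line:
                # The text between the first pair of double quotes is the
                # attribute value; split() breaks it into separate class names
                x = line.split('"')[1].split()
                if x:
                    print(x)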
  • Sorry, I had no time to check whether it works until now. I don't know why, but it prints nothing. Does it work for you? - valeria
  • Please add your text.txt file to the question. - S. Nick
  • @S. Nick done - valeria
  • @S. Nick update: I hadn't noticed the \d, sorry. But it still only prints: ['class']['class']['class'] - valeria
  • Try the updated answer. - S. Nick

Here is a solution using regular expressions:

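    import re

    a = """
    <div class="qwerty hel_lo tuy-iy">content</div>
    <div class="qwerty hel_lo tuy-iy">content</div>
    <div class="qwerty hel_lo tuy-iy">content</div>
    """
    a = a.replace("\n", "")
    # Lazily capture everything between the double quotes that follow class=
    b = re.findall(r"class\s*?=\s*?\"(.*?)\"", a)
    # Each element of b is one whole attribute value, e.g. 'qwerty hel_lo tuy-iy'
    print(b)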
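
To get the classes separately, and only when class= sits inside a tag, something along these lines might work. It is a minimal sketch, not a general HTML parser: it assumes double-quoted attribute values and no > character inside any attribute before class. The names TAG_CLASS_RE and class_names are invented for this example.

```python
import re

# Hypothetical pattern: a "<", then anything except angle brackets,
# then whitespace + class="...". Requiring whitespace before "class"
# keeps attributes such as data-class from matching.
TAG_CLASS_RE = re.compile(r'<[^<>]*?\sclass\s*=\s*"([^"]*)"')


def class_names(html):
    """Return one list of class names per tag that has a class attribute."""
    return [value.split() for value in TAG_CLASS_RE.findall(html)]


sample = '<div class="class_1 class-2 class3"></div> "not a class"'
print(class_names(sample))
# [['class_1', 'class-2', 'class3']]
```

The [^<>]*? part is what ties the match to a tag: the pattern has to start at a < and cannot cross a >, so class="..." appearing in plain text, or any other quoted string, is skipped.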