Python: write a regular expression for the string <div class="class_1 class-2 class3"></div>

Question

How do I write a regular expression for the string <div class="class_1 class-2 class3"></div> that returns only the class names (class_1, class-2 and class3), and only when they are written inside the class attribute rather than inside arbitrary quotes?

file text.txt:

    <div class="qwerty hel_lo tuy-iy">content</div>
    <div class="qwerty hel_lo tuy-iy">content</div>
    <div class="qwerty hel_lo tuy-iy">content</div>

I'd suggest looking at ready-made parsers, for example BeautifulSoup.
Then you need a function that extracts class="class_1 class-2 class3" from the given string and then strips class= and the quotes.
And can the classes be returned separately rather than as one string?
Try re.findall(r'class+[_-]*\d', '<div class="class_1 class-2 class3"></div>')
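
The parser suggestion in the comments can be tried without installing anything. The following is a minimal sketch that uses the standard library's html.parser instead of BeautifulSoup; the class name ClassCollector and the sample string are hypothetical, chosen only for illustration. Because the parser reports attributes per tag, text that merely sits inside quotes is never mistaken for a class attribute.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects the names listed in the class attribute of every start tag."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.extend(value.split())


collector = ClassCollector()
# Hypothetical input: one real attribute plus quoted text outside any tag.
collector.feed('<div class="class_1 class-2 class3"></div> "class_9 in plain quotes"')
print(collector.classes)  # ['class_1', 'class-2', 'class3']
```

The quoted text after the closing tag is treated as plain data and ignored, which is the behaviour the question asks for.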

Accepted Answer · 2019-01-19T10:13:41

    with open('text.txt', 'r') as f:
        for line in f:
            if '<div class="' in line:
                x = line.split('"')[1].split()
                if x:
                    print(x)

S. Nick


Sorry, I didn't have time to check whether it works. I don't know why, but it outputs nothing for me. What about you? - valeria
Please add your text.txt file to the question. - S. Nick
@S. Nick done - valeria
@S. Nick update: I hadn't noticed the \d, sorry. But it still only prints ['class']['class']['class'] - valeria
Try the updated answer. - S. Nick
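
One limitation of the accepted answer is that line.split('"')[1] returns only the first quoted value on a line, so several divs on the same line produce a single list. The following is a minimal sketch of a variant that still avoids regular expressions; the function name classes_per_div and the sample line are hypothetical, and it assumes the attribute is written exactly as <div class="...">.

```python
def classes_per_div(line):
    """Return one list of class names for each <div class="..."> on the line."""
    # Every chunk after the first starts right after the text '<div class="'.
    chunks = line.split('<div class="')[1:]
    # The attribute value ends at the next double quote.
    return [chunk.split('"', 1)[0].split() for chunk in chunks]


sample = '<div class="qwerty hel_lo">content</div> <div class="tuy-iy">content</div>'
print(classes_per_div(sample))  # [['qwerty', 'hel_lo'], ['tuy-iy']]
```

Splitting on the full '<div class="' prefix, rather than on the quote alone, keeps quoted text elsewhere on the line out of the result.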


Xxx xxx · Answer 2 · 2019-01-19T18:28:51

Here is a solution using regular expressions:

    import re

    a = """
    <div class="qwerty hel_lo tuy-iy">content</div>
    <div class="qwerty hel_lo tuy-iy">content</div>
    <div class="qwerty hel_lo tuy-iy">content</div>
    """
    a = a.replace("\n", "")
    b = re.findall(r"class\s*?=\s*?\"(.*?)\"", a)
    print(b)
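
The pattern in this answer returns each attribute value as one string and would also match a name such as data-class. The following is a minimal sketch of one way to address both points raised in the question: it matches class="..." only inside an opening tag and then splits each value into separate names. The names CLASS_ATTR and class_names and the sample string are hypothetical, and the sketch assumes double-quoted attribute values.

```python
import re

# Hypothetical sample: a real class attribute, quoted text, and a data-class attribute.
html = ('<div class="qwerty hel_lo tuy-iy">content</div> '
        '<p>"not a class"</p> <div data-class="x y">z</div>')

# '<[^<>]*?' keeps the match inside one opening tag; the lookbehind
# rejects attribute names that merely end in 'class', such as data-class.
CLASS_ATTR = re.compile(r'<[^<>]*?(?<![\w-])class\s*=\s*"([^"]*)"')


def class_names(text):
    """Return every class name found in class attributes, one item per name."""
    names = []
    for value in CLASS_ATTR.findall(text):
        names.extend(value.split())
    return names


print(class_names(html))  # ['qwerty', 'hel_lo', 'tuy-iy']
```

Regular expressions remain fragile on real-world HTML (single-quoted values, comments, scripts), so this is suited to simple, regular input like the file in the question.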
