It's quite simple. One note first: for HTML pages it is better not to use an XML parser. Real-world HTML often contains structural errors (unclosed tags, unescaped `&` characters), so a strict XML parser may refuse to parse it.
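To illustrate the point, here is a minimal sketch of how the strict XML parser in `lxml.etree` reacts to markup that browsers accept without complaint. The `snippet` string and the `parses_as_xml` helper are made up for this illustration.

```python
from lxml import etree

# Hypothetical snippet: the unescaped "&" in the URL is fine for browsers,
# but it is not well-formed XML.
snippet = '<html><body><a href="page?a=1&b=2">link</a></body></html>'


def parses_as_xml(markup):
    """Return True if lxml's strict XML parser accepts the markup."""
    try:
        etree.fromstring(markup)
        return True
    except etree.XMLSyntaxError:
        return False


# The raw snippet is rejected; escaping the ampersand makes it well-formed.
print(parses_as_xml(snippet))
print(parses_as_xml(snippet.replace("&", "&amp;")))
```

Escaping every page by hand is not practical for pages you do not control, which is why a lenient HTML parser is the better fit.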
For lxml the fix is simple: parse with `lxml.html` instead of `lxml.etree`. Note that `from lxml.html import etree` does not help, because that name is the same `lxml.etree` module, and its `fromstring` is still the strict XML parser, which fails on the unescaped `&` in the URL below. Use `lxml.html.fromstring` instead:

    from lxml import html

    text = """
    <html><body>
    <iframe id="player" frameborder="0" allowfullscreen="1" title="YouTube video player" width="640" height="360" src="https://www.youtube.com/embed/vvKYhcUSrY4?autoplay=0&rel=0&showinfo=0&controls=1&modestbranding=1&enablejsapi=1&origin=http%3A%2F%2Fvkino.ua"></iframe>
    </body>
    </html>
    """

    # Parse with the lenient HTML parser rather than the strict XML one.
    root = html.fromstring(text)

    # Find, anywhere in the document, the 'src' attribute that belongs to
    # an 'iframe' tag whose 'id' attribute equals 'player':
    match = root.xpath('//iframe[@id="player"]/@src')
    if match:
        print(match[0])

    # Find, anywhere in the document, the 'src' attribute that belongs to
    # any tag whose 'id' attribute equals 'player':
    match = root.xpath('//*[@id="player"]/@src')
    if match:
        print(match[0])

    # Find, anywhere in the document, the 'iframe' tag whose 'id' attribute
    # equals 'player', then read 'src' from the element itself:
    match = root.xpath('//iframe[@id="player"]')
    if match:
        print(match[0].attrib['src'])
Output to console:
    https://www.youtube.com/embed/vvKYhcUSrY4?autoplay=0&rel=0&showinfo=0&controls=1&modestbranding=1&enablejsapi=1&origin=http%3A%2F%2Fvkino.ua
    https://www.youtube.com/embed/vvKYhcUSrY4?autoplay=0&rel=0&showinfo=0&controls=1&modestbranding=1&enablejsapi=1&origin=http%3A%2F%2Fvkino.ua
    https://www.youtube.com/embed/vvKYhcUSrY4?autoplay=0&rel=0&showinfo=0&controls=1&modestbranding=1&enablejsapi=1&origin=http%3A%2F%2Fvkino.ua
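If you need this lookup in several places, the repeated "run XPath, check for a match, take the first result" pattern can be wrapped in a small function. This is only a sketch: the name `first_xpath_value` and the `example.com` URL are invented for the example, and it assumes `lxml.html.fromstring` as the parser and an XPath expression that selects attributes or text.

```python
from lxml import html


def first_xpath_value(markup, expression):
    """Return the first XPath result as a string, or None if nothing matches.

    Intended for expressions that select attributes or text nodes.
    """
    root = html.fromstring(markup)
    matches = root.xpath(expression)
    return str(matches[0]) if matches else None


# Hypothetical page used only to exercise the helper.
page = (
    '<html><body>'
    '<iframe id="player" src="https://example.com/embed/abc"></iframe>'
    '</body></html>'
)

print(first_xpath_value(page, '//iframe[@id="player"]/@src'))
print(first_xpath_value(page, '//iframe[@id="missing"]/@src'))
```

Returning `None` for a missing element keeps the caller from having to index into an empty list.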