Parsing the src attribute in an HTML file

Question

    <div class="ramka">
      <div class="label_color_yellow">реклама</div>
      <script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
      <!-- MYtricolorTV-top-ssylki-1 -->
      <ins class="adsbygoogle" style="display:block;height:120px;" data-ad-client="ca-pub-6002780752776386" data-ad-slot="6604387368" data-ad-format="link" data-full-width-responsive="true"></ins>
      <script> (adsbygoogle = window.adsbygoogle || []).push({}); </script>
    </div>

I wrote this script:

    import requests
    import urllib.request
    from bs4 import BeautifulSoup
    r = requests.get('http://mytricolortv.com/')
    s1 = BeautifulSoup(r.text, "html.parser")
    m = s1.find_all('script')
    print(m)

I can't get at the src attribute. I need to extract the path //pagead2.googlesyndication.com/pagead/js/adsbygoogle.js

Help me please.

Axenow · Answer 1 · 2018-09-26T23:50:07

This will work. Filtering on src selects only the script tags that actually have that attribute, so you do not hit an exception when a tag in the soup has no src.

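    from bs4 import BeautifulSoup
    # Read the saved page; the with-block closes the file afterwards.
    with open("t.html", "r") as f:
        s1 = BeautifulSoup(f.read(), "html.parser")
    # {"src": True} matches only <script> tags that have a src attribute.
    sources = s1.find_all('script', {"src": True})
    for source in sources:
        print(source['src'])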
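To try the same filter without a saved t.html, here is a minimal sketch that parses an inline string. The sample markup is trimmed from the snippet in the question, and the helper name script_sources is made up for illustration; it is not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question's snippet (assumption: trimmed for brevity).
SAMPLE_HTML = """
<div class="ramka">
  <script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
  <script>(adsbygoogle = window.adsbygoogle || []).push({});</script>
</div>
"""


def script_sources(html):
    """Return the src value of every <script> tag that has one (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True keeps only tags where the attribute is present,
    # so the inline script without src is skipped.
    return [tag["src"] for tag in soup.find_all("script", src=True)]


if __name__ == "__main__":
    for src in script_sources(SAMPLE_HTML):
        print(src)
```

The second script tag in the sample has no src, so only the first one is printed.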

@Ivan, could you mark the answer as accepted if it helped you?

Mikhail Rebrov · Answer 2 · 2019-05-03T16:46:50

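Another option is to access src like this:

    s1 = BeautifulSoup(r.text, "html.parser")
    # .get returns None instead of raising KeyError when src is missing.
    print(s1.select("script")[0].attrs.get("src"))

However, this can return None if the first script tag has no src attribute, and the [0] index raises IndexError if the page contains no script tags at all.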
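One way to handle the missing case explicitly is sketched below. It assumes the CSS attribute selector script[src] together with select_one, which returns None when nothing matches. The helper name first_script_src and the optional base_url parameter are illustrative assumptions; urljoin from the standard library turns a protocol-relative path such as //pagead2.googlesyndication.com/... into a full URL using the scheme of the page it came from.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def first_script_src(html, base_url=None):
    """Return the src of the first <script> that has one, or None (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # "script[src]" matches only script tags carrying a src attribute.
    tag = soup.select_one("script[src]")
    if tag is None:
        return None
    src = tag["src"]
    # With a base URL, resolve relative and protocol-relative paths against it.
    return urljoin(base_url, src) if base_url else src


if __name__ == "__main__":
    html = '<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>'
    print(first_script_src(html))
    print(first_script_src(html, base_url="http://mytricolortv.com/"))
```

Checking for None before indexing keeps the caller from having to catch IndexError or KeyError.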
