The page contains this markup:

```html
<div class="ramka"><div class="label_color_yellow">реклама</div>
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- MYtricolorTV-top-ssylki-1 -->
<ins class="adsbygoogle" style="display:block;height:120px;" data-ad-client="ca-pub-6002780752776386" data-ad-slot="6604387368" data-ad-format="link" data-full-width-responsive="true"></ins>
<script> (adsbygoogle = window.adsbygoogle || []).push({}); </script></div>
```

I wrote the script:

```python
import requests
import urllib.request
from bs4 import BeautifulSoup

r = requests.get('http://mytricolortv.com/')
s1 = BeautifulSoup(r.text, "html.parser")
m = s1.find_all('script')
print(m)
```

I can't get at the src attribute. I need to extract the path //pagead2.googlesyndication.com/pagead/js/adsbygoogle.js

Help me please.


2 answers

This will work, and it won't raise an exception when a `<script>` tag has no src, because only tags that actually have a src attribute are selected.

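```python
from bs4 import BeautifulSoup

# t.html holds the page's HTML (it could equally be requests.get(...).text)
with open("t.html", "r", encoding="utf-8") as f:
    s1 = BeautifulSoup(f.read(), "html.parser")

# Keep only <script> tags that have a src attribute,
# so source['src'] never raises a KeyError
sources = s1.find_all('script', {"src": True})
for source in sources:
    print(source['src'])
```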
  • Thank you, Axenow! - Ivan Petrov
  • @Ivan, could you mark the answer as accepted if it helped you? Thanks in advance. - Axenow
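A minimal sketch of the same approach applied to the question's markup, assuming you want absolute URLs rather than the protocol-relative `//pagead2...` path. The markup is inlined (trimmed) so it runs without network access, and the helper name `script_sources` and its `base_url` parameter are illustrative, not from the answer. `urljoin` fills in the scheme from the page URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Trimmed copy of the question's markup, inlined so no request is needed
HTML = """
<div class="ramka"><div class="label_color_yellow">реклама</div>
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<script> (adsbygoogle = window.adsbygoogle || []).push({}); </script></div>
"""


def script_sources(html, base_url):
    """Return absolute URLs of all <script> tags that have a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True keeps only tags where the attribute is present,
    # so the inline <script> without src is skipped
    return [urljoin(base_url, tag["src"]) for tag in soup.find_all("script", src=True)]


sources = script_sources(HTML, "http://mytricolortv.com/")
for url in sources:
    print(url)
```

With a live page, pass `r.text` from `requests.get(...)` instead of the inlined string.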

Alternatively, you can read src directly from a tag like this:

```python
s1 = BeautifulSoup(r.text, "html.parser")
# Take the first <script> tag and read its src attribute
print(s1.select("script")[0].get("src"))
```

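However, the tag may have no src attribute: `.get("src")` then returns None, whereas indexing with `["src"]` raises a KeyError. Also, `[0]` raises an IndexError if the page has no matching tags.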
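A small sketch of that difference on the two `<script>` tags from the question (markup inlined for the example). It also shows one way to avoid the problem, assuming a BeautifulSoup version with CSS selector support: the attribute selector `script[src]` matches only tags that have src.

```python
from bs4 import BeautifulSoup

HTML = """
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<script> (adsbygoogle = window.adsbygoogle || []).push({}); </script>
"""

soup = BeautifulSoup(HTML, "html.parser")
scripts = soup.select("script")

# .get() returns None for the inline script, which has no src
found = [tag.get("src") for tag in scripts]
print(found)

# Indexing the missing attribute raises KeyError instead
try:
    scripts[1]["src"]
except KeyError:
    print("no src on the inline script")

# The attribute selector matches only tags that have src
with_src = [tag["src"] for tag in soup.select("script[src]")]
print(with_src)
```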