How to parse links with nested tags using BS4? (Question 2)

Question

<article>
  <div class="article-content">
    <h4 class="link-title" style="text-transform:capitalize">
      <!--<a rel="nofollow" href="/view-site/viewframe.asp?url=http://www.ama-assn.org"></a> -->
    </h4>
    <!--<a rel="nofollow" href="/view-site/viewframe.asp?url=http://www.ama-assn.org">http://www.ama-assn.org</a>-->
    Website : <span class="">http://www.ama-assn.org</span>
    <br>American Medical Association</span><br><br>
    <a href="ratingsite.asp?id=1&sta=indian" class="flat-blue">Rate</a>
    <a href="../regis/chk4login.asp?from=../medicalwebsite/rating_comments.asp?id=1" class="flat-blue">Comments</a>
    <a href="broken_links.asp?id=1&sta=indian" class="flat-blue"> Submit broken link</a>
    <!--<a href="edit.asp?urlid=1&sta=indian&pages=Medical&id=1" class="flat-blue">Edit</a>-->
  </div>
</article>

I need to extract the site name, http://www.ama-assn.org, and its description, "American Medical Association".

I get as far as the <span> and then get stuck: I cannot pull out the data. Code:

    import urllib.request
    from bs4 import BeautifulSoup


    def get_html(url):
        response = urllib.request.urlopen(url)
        return response.read()


    def parse(html):
        soup = BeautifulSoup(html, 'html.parser')
        site = soup.find('div', class_='article-content')
        span = site.find('span', class_="")
        print(span)


    def main():
        parse(get_html('https://www.website.net'))


    if __name__ == '__main__':
        main()

It turned out that span = site.find('span', class_='').get_text() pulls out the name of the site, but I still cannot extract "American Medical Association".

MaxU · Answer 1 · 2018-09-18T15:51:07

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.ama-assn.org/'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "lxml")
    title = soup.find('title').text
    print(title)

result:

 American Medical Association | AMA

or, to keep only the name:

    >>> print(title.partition(' | ')[0])
    American Medical Association

MaxU, thank you, but are you suggesting that I visit the site behind every link?
@IvanPetrov, I suggest you first post valid HTML in the question.
The closing </span> tag has no matching opening tag ...
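
For what it's worth, the description can probably be read from the listing page itself, without visiting each linked site. The sketch below is only an illustration under some assumptions: it uses a trimmed copy of the HTML from the question, it takes the first <span> in the block instead of matching class_="", and the names HTML and extract_site are hypothetical. It walks the siblings that follow the <span> and takes the first non-empty text node, so the unmatched </span> does not get in the way when html.parser is used.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Trimmed copy of the markup from the question (assumption: the real
# page repeats one such <div class="article-content"> block per site).
HTML = """
<article> <div class="article-content">
<h4 class="link-title"><!-- commented-out link --></h4>
Website : <span class="">http://www.ama-assn.org</span>
<br>American Medical Association</span><br><br>
<a href="ratingsite.asp?id=1" class="flat-blue">Rate</a>
</div> </article>
"""


def extract_site(block):
    """Return (name, description) for one article-content block.

    Hypothetical helper: the name is the text of the first <span>,
    the description is the first non-empty text node after that span.
    """
    span = block.find('span')
    if span is None:
        return None
    name = span.get_text(strip=True)
    description = None
    for node in span.next_siblings:
        # Skip tags such as <br> and HTML comments; keep real text only.
        if isinstance(node, Comment) or not isinstance(node, NavigableString):
            continue
        text = node.strip()
        if text:
            description = text
            break
    return name, description


if __name__ == '__main__':
    soup = BeautifulSoup(HTML, 'html.parser')
    for block in soup.find_all('div', class_='article-content'):
        print(extract_site(block))
```

Iterating over next_siblings rather than calling get_text() on the whole block keeps the "Website :" label and the Rate/Comments link texts out of the result. A different parser (for example lxml) may repair the stray </span> differently, so the sibling order is worth checking on the real page.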
