Parsing with the BeautifulSoup library: how to get <a> elements with a given attribute

Question

There are links like:

    <a chapter="1" href="/url1/">1</a>
    <a chapter="2" href="/url2/">2</a>
    <a chapter="3" href="/url3/">3</a>
    <a name="n1" href="/url1/">1</a>
    <a name="n2" href="/url2/">2</a>

How can I get the hrefs of only those links that have the "chapter" attribute?

Or, put bluntly: get a list of all <a> tags and keep only those that have the chapter attribute.

waynee · Accepted Answer · 2016-11-23T11:35:36

    from bs4 import BeautifulSoup

    r = '''
    <a chapter="1" href="/url1/">1</a>
    <a chapter="2" href="/url2/">2</a>
    <a chapter="3" href="/url3/">3</a>
    <a name="n1" href="/url1/">1</a>
    <a name="n2" href="/url2/">2</a>'''

    soup = BeautifulSoup(r, 'html.parser')

    # chapter=True matches every <a> that has a chapter attribute, whatever its value
    for a in soup.find_all('a', chapter=True):
        print(a)
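Since the question asks for the hrefs rather than the tags themselves, here is a minimal sketch of how the same filter could be used to collect them. The variable names `html`, `hrefs` and `hrefs_filtered` are only illustrative, and the snippet assumes every matching link actually carries an href attribute. The second list is the "blunt" variant from the question: take all <a> tags and filter them by hand with `has_attr`.

```python
from bs4 import BeautifulSoup

html = '''
<a chapter="1" href="/url1/">1</a>
<a chapter="2" href="/url2/">2</a>
<a chapter="3" href="/url3/">3</a>
<a name="n1" href="/url1/">1</a>
<a name="n2" href="/url2/">2</a>'''

soup = BeautifulSoup(html, 'html.parser')

# Let find_all do the filtering: chapter=True keeps only tags that have the attribute.
hrefs = [a['href'] for a in soup.find_all('a', chapter=True)]

# Manual variant: fetch every <a>, then keep those with a chapter attribute.
hrefs_filtered = [a['href'] for a in soup.find_all('a') if a.has_attr('chapter')]

print(hrefs)  # ['/url1/', '/url2/', '/url3/']
```

Both lists should come out the same; the first form is shorter, the second makes the filtering step explicit.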

alecxe · Answer 2 · 2016-12-22T02:43:16

An alternative, slightly more concise way is to use CSS selectors. At the time of writing, BeautifulSoup supports only a limited set of selectors, but that is enough for most everyday tasks:

    for a in soup.select('a[chapter]'):
        print(a)  # or print(a.get_text()) to print the link texts
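For completeness, a self-contained sketch of the selector approach that pulls out the href and the link text together. The names `html` and `links` are assumptions made for the example, not part of the answer; `a.get('href')` is used so that a link without an href would yield None instead of raising an error.

```python
from bs4 import BeautifulSoup

html = '''
<a chapter="1" href="/url1/">1</a>
<a chapter="2" href="/url2/">2</a>
<a chapter="3" href="/url3/">3</a>
<a name="n1" href="/url1/">1</a>
<a name="n2" href="/url2/">2</a>'''

soup = BeautifulSoup(html, 'html.parser')

# 'a[chapter]' selects <a> elements that have a chapter attribute, with any value.
links = [(a.get('href'), a.get_text()) for a in soup.select('a[chapter]')]

print(links)  # [('/url1/', '1'), ('/url2/', '2'), ('/url3/', '3')]
```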
