Pull ALL the bigimg attribute values using BeautifulSoup

Question

A question came up. I have just started learning Python and was experimenting with the BeautifulSoup library. Suppose there is HTML like this:

 <a>
   <img src="/uploads/201103/thumb-img/MY-520-Nebulizer-Atomized-Inhaler-thumb-G-44318.jpg" width="42" height="42" imgb="uploads/201103/goods-img/MY-520-Nebulizer-Atomized-Inhaler-G-mid-44318.jpg" alt="MY-520 Portable Ultrasonic Nebulizer Atomized Inhaler 520" bigimg="/uploads/201103/source-img/MY-520-Nebulizer-Atomized-Inhaler-G-44318.jpg" />
 </a>
 <a>
   <img src="/uploads/201103/thumb-img/MY-520-Nebulizer-Atomized-Inhaler1298912536346-thumb-P-44318.jpg" alt="MY-520 Portable Ultrasonic Nebulizer Atomized Inhaler 520" imgb="/uploads/201103/goods-img/MY-520-Nebulizer-Atomized-Inhaler1298912536380-P-44318.jpg" width="42" height="42" bigimg="/uploads/201103/source-img/MY-520-Nebulizer-Atomized-Inhaler1298912536529-P-44318.jpg" />
 </a>

I need to pull out all the links to large images that are in the bigimg= attribute.

I wrote this line:

 itemImages = soup.find("div", "scrollableDiv").findAll("img")

But I cannot figure out how to pull out ALL the values of the bigimg attribute. If anyone has faced a similar problem, I would be grateful for help.

rnd_d · Accepted Answer · 2011-10-05T21:51:23

Unfortunately, I do not know the BeautifulSoup library, but I know how to solve your problem with regular expressions.

Let's say all the HTML you have is in the html variable:

 import re
 big_imgs = re.findall(r'bigimg="(.*?)"', html)

big_imgs now holds a list of all the bigimg values.

For example, with the HTML from your question stored in html:

 >>> big_imgs = re.findall(r'bigimg="(.*?)"', html)
 >>> big_imgs
 ['/uploads/201103/source-img/MY-520-Nebulizer-Atomized-Inhaler-G-44318.jpg', '/uploads/201103/source-img/MY-520-Nebulizer-Atomized-Inhaler1298912536529-P-44318.jpg']
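One caveat with the pattern above is that it only matches values wrapped in double quotes. A minimal sketch of a slightly more tolerant variant follows, assuming the page might also use single quotes or spaces around the equals sign. The sample markup and its paths are hypothetical and shortened for illustration.

```python
import re

# Hypothetical sample markup using both quote styles.
html = '''
<img src="/thumb/a.jpg" bigimg="/source/a.jpg" />
<img src='/thumb/b.jpg' bigimg='/source/b.jpg' />
'''

# Group 1 captures the opening quote; \1 requires the same quote to close the value.
# Group 2 is the attribute value itself (non-greedy, so it stops at the first closing quote).
pattern = re.compile(r'''bigimg\s*=\s*(["'])(.*?)\1''')

big_imgs = [match.group(2) for match in pattern.finditer(html)]
print(big_imgs)
```

This is still a text-level match, so it would also pick up bigimg="..." appearing inside comments or scripts. An HTML parser avoids that.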

That is exactly what I ended up doing, but I wanted to do it all with BeautifulSoup itself ...

jfs · Answer 2 · 2016-12-22T09:35:00

To find all elements that have a bigimg attribute and "pull out" their values:

  bigimgs = [tag['bigimg'] for tag in soup.find_all(bigimg=True)]
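As a self-contained sketch, the same idea can be combined with the div lookup from the question, so that only images inside that container are considered. This assumes the bs4 package (BeautifulSoup 4) is installed. The sample markup and its paths are hypothetical, and the scrollableDiv class name is taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first two images are inside the
# container AND carry a bigimg attribute.
html = """
<div class="scrollableDiv">
  <a><img src="/thumb/a.jpg" bigimg="/source/a.jpg" /></a>
  <a><img src="/thumb/b.jpg" bigimg="/source/b.jpg" /></a>
  <a><img src="/thumb/c.jpg" /></a>
</div>
<img src="/thumb/d.jpg" bigimg="/source/d.jpg" />
"""

soup = BeautifulSoup(html, "html.parser")

# Restrict the search to the container from the question.
container = soup.find("div", class_="scrollableDiv")

# bigimg=True matches any <img> that has the attribute, whatever its value,
# so indexing tag["bigimg"] cannot raise KeyError.
bigimgs = [img["bigimg"] for img in container.find_all("img", bigimg=True)]
print(bigimgs)
```

Filtering with bigimg=True matters here: indexing img["bigimg"] on an image without the attribute would raise KeyError.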

@SashaBlack my email is in my profile, feel free to write. I do not promise to answer, but I read everything. - jfs

Kanvi · Answer 3 · 2011-10-06T13:05:14

Try soupselect, which adds CSS selector support to BeautifulSoup. It makes working with BeautifulSoup very convenient.

http://code.google.com/p/soupselect/

There are many examples of use on the Internet.
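For readers on BeautifulSoup 4, a similar CSS-selector approach is available without an extra library through the built-in select() method. The sketch below uses that method rather than soupselect itself, and assumes the bs4 package is installed. The sample markup and its paths are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = """
<div class="scrollableDiv">
  <a><img src="/thumb/a.jpg" bigimg="/source/a.jpg" /></a>
  <a><img src="/thumb/b.jpg" bigimg="/source/b.jpg" /></a>
  <a><img src="/thumb/c.jpg" /></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "img[bigimg]" selects <img> tags that have a bigimg attribute;
# the "div.scrollableDiv" prefix limits the match to that container.
bigimgs = [img["bigimg"] for img in soup.select("div.scrollableDiv img[bigimg]")]
print(bigimgs)
```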
