I have a question. I have just started learning Python and am using the BeautifulSoup library. Suppose there is some HTML, for example:

 <a> <img src="/uploads/201103/thumb-img/MY-520-Nebulizer-Atomized-Inhaler-thumb-G-44318.jpg" width="42" height="42" imgb="uploads/201103/goods-img/MY-520-Nebulizer-Atomized-Inhaler-G-mid-44318.jpg" alt="MY-520 Portable Ultrasonic Nebulizer Atomized Inhaler 520" bigimg="/uploads/201103/source-img/MY-520-Nebulizer-Atomized-Inhaler-G-44318.jpg" /> </a> <a> <img src="/uploads/201103/thumb-img/MY-520-Nebulizer-Atomized-Inhaler1298912536346-thumb-P-44318.jpg" alt="MY-520 Portable Ultrasonic Nebulizer Atomized Inhaler 520" imgb="/uploads/201103/goods-img/MY-520-Nebulizer-Atomized-Inhaler1298912536380-P-44318.jpg" width="42" height="42" bigimg="/uploads/201103/source-img/MY-520-Nebulizer-Atomized-Inhaler1298912536529-P-44318.jpg" /> </a> 

I need to pull out all the links to large images that are in the bigimg= attribute.

I wrote this line:

 itemImages = soup.find("div", "scrollableDiv").findAll("img") 

But I can't figure out how to pull out ALL the values of the bigimg attribute. If anyone has faced a similar problem, I would be grateful for help.

    3 answers

    Unfortunately, I don't know the BeautifulSoup library, but I know how to solve your problem with regular expressions.

    Let's say all of your HTML is in the html variable:

     import re
     big_imgs = re.findall(r'bigimg="(.*?)"', html)

    big_imgs is now a list with all the values found inside bigimg.

    For example, with the HTML from your question stored in the html variable:

     >>> big_imgs = re.findall(r'bigimg="(.*?)"', html)
     >>> big_imgs
     ['/uploads/201103/source-img/MY-520-Nebulizer-Atomized-Inhaler-G-44318.jpg', '/uploads/201103/source-img/MY-520-Nebulizer-Atomized-Inhaler1298912536529-P-44318.jpg']
    • That's exactly what I ended up doing, but I wanted to do it all with BeautifulSoup itself... - xenoll
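    The pattern above assumes double quotes and no spaces around the equals sign. As a rough sketch only, the variant below also tolerates single quotes and optional whitespace. The sample markup and file names are shortened placeholders, not the asker's real page, and a regular expression remains a fragile way to read HTML in general.

```python
import re

# Placeholder markup, shortened from the question; file names are made up.
html = '''<a><img src="/thumb-1.jpg" bigimg="/source-img/one.jpg" /></a>
<a><img src='/thumb-2.jpg' bigimg = '/source-img/two.jpg' /></a>'''

# Group 1 captures the opening quote so that \1 requires the same closing quote;
# \s* allows optional whitespace around "=".
pattern = re.compile(r'''bigimg\s*=\s*(["'])(.*?)\1''', re.IGNORECASE)

# Group 2 holds the attribute value itself.
big_imgs = [match.group(2) for match in pattern.finditer(html)]
print(big_imgs)
```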

    To find all elements that have a bigimg attribute and "pull out" their values:

      bigimgs = [tag['bigimg'] for tag in soup.find_all(bigimg=True)] 
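    A self-contained sketch of the same idea, restricted to the container from the question. It assumes BeautifulSoup 4 (the bs4 package) and a wrapping div with class "scrollableDiv", which is inferred from the asker's own find() call; the markup and file names are shortened placeholders.

```python
from bs4 import BeautifulSoup

# Placeholder markup; the surrounding div is an assumption based on the question.
html = '''
<div class="scrollableDiv">
  <a><img src="/thumb-1.jpg" bigimg="/source-img/one.jpg" /></a>
  <a><img src="/thumb-2.jpg" bigimg="/source-img/two.jpg" /></a>
  <a><img src="/thumb-3.jpg" /></a>
</div>
'''

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", class_="scrollableDiv")

# bigimg=True keeps only the <img> tags that actually carry the attribute,
# so the third image (no bigimg) is skipped instead of raising KeyError.
big_imgs = [img["bigimg"] for img in container.find_all("img", bigimg=True)]
print(big_imgs)
```

    Filtering with bigimg=True matters if some images lack the attribute: indexing tag['bigimg'] on such a tag would raise KeyError, while tag.get('bigimg') would return None.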
    • @SashaBlack my email address is in my profile, write to me there. I don't promise to answer, but I read everything. - jfs

    Try soupselect. It makes working with BeautifulSoup very convenient.

    http://code.google.com/p/soupselect/

    There are many examples of use on the Internet.
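    If BeautifulSoup 4 is available, its built-in select() method accepts CSS selectors directly, so a similar query can be written without an extra package. The sketch below uses that built-in method rather than soupselect itself, and again assumes a wrapping div with class "scrollableDiv" and placeholder file names.

```python
from bs4 import BeautifulSoup

# Placeholder markup; the div class is taken from the asker's find() call.
html = '''
<div class="scrollableDiv">
  <a><img src="/thumb-1.jpg" bigimg="/source-img/one.jpg" /></a>
  <a><img src="/thumb-2.jpg" bigimg="/source-img/two.jpg" /></a>
  <a><img src="/thumb-3.jpg" /></a>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# "img[bigimg]" is a CSS attribute selector: <img> tags that have a bigimg attribute.
big_imgs = [img["bigimg"] for img in soup.select("div.scrollableDiv img[bigimg]")]
print(big_imgs)
```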