Yesterday I asked a question about the site dns-shop.ru, and someone kindly suggested using Selenium. Now I have a different problem. I have extracted everything I wanted: the product names, their prices, and the links to them. What I can't figure out is how to put it all into a dictionary, with the name as the key and the price and link as the value. I created a dictionary, d = {}, and filled it in for loops, but the values are simply appended one after another, as in a list: first all the product names, then all the prices, then all the links. What I need is name, price, link, then the next name, price, link, and so on. Maybe I need nested loops, but with them I get garbage: one name, then every price and link from the page, then the next name followed again by every price and link from the page, and so on.
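A minimal sketch of the target structure, assuming the three lists have already been reduced to plain strings and are in the same order (the n-th price and link belong to the n-th name). `zip()` walks the lists in lockstep, so a single loop yields one name/price/link triple per product. The sample values are made-up placeholders, not data from the site.

```python
# Made-up placeholder data standing in for the scraped texts and hrefs.
names = ["Phone A", "Phone B", "Phone C"]
prices = ["1 000", "2 000", "3 000"]
links = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

# zip() yields (names[0], prices[0], links[0]), then (names[1], ...), and so on,
# so every name is paired with the price and link at the same position.
products = {name: [price, link] for name, price, link in zip(names, prices, links)}

for name, (price, link) in products.items():
    print(name, price, link)
```

Two caveats: `zip()` silently stops at the shortest list, and two products with the same name overwrite each other because dictionary keys are unique.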
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from lxml import html

page_num = 1
url = 'https://www.dns-shop.ru/catalog/17a8a01d16404e77/smartfony/?p=%s&i=1&mode=list&brand=brand-apple' % page_num
driver = webdriver.Firefox()
driver.get(url)
content = driver.page_source
tree = html.fromstring(content)
# The number of the last page is read from the pagination element
last_page = tree.xpath('//span[@class=" item edge"]')[0].attrib.get('data-page-number')
last_page = int(last_page)

d = []  # a list: a dict (d = {}) has no .append()
while page_num <= last_page:
    url = 'https://www.dns-shop.ru/catalog/17a8a01d16404e77/smartfony/?p=%s&i=1&mode=list&brand=brand-apple' % page_num
    driver.get(url)
    # find_elements(By...) also works in Selenium 3; the find_elements_by_*
    # helpers are gone in recent Selenium 4 releases
    name = driver.find_elements(By.TAG_NAME, 'h3')
    price = driver.find_elements(By.CLASS_NAME, 'price_g')
    link = driver.find_elements(By.XPATH, "//div[@class='title']/a")
    print('Page:', page_num)
    for i in name:
        i = i.text
        print(i)
        d.append(i)
    for i in price:
        i = i.text
        print(i)
        d.append(i)
    for i in link:
        i = i.get_attribute("href")
        print(i)
        d.append(i)
    page_num += 1

print(d)
driver.close()
```

Then I tried this:
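The three separate `for` loops above could be replaced by a single loop over the element lists. A minimal sketch, assuming each page returns the same number of `h3`, `price_g` and title-link elements in matching order. `collect_page` and `FakeElement` are hypothetical names; `FakeElement` only imitates the two WebElement members the script uses (`.text` and `.get_attribute("href")`) so the example runs without a browser.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement: only .text and get_attribute()."""

    def __init__(self, text="", href=None):
        self.text = text
        self._href = href

    def get_attribute(self, attr_name):
        return self._href if attr_name == "href" else None


def collect_page(name_elems, price_elems, link_elems, products):
    """Add one page of results to `products` as {name: [price, link]}."""
    # Equal counts do not prove the lists line up, but unequal counts
    # prove they do not, so fail loudly instead of silently mis-pairing.
    if not (len(name_elems) == len(price_elems) == len(link_elems)):
        raise ValueError(
            f"{len(name_elems)} names, {len(price_elems)} prices, "
            f"{len(link_elems)} links on this page"
        )
    for name_el, price_el, link_el in zip(name_elems, price_elems, link_elems):
        products[name_el.text] = [price_el.text, link_el.get_attribute("href")]
    return products


# Two made-up "pages" of elements; in the real script these would be the
# name, price and link lists fetched inside the while loop.
pages = [
    ([FakeElement("Phone A")], [FakeElement("1 000")], [FakeElement(href="https://example.com/a")]),
    ([FakeElement("Phone B")], [FakeElement("2 000")], [FakeElement(href="https://example.com/b")]),
]

products = {}
for name_elems, price_elems, link_elems in pages:
    collect_page(name_elems, price_elems, link_elems, products)

print(products)
```

In the script itself the call would be `collect_page(name, price, link, d)` inside the `while` loop, with `d = {}` created before the loop. If a product can lack a price, page-wide lists shift as soon as one element is missing; a sturdier variant (not shown, because the container selector depends on the site's markup) is to find each product's container element first and look up its name, price and link inside it with `container.find_element(...)`.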
```python
for i in name:
    i = i.text
    for j in price:
        j = j.text
        for k in link:
            k = k.get_attribute("href")
            d[i] = [j, k]
```

It looks like what I wanted, but the prices and links do not match the products.
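Why the nested version mismatches, shown with plain strings instead of WebElements (the values are placeholders): the innermost line runs once for every combination of name, price and link, and each run overwrites `d[i]`. When the loops finish, every name holds the last price and the last link.

```python
names = ["Phone A", "Phone B"]
prices = ["1 000", "2 000"]
links = ["https://example.com/a", "https://example.com/b"]

d = {}
assignments = 0
for i in names:
    for j in prices:
        for k in links:
            d[i] = [j, k]  # overwrites the previous value for this name
            assignments += 1

print(assignments)  # 2 * 2 * 2 = 8
print(d)
# {'Phone A': ['2 000', 'https://example.com/b'], 'Phone B': ['2 000', 'https://example.com/b']}
```

Replacing the three nested loops with a single `for i, j, k in zip(name, price, link):` pairs the n-th elements with each other instead of combining all of them.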
I changed d from a dictionary back to a list. Only the first name/price/link triple came out right; after that, the links and prices appeared in a seemingly random order. I don't understand why.
```python
for i in name:
    i = i.text
    d.append(i)
    for j in price:
        j = j.text
        d.append(j)
        for k in link:
            k = k.get_attribute("href")
            d.append(k)
```
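The same effect explains the list version. With n products in each list, the nested loops append n names, n² prices and n³ links, so only the very first name/price/link triple lines up. A sketch with two placeholder products, followed by the flat ordering that a single loop over `zip()` produces:

```python
names = ["A", "B"]
prices = ["p1", "p2"]
links = ["l1", "l2"]

# Nested loops, as in the question: every price is followed by *all* links.
nested = []
for i in names:
    nested.append(i)
    for j in prices:
        nested.append(j)
        for k in links:
            nested.append(k)

print(nested)
# ['A', 'p1', 'l1', 'l2', 'p2', 'l1', 'l2', 'B', 'p1', 'l1', 'l2', 'p2', 'l1', 'l2']

# One loop over zip(): name, price, link for each product in turn.
flat = []
for i, j, k in zip(names, prices, links):
    flat.extend([i, j, k])

print(flat)
# ['A', 'p1', 'l1', 'B', 'p2', 'l2']
```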