I'm just starting to program, so I apologize for the clumsy code. I need to find files on a site and download them to a folder. The request for each file is generated in a loop, but every time I run the code a couple of files are simply skipped. If I watch the parsing process in an open browser window, all files are downloaded; if the browser is left in the background, not all of them are. What could be the problem?

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from bs4 import BeautifulSoup
    import requests
    import shutil
    import os

    driver = webdriver.Firefox()
    driver.get("http://www.atsenergo.ru/")
    driver.find_elements_by_partial_link_text('Участникам розничного рынка')[0].click()
    driver.find_elements_by_partial_link_text('Ставки тарифа на услуги по передаче электроэнергии, используемые для целей определения расходов на оплату нормативных потерь')[0].click()
    driver.find_elements_by_xpath("//*[contains(text(), 'Европа')]")[0].click()

    months = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
    years = ['2019', '2018']

    for year in years:
        for month in months:
            filename = year + month + '01' + '_FRSTF_ATS_REPORT_PUBLIC_FSK.xls'
            driver.implicitly_wait(3)
            try:
                driver.find_elements_by_xpath("//*[contains(text(), '{}')]".format('01.' + month + '.' + year))[0].click()
                pSource = driver.page_source
                soup = BeautifulSoup(pSource, "html.parser")
                l = soup.find_all('a')
                for link in l:
                    if filename in link:
                        link1 = 'https://www.atsenergo.ru/nreport' + link.get('href')  # build the download URL for the file
                        r = requests.get(link1)  # request the file
                        output = open(filename, 'wb')
                        output.write(r.content)
                        output.close()
                        source_files = os.getcwd()
                        shutil.move(source_files + '\\' + filename, 'D:\\Сети\\2019\\ФСК' + '\\' + filename)
                        print('file', filename, 'downloaded and saved to', 'D:\\Сети\\2019\\ФСК')
                driver.find_elements_by_xpath("//*[contains(text(), 'Календарь')]")[0].click()
            except IndexError:
                print('no data found for month', month, 'of year', year)
                continue

  • The files for 20190301, 20180401, 20180901, and 20181101 are definitely present on the site, but they did not download during the crawl - Gaillardinija
  • What if the files were downloaded not by requests but by the browser itself? You can disable download confirmation in the browser settings, so that files are downloaded immediately to the default folder as soon as a link is clicked (the folder can also be configured in the browser options). That way you get rid of both requests and BeautifulSoup, and only the Selenium-driven browser does the work (a sketch of this approach follows after these comments). If this option suits you, I will write it up as an answer - AtachiShadow
  • The settings do point downloads to a default folder, but when clicking through Selenium the save dialog still appears - Gaillardinija
  • "Is it possible to disable confirmation of downloads in the browser settings"? I talked about this)))) Confirmation of the download, this is a custom behavior) the dialog box can be disabled either for a particular file type, or just for all downloads) - AtachiShadow 8:51 pm

1 answer

Most likely the exception is raised not because the script cannot find the element on the page, but because it tries to click the found element too early, before the element is ready.

I advise you to abandon implicitly_wait and use explicit waits instead (see the documentation):

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable(
            (By.XPATH, "//*[contains(text(), '{}')]".format('01.' + month + '.' + year))
        )
    ).click()
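In the loop from the question, this call would replace the driver.implicitly_wait(3) line together with the driver.find_elements_by_xpath(...)[0].click() call: the script then waits up to 10 seconds for the date link to become clickable, instead of immediately clicking whatever find_elements happens to return.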
  • In this case I get a TimeoutException - Gaillardinija
  • @Gaillardinija For now I cannot help further: I am away until Thursday, without a computer. I will have another look when I am back. In the meantime, try replacing the condition with EC.presence_of_element_located to see whether these elements are on the page at all (a sketch of that check follows below) - Sergey Nudnov
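A minimal sketch of that check, assuming the same date-link XPath as in the question; presence_of_element_located only waits for the element to exist in the DOM, which helps distinguish "not there yet" from "there but not yet clickable":

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    date_text = '01.' + month + '.' + year  # e.g. '01.03.2019'
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located(
            (By.XPATH, "//*[contains(text(), '{}')]".format(date_text))
        )
    )
    element.click()  # may still fail if the element is present but not yet clickable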