For educational purposes, I am trying to scrape the car ads site https://ab.onliner.by. The goal is to collect links to individual cars, which look like "https://ab.onliner.by/car/ID". When I inspect the site in the browser, I can clearly see that the car ID I need sits inside this tag:

<a href="/car/4164123"><img width="80" height="80" src="https://content.onliner.by/automarket/2218487/80x80/496c37de7ec4ec3eabf6eb66e6c9bb24.jpeg"></a> 

The problem is that this tag is not in the returned HTML code.

    import requests
    from bs4 import BeautifulSoup


    def get_html(url):
        response = requests.get(url)
        return response.text


    print(get_html('https://ab.onliner.by'))

So the question is: what am I missing or doing wrong?

  • You are looking at the generated DOM in the browser rather than the actual source code of the page. Right-click → “View page source” in the same browser, and you will see that the links are not there; they are generated on the fly by JavaScript - andreymal
  • How do I get this generated page from Python? - Alexander
  • @Alexander Use the Selenium package with Chrome or Firefox. There are also a number of headless browsers that do this entirely in the background, but I haven’t gotten around to them yet. - Sergey Nudnov
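
One way to confirm what the first comment describes is to search the HTML text for car links directly. The sketch below is only an illustration: `extract_car_links`, `BASE_URL`, and `CAR_HREF` are hypothetical names, and the regular expression assumes links have exactly the `href="/car/ID"` form shown in the question. It uses only the standard library, so it can be applied to any HTML string, whether it came from `requests` or from a browser.

```python
import re
from urllib.parse import urljoin

# Assumption: car links appear as href="/car/<digits>", as in the tag from the question.
BASE_URL = 'https://ab.onliner.by'
CAR_HREF = re.compile(r'href="(/car/\d+)"')


def extract_car_links(html, base_url=BASE_URL):
    """Return absolute car links found in html, in order, without duplicates."""
    links = []
    for path in CAR_HREF.findall(html):
        link = urljoin(base_url, path)
        if link not in links:
            links.append(link)
    return links


# The tag seen in the browser's DOM inspector yields one link.
sample = '<a href="/car/4164123"><img width="80" height="80"></a>'
print(extract_car_links(sample))  # ['https://ab.onliner.by/car/4164123']

# HTML without such tags yields an empty list.
print(extract_car_links('<html><body></body></html>'))  # []
```

If `extract_car_links(get_html('https://ab.onliner.by'))` returns an empty list while the browser shows the links, that would be consistent with the links being added by JavaScript after the page loads.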

1 answer

To get the page as it looks after its scripts have run, use the Selenium package with Chrome or Firefox.

    from selenium import webdriver
    import time

    chrome_driver = 'C:/Tools/ChromeDriver/chromedriver.exe'
    chrome_options = webdriver.ChromeOptions()
    # executable_path is the Selenium 3 style; Selenium 4 takes a Service object instead.
    driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_options)
    driver.get('https://ab.onliner.by')

    # Pause so that the JavaScript has time to run.
    # Using time.sleep is a crude and not very reliable approach.
    # It is better to read up on and use Expected Conditions from Selenium itself:
    # from selenium.webdriver.support.ui import WebDriverWait
    # from selenium.webdriver.support import expected_conditions as EC
    time.sleep(5)

    print(driver.page_source)
  • Yes, it works, thanks! - Alexander
  • To run it in the background: chrome_options.add_argument('headless') - Alexander
  • And to set the window size: chrome_options.add_argument('window-size=1920x935') - Alexander
  • I'm wondering, will it work without a browser installed? - Alexander
  • @Alexander, no, it will not work without a browser installed - Sergey Nudnov
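
The answer recommends Expected Conditions over `time.sleep`, and the comments add headless mode. A minimal sketch combining the two might look like the following. The function name `fetch_rendered_html`, the CSS selector `a[href^="/car/"]` (derived from the tag in the question), and the 15-second timeout are assumptions for illustration, not something the site or Selenium prescribes. It also assumes Chrome and a matching driver can be found by Selenium without an explicit path.

```python
def fetch_rendered_html(url, link_selector='a[href^="/car/"]', timeout=15):
    """Return the page HTML once at least one element matching link_selector exists."""
    # Imported inside the function so this file still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')              # run without a visible window
    options.add_argument('--window-size=1920,935')  # Chrome's switch takes width,height

    # Assumption: the driver is discoverable (on PATH or via Selenium's own lookup).
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has inserted a matching link; raises
        # selenium.common.exceptions.TimeoutException if none appears in time.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, link_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Example call (requires Selenium, Chrome, and network access):
# html = fetch_rendered_html('https://ab.onliner.by')
```

Compared with a fixed `time.sleep(5)`, the explicit wait returns as soon as the links are present and fails loudly if they never appear, rather than silently returning an incomplete page.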