I am learning Python, and to avoid studying dry code I decided to learn through practice, so I started writing a parser. I can't manage to parse just the one item with the phone number: instead, it parses all the span tags and everything inside them.
```html
<span class="ls-detail_price">8 000 $</span>
<span>373-76-766250, 373-77-592228</span>
<span class="ls-detail_price">6 000 $</span>
<span>373-76-966250, 373-77-592233</span>
<div class="ls-detail_anData">
  <span class="ls-detail_price">1 600 $</span>
  <div class="mapath list">
    <span id="pointer_icon">Тирасполь</span>
  </div>
  <div>
    <span class="phone_icon">373-77-534801</span>
  </div>
</div>
```

I tried selecting by the class that belongs to the phone number, but it doesn't work and returns an empty list. The phone is inserted via `::before`; how can I get to it from Python? The goal is to get rid of the span tags and the extra line with the price. Please don't downvote: I really want to understand, I just can't figure out what to do next.
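A minimal sketch, assuming the live page serves the same markup as the snippet above (not verified against makler.md): a CSS selector picks out only the `span.phone_icon` elements, and `get_text(strip=True)` drops the tags. Whatever `::before` adds is generated by CSS in the browser, so it never appears in the HTML that BeautifulSoup parses; the number itself is plain text inside the span. `extract_phones` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# The listing markup from the question, embedded so the example runs offline.
HTML = """
<div class="ls-detail_anData">
  <span class="ls-detail_price">1 600 $</span>
  <div class="mapath list">
    <span id="pointer_icon">Тирасполь</span>
  </div>
  <div>
    <span class="phone_icon">373-77-534801</span>
  </div>
</div>
"""


def extract_phones(html):
    """Return the text of every <span class="phone_icon">, without tags."""
    soup = BeautifulSoup(html, "html.parser")
    # Only spans with the phone_icon class are selected, so the price
    # span and the city span are never touched.
    return [span.get_text(strip=True) for span in soup.select("span.phone_icon")]


print(extract_phones(HTML))
```

If this still returns an empty list on the real page, it is worth printing the downloaded HTML and checking that the `phone_icon` class is actually present in it.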
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import urllib.request

from bs4 import BeautifulSoup


def get_html(url):
    response = urllib.request.urlopen(url)
    return response.read()


def parse(html):
    # Parse the whole page
    soup = BeautifulSoup(html, 'html.parser')
    # Select by tag and class
    div = soup.find('div', class_='ls-detail')
    # Find only what is inside <span> tags
    for row in div.find_all('span'):
        print(row)


def main():
    parse(get_html('https://makler.md/ru/transport/cars'))


if __name__ == '__main__':
    main()
```

I did it this way, but it mostly parses only the text and small fragments with tags.
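If the goal is just the numbers from a flat run of spans like the output above, one option (a swapped-in technique, not something from the question) is to skip spans with the `ls-detail_price` class and pull the numbers out of the remaining text with a regular expression. `PHONE_RE` assumes every number has the 3-2-6 digit shape seen in the samples; that is an assumption, not a documented rule of the site.

```python
import re

from bs4 import BeautifulSoup

# Assumed phone shape, based only on samples like 373-76-766250.
PHONE_RE = re.compile(r"\b\d{3}-\d{2}-\d{6}\b")


def parse(html):
    """Collect phone numbers from all <span> tags except price spans."""
    soup = BeautifulSoup(html, "html.parser")
    phones = []
    for span in soup.find_all("span"):
        # "class" is multi-valued in BeautifulSoup, so it comes back as a list.
        if "ls-detail_price" in span.get("class", []):
            continue
        # get_text() drops the tags; findall() splits "a, b" into separate numbers.
        phones.extend(PHONE_RE.findall(span.get_text()))
    return phones


sample = (
    '<span class="ls-detail_price">8 000 $</span> '
    '<span>373-76-766250, 373-77-592228</span>'
)
print(parse(sample))
```

The city span (`Тирасполь`) is ignored automatically because its text does not match the pattern.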
```python
#!/usr/bin/env python3
from urllib.request import urlopen

from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

url = 'https://makler.md/ru/transport/cars'
fname = 'test'


def get_html(url):
    with urlopen(url) as html_page:
        charset = html_page.headers.get_content_charset(None)  # may be None
        soup = BeautifulSoup(html_page, 'html.parser', from_encoding=charset)
    # "with" closes the file automatically; the name becomes "test.html"
    with open(f'{fname}.html', 'w', encoding='utf-8') as f:
        f.write(soup.text)


def main():
    get_html(url)


if __name__ == '__main__':
    main()
```
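Writing `soup.text` keeps only the visible text, so the tags and classes needed for selecting the phone spans are lost. A minimal sketch, assuming you want to download the page once and experiment offline: save the raw markup, then parse the saved file later. `save_page`, `load_phones` and the `test.html` file name are hypothetical.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def save_page(html, path):
    # Store the raw markup (tags included), not soup.text.
    path.write_text(html, encoding="utf-8")


def load_phones(path):
    # Re-parse the saved markup and select the phone spans by class.
    soup = BeautifulSoup(path.read_text(encoding="utf-8"), "html.parser")
    return [s.get_text(strip=True) for s in soup.select("span.phone_icon")]


if __name__ == "__main__":
    # In real use the string would come from urlopen(url).read().decode(...).
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "test.html"
        save_page('<div><span class="phone_icon">373-77-534801</span></div>', cache)
        print(load_phones(cache))
```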
Find the tags with class `ls-detail_price` and call `next_sibling` on them to get the next element after the current tag (the next one, which doesn't mean a nested one); that will be the span with the phones. If `next_sibling` doesn't help, there is a counterpart, I don't remember exactly, something like `find_next_sibling`: you pass it the tag that comes after the current one. I would also advise using CSS selectors instead of the methods. I find them easier to read, and they are used in many places. Example: `div = soup.select('div.ls-detail')` or simply `div = soup.select('.ls-detail')`. - gil9red

`print(soup.find(text="Address:").findNext('td').contents[0])` - Stasinskii
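The `find_next_sibling` approach from the first comment can be sketched as follows. It assumes the flat layout from the question's output, where each price span is followed by a sibling span with the phones; the wrapping `div.ls-detail` and the helper name `phones_after_prices` are hypothetical. It also shows why `next_sibling` alone can disappoint: between the two spans there is a whitespace text node, and `next_sibling` returns that instead of the next tag.

```python
from bs4 import BeautifulSoup

# Flat markup like the question's output, wrapped in an assumed div.ls-detail.
HTML = (
    '<div class="ls-detail">'
    '<span class="ls-detail_price">8 000 $</span> '
    '<span>373-76-766250, 373-77-592228</span> '
    '<span class="ls-detail_price">6 000 $</span> '
    '<span>373-76-966250, 373-77-592233</span>'
    '</div>'
)


def phones_after_prices(html):
    """For every price span, return the text of the next sibling <span>."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for price in soup.select("div.ls-detail span.ls-detail_price"):
        # next_sibling would be the " " text node between the spans;
        # find_next_sibling("span") skips it and returns the next <span> tag.
        phones = price.find_next_sibling("span")
        if phones is not None:
            result.append(phones.get_text(strip=True))
    return result


print(phones_after_prices(HTML))
```

Note that `soup.select(...)` always returns a list, while `soup.select_one(...)` returns the first match or `None`, which is the closer equivalent of `soup.find(...)`.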