Can you please tell me how to extract the data from the following HTML?

<div class="day">
  <a href="/prognoz/semipalatinsk/14dney/#day2" class="day__link" name="clb3279967">
    <div class="day__date">Завтра</div>
    <div class="weather-icon weather-icon_05 margin-bottom-10" title="переменная облачность"></div>
    <div class="day__temperature" title="Днем">+9&deg;
      <span class="day__temperature__night" title="Ночью">0&deg;</span>
    </div>
    <div class="day__description"

My Python parser looks like this:

 import requests
 import bs4

 s = requests.get('https://pogoda.mail.ru/prognoz/semipalatinsk/')
 b = bs4.BeautifulSoup(s.text, "html.parser")
 p3 = b.select('.day__date')
 pogoda1 = p3[0].getText()
 p4 = b.select('.day .day__temperature')
 pogoda2 = p4[0].getText()
 p5 = b.select('.day__temperature__night')
 pogoda3 = p5[0].getText()
 p6 = b.select('.day__description')
 pogoda4 = p6[0].getText()
 print(pogoda1 + ' ' + pogoda2 + ' ' + pogoda3 + ' ' + pogoda4)

Code output:

 > Завтра +9°
 > 0°
 > 0° облачно

But I need it to look like this:

 > Завтра +9° 0° облачно 

I understand that the problem arises because of the span nested inside the element with class day__temperature, but it is not clear to me how to drop it.

  • 1) Do you just want to remove the extra whitespace? Try passing strip=True to get_text(). Or do you not want all of the text from div.day__temperature? (Then, for example, the stripped_strings attribute can be used.) 2) You can use select_one(). 3) requests can pick the wrong encoding (s.text). In general, check whether there is an explicit API for the site (or its sources or analogues) that lets you get the information you need. - jfs
  • There, after +9 there are a lot of tabs and then 0. By the way, the tabs were not displayed. I think there are two options: either remove the tabs, in which case the separate day__temperature__night lookup will not be needed, or cut day__temperature to 4 characters and pull the "0" out of day__temperature__night. I think it should be done through getText and not item.find, because the latter finds only the first match. - Audi
  • find() works exactly as it should and returns exactly the result shown in the answer. Decide what you actually want to get in the end and update the question. - jfs
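The strip=True suggestion from the comments can be tried offline. This is only a sketch: the HTML below is the fragment from the question with the truncated tail closed by hand, and the description text "облачно" is taken from the printed output, so both are assumptions about the real page. The variable names `SAMPLE_HTML`, `temp_text` and `line` are made up for the example.

```python
from bs4 import BeautifulSoup

# Assumption: the question's fragment, closed by hand because the original
# snippet is cut off; "облачно" comes from the output shown in the question.
SAMPLE_HTML = """
<div class="day">
  <a href="/prognoz/semipalatinsk/14dney/#day2" class="day__link">
    <div class="day__date">Завтра</div>
    <div class="weather-icon weather-icon_05 margin-bottom-10"></div>
    <div class="day__temperature" title="Днем">+9&deg;
      <span class="day__temperature__night" title="Ночью">0&deg;</span>
    </div>
    <div class="day__description">облачно</div>
  </a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# With a separator and strip=True, get_text() strips every text node,
# skips the empty ones and joins the rest with the separator.
temp_text = soup.select_one(".day__temperature").get_text(" ", strip=True)
line = soup.select_one(".day").get_text(" ", strip=True)

print(temp_text)
print(line)
```

Because the night temperature is already part of the text of div.day__temperature, reading it a second time from the span is what produces the duplicated 0° in the question's output.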

3 answers

 import requests
 from bs4 import BeautifulSoup

 headers = {
     'User-Agent': ('Mozilla/5.0 (Windows NT 6.2) '
                    'AppleWebKit/537.36 (KHTML, like Gecko) '
                    'Chrome/63.0.3239.84 Safari/537.36'),
     'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7'
 }
 url = 'https://pogoda.mail.ru/prognoz/semipalatinsk/'
 page = requests.get(url, headers=headers).content
 item = BeautifulSoup(page, 'lxml')  # the 'lxml' parser needs the lxml package

 # Each day card is a link whose href ends in #day2, #day3, ...
 presently = item.find('a', {'href': '/prognoz/semipalatinsk/14dney/#day2'}).text
 tomorrow = item.find('a', {'href': '/prognoz/semipalatinsk/14dney/#day3'}).text

 # Turn newlines into spaces and drop the tabs between the values
 print(presently.replace('\n', ' ').replace('\t', ''))
 print(tomorrow.replace('\n', ' ').replace('\t', ''))

    You can solve the problem like this:

     import requests
     from bs4 import BeautifulSoup

     headers = {
         'User-Agent': ('Mozilla/5.0 (Windows NT 6.2) '
                        'AppleWebKit/537.36 (KHTML, like Gecko) '
                        'Chrome/63.0.3239.84 Safari/537.36'),
         'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7'
     }
     url = 'https://pogoda.mail.ru/prognoz/semipalatinsk/'
     page = requests.get(url, headers=headers).content
     item = BeautifulSoup(page, 'lxml')

     # find() returns only the first matching element on the page
     date = item.find('div', {'class': 'day__date'}).text
     day_temperature = item.find('div', {'class': 'day__temperature'}).text
     day_temperature_night = item.find('span', {'class': 'day__temperature__night'}).text
     day__description = item.find('div', {'class': 'day__description'}).text

     # The first 4 characters of day_temperature hold the daytime value
     print(date, day_temperature[0:4], day_temperature_night, day__description)
    • Thanks, it works. But find, as I understand it, returns only the first match. Besides tomorrow's weather, I also need to parse the day after tomorrow, and so on. Sorry, I forgot to mention this: when I asked the question I was focused solely on the first line ("Tomorrow"). - Audi
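For the follow-up about the later days, one possible approach is to loop over every div.day card and read the fields inside each card, rather than calling find() once on the whole page. The sketch below is not the answerer's code. It runs on an offline sample where the first card mirrors the question's fragment and the second card's values are invented purely for illustration; `parse_days` and `SAMPLE_HTML` are hypothetical names.

```python
from bs4 import BeautifulSoup

# Offline sample. The first card mirrors the question's fragment; the second
# card's values are invented for illustration and do not come from the site.
SAMPLE_HTML = """
<div class="day">
  <div class="day__date">Завтра</div>
  <div class="day__temperature" title="Днем">+9&deg;
    <span class="day__temperature__night" title="Ночью">0&deg;</span>
  </div>
  <div class="day__description">облачно</div>
</div>
<div class="day">
  <div class="day__date">Послезавтра</div>
  <div class="day__temperature" title="Днем">+7&deg;
    <span class="day__temperature__night" title="Ночью">-1&deg;</span>
  </div>
  <div class="day__description">ясно</div>
</div>
"""


def parse_days(html):
    """Return one 'date day night description' string per div.day card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for day in soup.select("div.day"):
        date = day.select_one(".day__date")
        temp = day.select_one(".day__temperature")
        night = day.select_one(".day__temperature__night")
        desc = day.select_one(".day__description")
        if any(el is None for el in (date, temp, night, desc)):
            continue  # skip cards that lack one of the expected elements
        rows.append(" ".join([
            date.get_text(strip=True),
            # first non-empty string of the div, i.e. the daytime value
            next(temp.stripped_strings, ""),
            night.get_text(strip=True),
            desc.get_text(strip=True),
        ]))
    return rows


for row in parse_days(SAMPLE_HTML):
    print(row)
```

Against the live page this could be called as parse_days(page) with the page bytes fetched above, assuming the site's markup still matches the fragment in the question.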

    To print the first non-empty string from the div.day__temperature element:

     print(next(soup.find('div', 'day__temperature').stripped_strings)) 

    or

     print(next(soup.select_one('div.day__temperature').stripped_strings)) 

    The result in both cases: +9°.