How do I get an element's text by class without the text of the span inside it?

Question

Can you tell me how to extract the data from the following HTML?

    <div class="day">
      <a href="/prognoz/semipalatinsk/14dney/#day2" class="day__link" name="clb3279967">
        <div class="day__date">Завтра</div>
        <div class="weather-icon weather-icon_05 margin-bottom-10" title="переменная облачность"></div>
        <div class="day__temperature" title="Днем">+9&deg; <span class="day__temperature__night" title="Ночью">0&deg;</span> </div>
        <div class="day__description"

My Python parser looks like this:

    import requests, bs4

    s = requests.get('https://pogoda.mail.ru/prognoz/semipalatinsk/')
    b = bs4.BeautifulSoup(s.text, "html.parser")

    p3 = b.select('.day__date')
    pogoda1 = p3[0].getText()   # date
    p4 = b.select('.day .day__temperature')
    pogoda2 = p4[0].getText()   # day temperature (also includes the night <span> text)
    p5 = b.select('.day__temperature__night')
    pogoda3 = p5[0].getText()   # night temperature
    p6 = b.select('.day__description')
    pogoda4 = p6[0].getText()   # description
    print(pogoda1 + ' ' + pogoda2 + ' ' + pogoda3 + ' ' + pogoda4)

Output (Завтра means "Tomorrow", облачно means "cloudy"):

    Завтра +9°
    0°
    0° облачно

But I need it to look like this:

    Завтра +9° 0° облачно

I understand that the problem is caused by the span inside the day__temperature div, but I don't see how to exclude it.
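One way to leave out the span's text is to keep only the text nodes that are direct children of div.day__temperature, via `find_all(string=True, recursive=False)`. A minimal sketch, parsing the question's fragment offline with the built-in "html.parser" instead of fetching the page. The helpers `own_text` and `parse_day` are hypothetical names, and the description "облачно" is assumed from the expected output because the question's HTML is cut off.

```python
from bs4 import BeautifulSoup

# Fragment based on the question's HTML. The description text is an assumption
# taken from the expected output, since the question's markup is truncated.
HTML = """
<div class="day">
  <a href="/prognoz/semipalatinsk/14dney/#day2" class="day__link">
    <div class="day__date">Завтра</div>
    <div class="day__temperature" title="Днем">+9&deg;
      <span class="day__temperature__night" title="Ночью">0&deg;</span>
    </div>
    <div class="day__description">облачно</div>
  </a>
</div>
"""


def own_text(tag):
    """Join only the text nodes that are direct children of `tag`,
    skipping text inside nested tags such as the night-temperature <span>."""
    parts = (s.strip() for s in tag.find_all(string=True, recursive=False))
    return " ".join(p for p in parts if p)


def parse_day(html):
    soup = BeautifulSoup(html, "html.parser")
    temp = soup.select_one("div.day__temperature")
    night = temp.select_one("span.day__temperature__night")
    return (
        soup.select_one("div.day__date").get_text(strip=True),
        own_text(temp),  # "+9°" without the span's "0°"
        night.get_text(strip=True),
        soup.select_one("div.day__description").get_text(strip=True),
    )


print(*parse_day(HTML))  # Завтра +9° 0° облачно
```

The night value is still read from the span itself, so nothing is removed from the tree.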

So you don't want all of the text from div.day__temperature?
(1) If so, you can use the stripped_strings attribute, for example; (2) you can use select_one(); (3) requests may pick the wrong encoding for s.text, so change that (e.g. pass s.content instead); and check whether the site (or its sources, or similar services) has an explicit API that provides the information you need.
On the page, after +9 there are a lot of tabs and then 0. By the way, the tabs were not displayed in the question.
I think there are two options; one is to remove the tabs, and then the separate day__temperature__night lookup won't be needed.
I think this should be done with getText rather than item.find.
find() works exactly as it should and returns exactly the result shown in the answer.
Decide what you actually want to get in the end and update the question.
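The encoding point in the comments can be reproduced offline. A minimal sketch, assuming the server's Content-Type header carries no charset (for text/* responses requests then decodes `.text` as ISO-8859-1) while the page declares UTF-8 in a `<meta>` tag. `RAW` is a hypothetical response body, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical raw response body: UTF-8 bytes whose charset is declared only
# in a <meta> tag, not in the HTTP Content-Type header.
RAW = '<meta charset="utf-8"><div class="day__date">Завтра</div>'.encode("utf-8")

# Roughly what response.text would give if requests fell back to ISO-8859-1.
as_text = RAW.decode("iso-8859-1")
bad = BeautifulSoup(as_text, "html.parser").select_one(".day__date").get_text()

# Passing bytes (like response.content) lets BeautifulSoup choose the
# encoding itself, here from the <meta charset> declaration.
good = BeautifulSoup(RAW, "html.parser").select_one(".day__date").get_text()

print(repr(bad))  # garbled Latin-1 characters, not "Завтра"
print(good)       # Завтра
```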

Alexander · Accepted Answer · 2018-04-11T15:37:24

    import requests
    from bs4 import BeautifulSoup

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.2) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/63.0.3239.84 Safari/537.36',
        'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7',
    }
    url = 'https://pogoda.mail.ru/prognoz/semipalatinsk/'
    page = requests.get(url, headers=headers).content  # raw bytes; BeautifulSoup detects the encoding
    item = BeautifulSoup(page, 'lxml')  # the 'lxml' parser needs the lxml package

    # Each day is an <a> link whose text holds the date, temperatures and description.
    # In the question's HTML, '#day2' is the "Завтра" (tomorrow) block.
    tomorrow = item.find('a', {'href': '/prognoz/semipalatinsk/14dney/#day2'}).text
    day_after = item.find('a', {'href': '/prognoz/semipalatinsk/14dney/#day3'}).text

    # Turn newlines into spaces and drop the tab padding.
    print(tomorrow.replace('\n', ' ').replace('\t', ''))
    print(day_after.replace('\n', ' ').replace('\t', ''))
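The hard-coded '#day2' / '#day3' links can be generalized by looping over every day link, and the replace() chain can be swapped for `" ".join(text.split())`, which collapses any run of spaces, tabs and newlines into a single space. A sketch under these assumptions: each day is an `<a class="day__link">` as in the question's fragment; the markup below is placeholder data (the second day's values are invented, and "облачно" is taken from the question's expected output), and `day_lines` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Placeholder markup shaped like the question's fragment, with tab padding.
HTML = """
<div class="day"><a href="/prognoz/semipalatinsk/14dney/#day2" class="day__link">
  <div class="day__date">Завтра</div>
  <div class="day__temperature">+9&deg;
\t\t\t<span class="day__temperature__night">0&deg;</span></div>
  <div class="day__description">облачно</div></a></div>
<div class="day"><a href="/prognoz/semipalatinsk/14dney/#day3" class="day__link">
  <div class="day__date">Day 3</div>
  <div class="day__temperature">+1&deg;
\t\t\t<span class="day__temperature__night">-2&deg;</span></div>
  <div class="day__description">placeholder</div></a></div>
"""


def day_lines(html):
    """One line of text per day link, with every whitespace run collapsed."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    # Every <a class="day__link"> instead of hard-coded '#day2', '#day3', ...
    for link in soup.select("a.day__link"):
        # split() with no argument breaks on any run of spaces, tabs or
        # newlines, so joining with " " leaves single spaces only.
        lines.append(" ".join(link.get_text().split()))
    return lines


for line in day_lines(HTML):
    print(line)
```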

Alexander · Answer 2 · 2018-04-10T19:58:44

You can solve the problem like this:

    import requests
    from bs4 import BeautifulSoup

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.2) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/63.0.3239.84 Safari/537.36',
        'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7',
    }
    url = 'https://pogoda.mail.ru/prognoz/semipalatinsk/'
    page = requests.get(url, headers=headers).content
    item = BeautifulSoup(page, 'lxml')

    # find() returns only the first match on the page.
    date = item.find('div', {'class': 'day__date'}).text
    day_temperature = item.find('div', {'class': 'day__temperature'}).text
    day_temperature_night = item.find('span', {'class': 'day__temperature__night'}).text
    day__description = item.find('div', {'class': 'day__description'}).text

    # day_temperature also contains the night value from the nested <span>;
    # keep its first 4 characters and trim any padding caught in the slice.
    print(date, day_temperature[0:4].strip(), day_temperature_night, day__description)

But besides tomorrow's weather, I also need the day after tomorrow, and so on.
When you asked, the question was only about the first line ("Tomorrow").
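To get every day rather than only the first, one option is to loop over each div.day block with select(), since find() stops at the first match. A minimal sketch, assuming each div.day contains the same child classes as the question's fragment; the markup is simplified placeholder data (the wrapping links are left out and the second day's values are invented), and `text_of` / `parse_days` are hypothetical helpers.

```python
from bs4 import BeautifulSoup

# Simplified placeholder markup: two day blocks shaped like the question's fragment.
HTML = """
<div class="day">
  <div class="day__date">Завтра</div>
  <div class="day__temperature">+9&deg; <span class="day__temperature__night">0&deg;</span></div>
  <div class="day__description">облачно</div>
</div>
<div class="day">
  <div class="day__date">Day 3</div>
  <div class="day__temperature">+1&deg; <span class="day__temperature__night">-2&deg;</span></div>
  <div class="day__description">placeholder</div>
</div>
"""


def text_of(parent, selector):
    """Stripped text of the first element matching `selector` in `parent`, or None."""
    tag = parent.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


def parse_days(html):
    soup = BeautifulSoup(html, "html.parser")
    days = []
    # select() returns every div.day; find() would stop at the first one.
    for day in soup.select("div.day"):
        temp = day.select_one("div.day__temperature")
        days.append({
            "date": text_of(day, "div.day__date"),
            # First non-empty string of the temperature div: the day value only.
            "day": next(temp.stripped_strings, None) if temp is not None else None,
            "night": text_of(day, "span.day__temperature__night"),
            "description": text_of(day, "div.day__description"),
        })
    return days


for d in parse_days(HTML):
    print(d["date"], d["day"], d["night"], d["description"])
```

Returning dicts keeps each field separate, so missing elements show up as None instead of raising AttributeError.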

jfs · Answer 3 · 2018-04-11T05:08:40

To print the first non-empty string from the div.day__temperature element:

    print(next(soup.find('div', 'day__temperature').stripped_strings))

or

    print(next(soup.select_one('div.day__temperature').stripped_strings))

The result in both cases is +9°.
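The two lines in this answer assume a `soup` object already exists. A self-contained sketch that builds one from the question's day__temperature element (parsed offline with "html.parser" rather than fetched) shows both lookups returning the same value, with the night value coming next in stripped_strings.

```python
from bs4 import BeautifulSoup

# The day-temperature element from the question's HTML.
HTML = ('<div class="day__temperature" title="Днем">+9&deg; '
        '<span class="day__temperature__night" title="Ночью">0&deg;</span> </div>')

soup = BeautifulSoup(HTML, "html.parser")

# A plain string as the second argument to find() is matched against the CSS class.
a = next(soup.find("div", "day__temperature").stripped_strings)
b = next(soup.select_one("div.day__temperature").stripped_strings)
print(a, b)  # +9° +9°

# The following string is the night value from the nested <span>.
print(list(soup.select_one("div.day__temperature").stripped_strings))  # ['+9°', '0°']
```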
