How to get the necessary information using BeautifulSoup4?

Question

Hello! I recently started learning Python and wanted to build a small project that prints the current exchange rate. After a bit of searching I stumbled upon BeautifulSoup4 and started reading the documentation, and... the saying "I look at the book and see a fig" best describes my understanding. Somewhere around this point I get completely lost and start acting at random.

    import bs4 as bs
    import urllib.request

    # download the page and parse it with the lxml parser
    sauce = urllib.request.urlopen('http://www.finanz.ru/valyuty/v-realnom-vremeni').read()
    soup = bs.BeautifulSoup(sauce, 'lxml')

The actual task is to pull the dollar exchange rate out of that page, but I don't understand what is going on once I see this incomprehensible set of tags and a bunch of classes.

I am ashamed to ask for help on this, since the answer most likely lies on the surface, but I cannot find it. Thank you in advance.

I accidentally put an extra character in "request", my apologies.
While writing the message, I accidentally inserted a character into the code.

mymedia · Accepted Answer · 2017-08-08T14:40:03

How about an alternative way of parsing information?

    import pandas as pd

    # parse the HTML tables and take the second one (a pandas.DataFrame)
    df = pd.read_html('http://www.finanz.ru/valyuty/v-realnom-vremeni', encoding='utf-8')[1] \
           .dropna(axis=1)

    # filter the DataFrame: keep only the rows where the second column, `df[1]`,
    # matches the regex '^USD\/' (all rows starting with 'USD/')
    print(df.loc[df[1].str.contains(r'^USD\/')])

Output:

    In [238]: print(df.loc[df[1].str.contains(r'^USD\/')])
              1        2  3        4       5       6         7
    3   USD/RUB   599125  -   599903  -0,13%  -00778  17:32:00
    13  USD/EUR    08507  -    08477   0,36%   00030  17:33:00
    14  USD/GBP    07711  -    07671   0,52%   00040  17:32:00
    15  USD/JPY  1105840  -  1107710  -0,17%  -01870  17:32:00
    16  USD/CHF    09750  -    09729   0,22%   00021  17:32:00
    31  USD/CHF    09750  -    09729   0,22%   00021  17:32:00
    34  USD/UAH   257150  -   257650  -0,19%  -00500  17:31:00
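
The percentage column in this output comes back as text with a decimal comma (for example `-0,13%`), so it cannot be used as a number directly. If a numeric value is needed, a minimal sketch of the conversion could look like the following. `parse_percent` is a hypothetical helper name introduced here for illustration; it is not part of pandas.

```python
def parse_percent(text):
    """Convert a string such as '-0,13%' into the float -0.13."""
    # drop surrounding whitespace and the trailing percent sign,
    # then swap the decimal comma for a dot so float() accepts it
    cleaned = text.strip().rstrip('%').replace(',', '.')
    return float(cleaned)


# sample strings copied from the percentage column of the output above
changes = ['-0,13%', '0,36%', '0,52%']
print([parse_percent(c) for c in changes])
```

With a DataFrame, the same function could probably be applied to the column with `df[5].map(parse_percent)`, assuming the column holds strings in this format.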

Crazy crazy 15 five · Answer 2 · 2017-08-08T14:39:28

I don't know exactly how to work with tables, but plain text from blocks can be extracted like this:

    tag = soup.find("div", class_="text")
    text = str(tag)

Maybe this will help. You can also remove extra tags like this:

    tag.div.decompose()  # remove the nested div
    tag.p.decompose()    # remove the text in the <p> tag
    tag.br.decompose()   # remove the <br> line break
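
For tables, one possible approach is to walk the `<tr>` rows and read the `<td>` cells of each one. The sketch below runs on a small inline HTML string rather than the live page: the markup, the `text` class, and the rate values are all made up for illustration, and the real page's structure may well differ. It uses the built-in `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with made-up values; the real page may look different.
html = """
<div class="text">
  <div>advert</div>
  <p>intro</p>
  <table>
    <tr><td>USD/RUB</td><td>60,00</td></tr>
    <tr><td>EUR/RUB</td><td>70,00</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("div", class_="text")

# remove the nested <div> and the <p> before reading the rest
tag.div.decompose()
tag.p.decompose()

# collect the cell text of every table row as a list of strings
rows = [[td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in tag.find_all("tr")]

# keep only the rows whose first cell starts with 'USD/'
usd = [row for row in rows if row and row[0].startswith("USD/")]
print(usd)
```

On the real page you would first need to inspect the HTML to find which tag and class actually wrap the rates table, and adjust the `find` call accordingly.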
