Need to strip the <b> tag from output parsed with BeautifulSoup

Question

The code below parses URLs from Yandex search results into a .csv file, but the parsed value comes out wrapped in the "b" "/b" tag (for example, doc-me.ru). How can I fix the code? The quotes are there on purpose so that Stack Overflow does not render the tags as bold text; that is, I need to pull out only the text inside the bold tag.

The change is needed on this line: items = soup.find_all('a', {'class': 'link link_outer_yes link_theme_outer path__item i-bem'})

    import requests
    from bs4 import BeautifulSoup
    import csv


    def get_html(url):
        response = requests.get(url)
        return response.text


    def get_data_items(html):
        soup = BeautifulSoup(html, 'lxml')
        items = soup.find_all('a', {'class' : 'link link_outer_yes link_theme_outer path__item i-bem'})
        # return [a.get('href') for a in items]
        for a in items:
            href_soup = a.get('href')
            data = {'url': a, 'href': href_soup}
            write_data_csv(data)


    def write_data_csv(data):
        with open('data.csv', 'a') as file:
            writer = csv.writer(file)
            writer.writerow((data['url']))


    def main():
        url = 'https://yandex.ru/search/?clid=9582&text=скачать&lr=118890&p=1'
        print('Parsing the following url:')
        print(url)
        # print(get_data_items(get_html(url)))


    if __name__ == '__main__':
        main()

Dimabytes · Answer 1 · 2018-08-18T20:22:45

While iterating over the elements obtained here:

 for a in items:

you can add:

 text = a.b.text

Then, on each iteration, the text from the first b tag inside the link is written to the text variable.
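As a rough illustration of this suggestion, here is a minimal sketch that runs on an inline HTML snippet instead of a live Yandex page. The SAMPLE_HTML markup and the extract_links helper are hypothetical, made up for this example; the real page structure may differ. The sketch also assumes the built-in 'html.parser' instead of 'lxml' so that it has no extra dependency, and it falls back to the link's own text when a link has no b tag, since a.b is None in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two search result links (an assumption,
# not the actual Yandex page): the first wraps its text in <b>, the second does not.
SAMPLE_HTML = """
<a class="link link_outer_yes link_theme_outer path__item i-bem" href="http://doc-me.ru/"><b>doc-me.ru</b></a>
<a class="link link_outer_yes link_theme_outer path__item i-bem" href="http://example.com/">example.com</a>
"""


def extract_links(html):
    """Return a list of {'text', 'href'} dicts for the matching links (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('a', {'class': 'link link_outer_yes link_theme_outer path__item i-bem'})
    rows = []
    for a in items:
        # a.b is the first <b> tag inside the link, or None if there is none.
        text = a.b.text if a.b is not None else a.get_text()
        rows.append({'text': text, 'href': a.get('href')})
    return rows


if __name__ == '__main__':
    for row in extract_links(SAMPLE_HTML):
        print(row['text'], row['href'])
```

Returning plain strings rather than the tag object itself also means each row can be passed to csv.writer as, for instance, (row['text'], row['href']).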
