I wrote the code below to parse an XML file, as advised here. As far as I can tell, there is a problem with the encoding, or I have misunderstood something.

    from bs4 import BeautifulSoup

    infile = open('C:\\Users\\inikitatech\\Python Example\\xml_data.xml', 'r')
    contents = infile.read()
    soup = BeautifulSoup(contents, 'xml')

    print(soup.select_one('id').text)
    print(soup.select_one('href').text.strip())
    print(soup.select_one('url').text.strip())

It produces this error:

    Traceback (most recent call last):
      File "C:/Users/inikitatech/PycharmProjects/PythonExample/ZakupParser.py", line 4, in <module>
        contents = infile.read()
      File "C:\Users\inikitatech\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1251.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 739: character maps to <undefined>

I looked into how this could be solved. People write that the built-in codecs library fixes it, but with it I get a different error.

    from bs4 import BeautifulSoup
    import codecs

    infile = codecs.open('C:\\Users\\inikitatech\\Python Example\\xml_data.xml', 'r', 'utf-8')
    contents = infile.read()
    soup = BeautifulSoup(contents, 'xml')

    print(soup.select_one('id').text)
    print(soup.select_one('href').text.strip())
    print(soup.select_one('url').text.strip())

This is the error:

    Traceback (most recent call last):
      File "C:/Users/inikitatech/PycharmProjects/PythonExample/ZakupParser.py", line 6, in <module>
        soup = BeautifulSoup(contents, 'xml')
      File "C:\Users\inikitatech\PycharmProjects\PythonExample\venv\lib\site-packages\bs4\__init__.py", line 165, in __init__
        % ",".join(features))
    bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?

What does this mean, and what is the right way to parse XML?
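For context on the first traceback: `open()` without an `encoding` argument falls back to the system default, which was cp1251 on this machine, and byte 0x98 has no character in cp1251. The second attempt got past `read()` with `'utf-8'`, so the file is most likely UTF-8. Below is a minimal sketch that reproduces the failure and the fix using only the standard library. The sample XML and the temporary file are assumptions made for illustration, not the real `xml_data.xml`.

```python
import os
import tempfile

# Hypothetical sample document; the Cyrillic letter "И" is encoded in UTF-8
# as the bytes D0 98, and 0x98 is undefined in cp1251.
sample = '<?xml version="1.0" encoding="UTF-8"?><item><id>1</id><name>Имя</name></item>'

# Write the sample to a temporary file as UTF-8 bytes.
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    path = tmp.name

try:
    # Simulate the Windows default from the traceback by forcing cp1251.
    try:
        with open(path, "r", encoding="cp1251") as infile:
            infile.read()
        failed_with_cp1251 = False
    except UnicodeDecodeError:
        failed_with_cp1251 = True

    # Passing the real encoding explicitly reads the file correctly.
    with open(path, "r", encoding="utf-8") as infile:
        contents = infile.read()
finally:
    os.remove(path)

print(failed_with_cp1251)
print(contents == sample)
```

The built-in `open(path, encoding='utf-8')` does the same job as `codecs.open(path, 'r', 'utf-8')` here, so the extra import is not strictly needed.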
`soup = BeautifulSoup(contents, 'html.parser')` or `soup = BeautifulSoup(contents, 'lxml')` - gil9red

`.text` returns the text from the element, and `strip()` is the string method that removes whitespace characters on the left and right, such as spaces, tabs, and newlines. I.e. if `.text` returns the string `" text\n \n"`, then `strip()` will shorten it to `"text"` - gil9red
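A note on the second error: `FeatureNotFound` means Beautiful Soup has no installed parser that provides the requested feature. The `'xml'` feature comes from the lxml package, so installing lxml is the usual way to make `BeautifulSoup(contents, 'xml')` work. As an alternative that needs no extra packages, here is a minimal sketch with the standard library's `xml.etree.ElementTree`. This is a different technique from the Beautiful Soup calls in the comments above. The element names `id`, `href`, and `url` come from the question; the sample document and the helper `clean_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real structure of xml_data.xml is not shown in the question.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<item>
  <id>  42
  </id>
  <href> http://example.com/item/42 </href>
  <url>
    http://example.com
  </url>
</item>"""

# Parsing from bytes lets the parser honour the encoding in the XML declaration.
root = ET.fromstring(sample)


def clean_text(node, tag):
    """Return the stripped text of the first <tag> below node, or None if absent or empty."""
    element = node.find(f".//{tag}")
    if element is None or element.text is None:
        return None
    # strip() removes leading and trailing spaces, tabs, and newlines.
    return element.text.strip()


print(clean_text(root, "id"))
print(clean_text(root, "href"))
print(clean_text(root, "url"))
```

Reading the real file would follow the same pattern with `ET.parse(path).getroot()` in place of `ET.fromstring(sample)`.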