Line comments on the code:
Search for a tag by its name and class
Instead:
soup.find_all("tr", {"class" : "belowHeader"})
You can simply:
soup.find_all("tr", "belowHeader")
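For example (a throwaway HTML snippet, just to show the two calls are equivalent):

```python
from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

html = '''<table>
<tr class="belowHeader"><td>x</td></tr>
<tr class="other"><td>y</td></tr>
</table>'''
soup = BeautifulSoup(html, 'html.parser')

# a string as the second argument is matched against the class attribute
explicit = soup.find_all("tr", {"class": "belowHeader"})
short = soup.find_all("tr", "belowHeader")
print(explicit == short)  # the two calls find the same rows
```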
Use enumerate() to get the loop index
Instead:
i = 0
for td in tr.find_all('td', 'tdteamname2'):
    ...
    i += 1
Should write:
for i, td in enumerate(tr.find_all('td', 'tdteamname2')):
    ...
You can use the element names tr, td instead of row, row1, row2, row3.
Use explicit collections instead of numbered names.
Instead:
x = 0
for row3 in row.find_all("td", {"class" : "tdpercentmw1"}):
    if x == 0:
        coef1 = row3.get_text()
    elif x == 1:
        coef2 = row3.get_text()
    else:
        coef3 = row3.get_text()
    x += 1
Use:
coef = [td.get_text() for td in tr.find_all('td', 'tdpercentmw1')]
Similarly for the team names:
team = [td.get_text() for td in tr.find_all('td', 'tdteamname2')]
[optional] You can use print(*collection)
Instead:
print(team1+" "+team2+" "+coef1+" "+coef2+" "+coef3)
You can write:
print(*team, *coef)
Pass the encoding if it is known from the HTTP headers
Instead:
soup = BeautifulSoup(page.content, "html5lib")
You can write:
soup = BeautifulSoup(page.content, "html5lib", from_encoding=page.encoding)
[optional] use the stdlib unless there is a reason not to
For example, if there are no special problems with the markup, you can use 'html.parser' instead of the 'html5lib' parser.
Likewise, if urlopen() covers your needs, you can do without requests. This may be slightly less secure (requests is updated more often), but the stdlib is more stable with respect to bugs.
In your case, you can let the script simply die: if the page fails to load, there is nothing left for it to do. You can catch the expected exception types and exit with an informative error message (experience shows which types to expect; OSError is a reasonable starting point). Do not catch too broadly, so as not to hide bugs in the code.
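A minimal sketch of this approach (the URL and the error-message wording here are made up; urllib's network errors such as URLError are subclasses of OSError):

```python
import sys
from urllib.request import urlopen

def fetch(url):
    """Return the page body, or exit with a readable message on failure."""
    try:
        return urlopen(url).read()
    except OSError as e:  # URLError, socket errors, etc.
        sys.exit('error: could not fetch {}: {}'.format(url, e))
```

Usage: html = fetch('http://example.com').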
In order not to litter the console with a full traceback, you can set your own sys.excepthook.
If you collect all the code in one place:
#!/usr/bin/env python3
from urllib.request import urlopen

from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

soup = BeautifulSoup(urlopen('http://example.com'), 'html.parser')
for tr in soup.find_all('tr', 'belowHeader'):
    team = (td.get_text() for td in tr.find_all('td', 'tdteamname2'))
    coef = (td.get_text() for td in tr.find_all('td', 'tdpercentmw1'))
    print(*team, *coef)
If problems with the encoding arise, then the result of response.headers.get_content_charset() can be passed as the from_encoding parameter.
If there are problems with HTML parsing speed (as opposed to download speed), then you can try the 'lxml' parser instead of 'html.parser'.
Nested loops look justified here. If there is no special reason to get rid of them, you can leave them.