The script does not write any data from the site into the CSV file it creates. Where is the problem?

    from bs4 import BeautifulSoup as Soup
    from urllib.request import urlopen
    from collections import Counter
    import re
    import csv

    PAGENUMBER = 4
    ARAWDATA = []
    BRAWDATA = []
    OFFICIALDATA = {}
    NUMBERS = []
    TIMES = []
    CHANCE = []
    zc = 0

    while PAGENUMBER <= 1:  # Our way of filtering through pages
        COUNTER = 0  # We will need this later
        url = urlopen('https://www.stoloto.ru/4x20/archive/{}'.format(PAGENUMBER))
        RAW = url.read()  # Reads data into a variable
        url.close()  # Closes the connection
        PARSED = Soup(RAW, 'html.parser')  # (DATA, type of parser)

        for line in PARSED.findAll('div', attrs={"class": "numbres", "class": "numbers_wrapper", "class": "container.cleaered"}):
            if 'stoloto.ru/4x20/archive' in str(line):  # Checks if the tag contains those chars
                pRAW = re.findall('d=(.*?)\">', str(line))  # Gathers only the dates from that text
                for pline in pRAW:
                    ARAWDATA.append(pline)  # Stores data in a list for mutation later

        for line in PARSED.findAll('div', attrs={"class": "numbres", "class": "numbers_wrapper", "class": "container.cleaered"}):
            if '<strong>' in str(line) and 'wrap' in str(line):  # Needs to be set up this long way
                pRAW = re.findall('<b>(.*?)</b>', str(line))
                for pline in pRAW:
                    BRAWDATA.append(pline.replace(" · ", " "))

        for date in ARAWDATA:
            OFFICIALDATA[date] = BRAWDATA[COUNTER]  # For every date, assign it the numbers as its value
            COUNTER += 1

        PAGENUMBER += 1

    with open('lotto.csv', 'w') as data:
        file = csv.writer(data)
        file.writerows(OFFICIALDATA.items())
  • Could the data be loaded via AJAX, by any chance? Save the RAW result to a file and make sure your data is there. - gil9red
  • Alas, the last time I did any programming was 15 years ago. Now I want to master Python, which I have been learning for only two weeks. Please just help. - Ildar Mansurov
  • 1) Add open('rs.html', 'wb').write(RAW) to the code, then look at the rs.html file and make sure the data you need is there (see the sketch after this list). 2) There is another way to check the AJAX question: open the developer tools on that site (for example, via F12), refresh the page, and see which requests are made. - gil9red
  • And what should the parser pull out of that site? - gil9red
  • Thank you! The parser should pull out the numbers. - Ildar Mansurov
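
For illustration, here is gil9red's first suggestion as a minimal, self-contained sketch (the file name rs.html comes from the comment above; page number 1 is an arbitrary choice):

    from urllib.request import urlopen

    url = urlopen('https://www.stoloto.ru/4x20/archive/1')
    RAW = url.read()
    url.close()

    # Dump the raw response to disk; open rs.html in a browser or editor
    # to check whether the draw numbers appear in the HTML at all
    with open('rs.html', 'wb') as f:
        f.write(RAW)

If the numbers are missing from rs.html, the page loads them via AJAX and urlopen alone will never see them.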

1 answer

Your loop is never even executed, because the condition is False from the start:

    PAGENUMBER = 4
    ...
    while PAGENUMBER <= 1:  # Our way of filtering through pages
        ...
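
For reference, a minimal sketch of corrected loop bounds, assuming the intent was to walk pages 1 through 4 (MAX_PAGE is a hypothetical name, not from the original script):

    PAGENUMBER = 1  # start from the first page
    MAX_PAGE = 4    # hypothetical upper bound
    while PAGENUMBER <= MAX_PAGE:
        ...
        PAGENUMBER += 1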

But fixing the loop condition alone will not help - the parsing logic itself is broken.
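
One concrete illustration: a Python dict literal silently keeps only the last of several identical keys, so the attrs filter in the question never searches for the first two classes at all (the names "numbres" and "container.cleaered" also look misspelled):

    # Only the last duplicate key survives in a dict literal:
    attrs = {"class": "numbres", "class": "numbers_wrapper", "class": "container.cleaered"}
    print(attrs)  # {'class': 'container.cleaered'}

    # To match any of several classes with BeautifulSoup, pass a list instead:
    # PARSED.find_all('div', class_=['numbers_wrapper', 'container'])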


Here is a working example of parsing that site:

    from bs4 import BeautifulSoup
    from urllib.request import urlopen
    import csv

    def parse_page(page_number):
        url = 'https://www.stoloto.ru/4x20/archive/{}'.format(page_number)
        root = BeautifulSoup(urlopen(url), 'html.parser')

        # Title, e.g.: "Результаты тиража № 1, 31 декабря 2016 в 15:10"
        title = root.select_one('#content > h2').text.strip()

        # Extract the date, e.g.: "31 декабря 2016 в 15:10"
        date_time_str = title.split(', ')[1]

        # Extract the numbers, e.g.: ['20', '2', '10', '4', '2', '16', '9', '17']
        numbers = [x.text.strip() for x in root.select('.winning_numbers > ul > li')]

        return date_time_str, numbers

    max_page_number = 4
    result = []

    # Iterate over pages from 1 to <max_page_number> inclusive
    for page_number in range(1, max_page_number + 1):
        date_time_str, numbers = parse_page(page_number)

        # Convert the list of numbers to a single string:
        # ['20', '2', '10', '4', '2', '16', '9', '17'] -> '20 2 10 4 2 16 9 17'
        numbers = ' '.join(numbers)

        result.append((page_number, date_time_str, numbers))

    print(result)

    with open('lotto.csv', 'w', encoding='utf-8', newline='') as f:
        file = csv.writer(f)
        file.writerows(result)

The resulting lotto.csv file:

    1,31 декабря 2016 в 15:10,20 2 10 4 2 16 9 17
    2,3 января 2017 в 22:00,12 6 20 17 3 16 9 13
    3,5 января 2017 в 22:00,5 19 18 17 14 11 20 12
    4,8 января 2017 в 08:20,19 17 12 5 3 8 7 6
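
A side note, not something reported in this thread: if the site ever starts rejecting plain urllib requests, a browser-like User-Agent can be supplied via urllib.request.Request (a precautionary sketch, not part of the answer above):

    from urllib.request import Request, urlopen

    def fetch(url):
        # Browser-like User-Agent; purely precautionary
        req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        return urlopen(req)

    # Drop-in replacement inside parse_page:
    # root = BeautifulSoup(fetch(url), 'html.parser')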