The script does not write any data from the site into the CSV file it creates. Where is the problem?

    from bs4 import BeautifulSoup as Soup
    from urllib.request import urlopen
    from collections import Counter
    import re
    import csv

    PAGENUMBER = 4
    ARAWDATA = []
    BRAWDATA = []
    OFFICIALDATA = {}
    NUMBERS = []
    TIMES = []
    CHANCE = []
    zc = 0

    while PAGENUMBER <= 1:  # Our way of filtering through pages
        COUNTER = 0  # We will need this later
        url = urlopen('https://www.stoloto.ru/4x20/archive/{}'.format(PAGENUMBER))
        RAW = url.read()  # Reads data into a variable
        url.close()  # Closes the connection
        PARSED = Soup(RAW, 'html.parser')  # (DATA, type of parser)

        for line in PARSED.findAll('div', attrs={"class": "numbres", "class": "numbers_wrapper", "class": "container.cleaered"}):
            if 'stoloto.ru/4x20/archive' in str(line):  # Checks if the tag contains those chars
                pRAW = re.findall('d=(.*?)\">', str(line))  # Gathers only the dates from that text
                for pline in pRAW:
                    ARAWDATA.append(pline)  # Stores data in a list for mutation later

        for line in PARSED.findAll('div', attrs={"class": "numbres", "class": "numbers_wrapper", "class": "container.cleaered"}):
            if '<strong>' in str(line) and 'wrap' in str(line):  # Needs to be set up this long way
                pRAW = re.findall('<b>(.*?)</b>', str(line))
                for pline in pRAW:
                    BRAWDATA.append(pline.replace(" · ", " "))

        for date in ARAWDATA:
            OFFICIALDATA[date] = BRAWDATA[COUNTER]  # For every date, assign it the numbers as its value
            COUNTER += 1

        PAGENUMBER += 1

    with open('lotto.csv', 'w') as data:
        file = csv.writer(data)
        file.writerows(OFFICIALDATA.items())
  • Could the data be loaded via AJAX, by any chance? Save the RAW result to a file and make sure your data is there. - gil9red
  • Alas, the last time I did any programming was 15 years ago. Now I want to master Python, which I have been learning for only two weeks. Please just help. - Ildar Mansurov
  • 1) Add open('rs.html', 'wb').write(RAW) to the code, then look at the rs.html file and make sure the data you need is there (see the sketch after this list). 2) There is another way to check the AJAX question: open the developer tools on that site (for example, via F12), refresh the page, and see which requests are made. - gil9red
  • And what should the parser pull out of that site? - gil9red
  • Thank you! The parser should pull out the numbers. - Ildar Mansurov
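
For illustration, here is gil9red's first suggestion as a minimal, self-contained sketch (the file name rs.html comes from the comment above; page number 1 is an arbitrary choice):

    from urllib.request import urlopen

    url = urlopen('https://www.stoloto.ru/4x20/archive/1')
    RAW = url.read()
    url.close()

    # Dump the raw response to disk; open rs.html in a browser or editor
    # to check whether the draw numbers appear in the HTML at all
    with open('rs.html', 'wb') as f:
        f.write(RAW)

If the numbers are missing from rs.html, the page loads them via AJAX and urlopen alone will never see them.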

1 answer

Your loop is never even executed, because the condition is False from the start:

    PAGENUMBER = 4
    ...
    while PAGENUMBER <= 1:  # Our way of filtering through pages
        ...
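
For reference, a minimal sketch of corrected loop bounds, assuming the intent was to walk pages 1 through 4 (MAX_PAGE is a hypothetical name, not from the original script):

    PAGENUMBER = 1  # start from the first page
    MAX_PAGE = 4    # hypothetical upper bound
    while PAGENUMBER <= MAX_PAGE:
        ...
        PAGENUMBER += 1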

But fixing the loop condition alone will not help - the parsing logic itself is broken.
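
One concrete illustration: a Python dict literal silently keeps only the last of several identical keys, so the attrs filter in the question never searches for the first two classes at all (the names "numbres" and "container.cleaered" also look misspelled):

    # Only the last duplicate key survives in a dict literal:
    attrs = {"class": "numbres", "class": "numbers_wrapper", "class": "container.cleaered"}
    print(attrs)  # {'class': 'container.cleaered'}

    # To match any of several classes with BeautifulSoup, pass a list instead:
    # PARSED.find_all('div', class_=['numbers_wrapper', 'container'])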


Here is a working example of parsing that site:

    from bs4 import BeautifulSoup
    from urllib.request import urlopen
    import csv

    def parse_page(page_number):
        url = 'https://www.stoloto.ru/4x20/archive/{}'.format(page_number)
        root = BeautifulSoup(urlopen(url), 'html.parser')

        # Title, e.g.: "Результаты тиража № 1, 31 декабря 2016 в 15:10"
        title = root.select_one('#content > h2').text.strip()

        # Extract the date, e.g.: "31 декабря 2016 в 15:10"
        date_time_str = title.split(', ')[1]

        # Extract the numbers, e.g.: ['20', '2', '10', '4', '2', '16', '9', '17']
        numbers = [x.text.strip() for x in root.select('.winning_numbers > ul > li')]

        return date_time_str, numbers

    max_page_number = 4
    result = []

    # Iterate over pages from 1 to <max_page_number> inclusive
    for page_number in range(1, max_page_number + 1):
        date_time_str, numbers = parse_page(page_number)

        # Convert the list of numbers to a single string:
        # ['20', '2', '10', '4', '2', '16', '9', '17'] -> '20 2 10 4 2 16 9 17'
        numbers = ' '.join(numbers)

        result.append((page_number, date_time_str, numbers))

    print(result)

    with open('lotto.csv', 'w', encoding='utf-8', newline='') as f:
        file = csv.writer(f)
        file.writerows(result)

The resulting lotto.csv file:

    1,31 декабря 2016 в 15:10,20 2 10 4 2 16 9 17
    2,3 января 2017 в 22:00,12 6 20 17 3 16 9 13
    3,5 января 2017 в 22:00,5 19 18 17 14 11 20 12
    4,8 января 2017 в 08:20,19 17 12 5 3 8 7 6
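
A side note, not something reported in this thread: if the site ever starts rejecting plain urllib requests, a browser-like User-Agent can be supplied via urllib.request.Request (a precautionary sketch, not part of the answer above):

    from urllib.request import Request, urlopen

    def fetch(url):
        # Browser-like User-Agent; purely precautionary
        req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        return urlopen(req)

    # Drop-in replacement inside parse_page:
    # root = BeautifulSoup(fetch(url), 'html.parser')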