Hello, I'm writing a news parser with Scrapy. It should start from the starting URL, open every news article to extract its data, then move on to the next page and do the same thing there. Mine only parses the first page and won't go any further.

```python
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class GuardianSpider(CrawlSpider):
    name = 'guardian'
    allowed_domains = ['theguardian.com']
    start_urls = ['https://www.theguardian.com/world/europe-news']

    rules = (
        # Article pages: follow links inside the listing and parse each one.
        Rule(LinkExtractor(
                restrict_xpaths=("//div[@class='u-cf index-page']",),
                allow=(r'https://www\.theguardian\.com/\w+/\d+/\w+/\d+/\w+',)),
             callback='parser_items'),
        # Pagination: in the original pattern the '?' was unescaped, so it acted
        # as a regex quantifier and the '?page=N' URLs never matched; '\w+' also
        # fails on hyphenated sections such as 'europe-news'. The restrict_xpaths
        # is dropped here in case the pagination links sit outside that div.
        Rule(LinkExtractor(
                allow=(r'https://www\.theguardian\.com/[\w/-]+\?page=\d+',)),
             follow=True),
    )
```
  • Is the information you need not available through the open-platform.theguardian.com API? - jfs
  • It's not just the Guardian I need; I only took it as an example - Anton Goncharov
  • Then what's the problem? Can't find a working example of CrawlSpider? - jfs
  • I can't get it to move on to the next page. It opens the starting page, then goes into each article and pulls out the text, but after that it should go to the next page and it doesn't - Anton Goncharov
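
The pagination rule never matching is consistent with a regex problem in the `allow` pattern: an unescaped `?` is a quantifier, not a literal question mark, and `\w` does not match the hyphen in `europe-news`. A small sketch to check this (the URL here is just an illustrative pagination link of the kind the site uses):

```python
import re

# Original pattern: the '?' is a lazy quantifier on '\w+', and '\w+' cannot
# cross the hyphen in 'europe-news', so the real pagination URL never matches.
broken = r'https://www.theguardian.com/\w+/\w+?page=\d+'

# Escape the '?' and allow hyphens and slashes in the path segment.
fixed = r'https://www\.theguardian\.com/[\w/-]+\?page=\d+'

url = 'https://www.theguardian.com/world/europe-news?page=2'

print(re.search(broken, url))  # None: the rule extracts no pagination links
print(re.search(fixed, url))   # a match, so the Rule can follow the link
```

`LinkExtractor(allow=...)` applies exactly this kind of `re.search` against each extracted URL, so a pattern that fails here will also silently drop the links in the spider.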
