The essence of the question: there is a site that I log in to using PhantomJS, which gives me a JSON file with cookies. I would then like to hand these cookies to Scrapy so that it logs in and retrieves information from the site.

When I test the code below against httpbin.org/cookies, no cookies come through. If I pass a single element instead, e.g. cookies[0], then httpbin.org/cookies shows one cookie (which is predictable). But I need to pass everything from the JSON file.

    import os
    import json

    from scrapy.http import Request
    from scrapy.spider import BaseSpider


    class MySpider(BaseSpider):
        name = 'MySpider'
        start_urls = ['http://site.ru']

        def get_cookies(self):
            # Run the PhantomJS login script, which writes cookie.json
            os.system("phantomjs ~/ph.js")
            with open('cookie.json') as data_file:
                data = json.load(data_file)
            print data
            return data

        def parse(self, response):
            cookies = self.get_cookies()
            return Request(url="http://httpbin.org/cookies",
                           cookies=cookies,
                           callback=self.after_login)

        def after_login(self, response):
            print response.body_as_unicode().encode('utf-8')
  • What does print data output, and in what format does Request expect the cookies? – jfs

1 answer

    >>> r = Request(url="http://httpbin.org/cookies", cookies={'x': 1, 'y': 2, 'z': 3})
    >>> fetch(r)
    2016-06-10 03:36:15 [scrapy] DEBUG: Crawled (200) <GET http://httpbin.org/cookies> (referer: None)
    >>> print response.body
    {
      "cookies": {
        "x": "1",
        "y": "2",
        "z": "3"
      }
    }

Look at what PhantomJS actually writes there: it is almost certainly not a dictionary but a list of cookie dictionaries.
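For illustration, a minimal sketch of inspecting that dump, assuming cookie.json holds PhantomJS's usual page.cookies structure (a JSON list of objects with 'name', 'value', 'domain', 'path', and similar keys; the exact fields in your file may differ):

    import json

    with open('cookie.json') as f:
        cookies = json.load(f)

    print type(cookies)   # <type 'list'> -- a list, not a dict
    print cookies[0]      # one cookie object, e.g.
                          # {'name': 'sessionid', 'value': 'abc123',
                          #  'domain': '.site.ru', 'path': '/', ...}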

If I set an index, for example cookies[0], then at httpbin.org/cookies I get one cookie (which is predictable).

Request expects a dictionary, and you are slipping it a list of dictionaries. (But when I did it correctly, I thought it was a mistake :D)

This is how it should be:

    ...cookies=dict((cookie['name'], cookie['value'])
                    for cookie in self.get_cookies())
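Putting it together, a minimal sketch of the corrected parse method, assuming each cookie object in the dump carries 'name' and 'value' keys, as in a typical PhantomJS export:

    def parse(self, response):
        # Collapse the list of cookie dicts into a plain {name: value}
        # mapping, which is the form Request is shown to accept above.
        cookie_jar = dict((cookie['name'], cookie['value'])
                          for cookie in self.get_cookies())
        return Request(url="http://httpbin.org/cookies",
                       cookies=cookie_jar,
                       callback=self.after_login)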