Error:

... for res in _socket.getaddrinfo(host, port, family, type, proto, flags):

socket.gaierror: [Errno 11001] getaddrinfo failed

It occurs while the pages are being loaded.

Python code:

# -*- coding: utf-8 -*-
import urllib.request
from lxml.html import parse

WEBSITE = 'http://allrecipes.com'
URL_PAGE = 'http://allrecipes.com/recipes/110/appetizers-and-snacks/deviled-eggs/?page='

START_PAGE = 1
END_PAGE = 5

def correct_str(s):
    return s.encode('utf-8').decode('ascii', 'ignore').strip()

for i in range(START_PAGE, END_PAGE+1):
    URL = URL_PAGE + str(i)
    HTML = urllib.request.urlopen(URL)
    page = parse(HTML).getroot()

    # skip video cards
    for elem in page.xpath('//*[@id="grid"]/article[not(contains(@class, "video-card"))]/a[1]'):
        href = WEBSITE + elem.get('href')
        title = correct_str(elem.find('h3').text)
        recipe_page = parse(urllib.request.urlopen(href)).getroot()
        photo_url = recipe_page.xpath('//img[@class="rec-photo"]')[0].get('src')
        print('\nName: |', title)
        print('Photo: |', photo_url)

Console output:

Name: | Crab-Stuffed Deviled Eggs
Photo: | http://images.media-allrecipes.com/userphotos/720x405/1091564.jpg

Traceback (most recent call last):
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1240, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1083, in request
    self._send_request(method, url, body, headers)
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1128, in _send_request
    self.endheaders(body)
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1079, in endheaders
    self._send_output(message_body)
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 911, in _send_output
    self.send(msg)
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 854, in send
    self.connect()
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 826, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 693, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 732, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/In/Dropbox/parser/test.py", line 27, in <module>
    recipe_page = parse(urllib.request.urlopen(href)).getroot()
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 162, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 465, in open
    response = self._open(req, data)
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 483, in _open
    '_open', req)
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 443, in _call_chain
    result = func(*args)
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\In\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1242, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

Process finished with exit code 1

    1 answer

    In the simplest case, "getaddrinfo failed" means that no address could be found for the host name in the given URL, for example "http://google", "localhos", or "http://slkdfj.com".
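To check whether a particular host name is the culprit, you can run the same lookup that `urlopen` performs before connecting: the traceback shows `socket.create_connection` calling `getaddrinfo(host, port, 0, SOCK_STREAM)`. This is a minimal sketch; `resolve_host` is a hypothetical helper name, and it only tests DNS resolution, not whether the web server actually answers.

```python
import socket
from urllib.parse import urlsplit


def resolve_host(url, port=None):
    """Return the sorted IP addresses that getaddrinfo finds for url's host.

    Raises socket.gaierror when the lookup fails; that is the same error
    urlopen wraps in URLError ("getaddrinfo failed").
    """
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        raise ValueError("no host name in URL: %r" % url)
    # Fall back to the scheme's default port: 443 for https, else 80.
    port = port or parts.port or (443 if parts.scheme == "https" else 80)
    # Same kind of lookup create_connection does: any family, stream sockets.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0]
    # is the address string for both IPv4 and IPv6 results.
    return sorted({info[4][0] for info in infos})
```

On a working connection, `resolve_host("http://allrecipes.com")` should return one or more addresses, while a misspelled name such as `resolve_host("http://localhos")` would typically raise `socket.gaierror`.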

    For the full details, see the documentation for your OS: https://msdn.microsoft.com/en-us/library/windows/desktop/ms738520%28v=vs.85%29.aspx (Windows) or http://man7.org/linux/man-pages/man3/getaddrinfo.3.html (Linux).

    For example, this is what MSDN says about this error: https://msdn.microsoft.com/en-us/library/windows/desktop/ms740668%28v=vs.85%29.aspx#WSAHOST_NOT_FOUND

     No such host is known. The name is not an official host name or alias, or it cannot be found in the database(s) being queried. This error may also be returned for protocol and service queries, and means that the specified name could not be found in the relevant database. 

    In more complex cases, the cause lies in the network settings or the firewall.
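One way to tell a name-lookup problem from a network or firewall problem is to do the two steps separately: first the DNS lookup, then a plain TCP connection to port 80 (HTTP) or 443 (HTTPS). A minimal sketch, assuming there is no proxy between the machine and the site (behind a proxy, a direct connection may fail even though browsing works); `diagnose` is a hypothetical name.

```python
import socket


def diagnose(host, port=80, timeout=5.0):
    """Return a one-line diagnosis of whether host:port is reachable.

    Step 1 is the DNS lookup (the step that fails with "getaddrinfo
    failed"); step 2 is a plain TCP connection, which a firewall or a
    closed port would block.
    """
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as err:
        return "DNS lookup failed: %s" % err
    try:
        # create_connection tries each address the lookup returns in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as err:
        # ConnectionRefusedError, socket.timeout, etc. all derive from OSError.
        return "name resolves, but the connection failed: %s" % err
```

For example, `diagnose("allrecipes.com", 80)` and `diagnose("allrecipes.com", 443)`. If the result starts with "DNS lookup failed", no connection was attempted at all, so opening ports would not help.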

    • Thanks for the links, but I only understand English. From what you wrote, I gather that I need to open a port; the question is, which one? Do I also need to add something to the Python code? - Kill Noise
    • There is no need to open ports unless you closed them earlier. The ports that matter here are 80 (HTTP) and 443 (HTTPS). If the problem is not in the network settings (which cannot be diagnosed remotely), then the page address in the href variable is wrong. - m9_psy
    • The script worked yesterday; I ran it today and got this error. The page address in the href variable is definitely not the problem. - Kill Noise
    • Then what could the problem be? - Kill Noise
    • So far two causes have been suggested: a wrong address, and problems with the network or firewall. The first can be ruled out only by printing the address to the console and making sure it is definitely correct. You can automate this with the pypi.python.org/pypi/validators library. The second is much harder, because your OS configuration is unknown; but if you can get online from this machine under this user, then I have no solutions. You can check for network problems by running urllib.request.urlopen("http://google.com"). - m9_psy
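The first suggestion, printing the address, can be built into the script itself. A minimal sketch, assuming it replaces the `urlopen(href)` call in the question's inner loop; `fetch_or_report` is a hypothetical helper, and building the URL with `urllib.parse.urljoin` instead of `WEBSITE + elem.get('href')` is a substitution, not part of the original script.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

WEBSITE = 'http://allrecipes.com'


def fetch_or_report(href, timeout=10):
    """Fetch href (resolved against WEBSITE); return the body or None.

    The final URL is printed with repr() before the request, so stray
    spaces or a doubled scheme/host in href are visible in the console
    right next to the error message.
    """
    # urljoin handles relative hrefs ("/recipe/...") and absolute ones
    # ("http://..."); plain WEBSITE + href only works for relative ones.
    url = urljoin(WEBSITE, href)
    print('Fetching:', repr(url))
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as err:
        # err.reason holds the underlying cause, e.g. a socket.gaierror
        # for "[Errno 11001] getaddrinfo failed".
        print('Request failed:', err.reason)
        return None
```

Because `urljoin` returns an absolute URL unchanged, `fetch_or_report('http://google.com')` also works as the network check suggested above. Errors raised while reading the body (for example a read timeout) are not `URLError` and are not caught by this sketch.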