While parsing a site, my code throws a 403 error:

    raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden

I have seen the "solution" that sets a User-Agent header, but it does not help (I even get the same error when opening the site in a browser).

What I need is a robust solution in the code itself: on a 403 error (and preferably on other errors too), run the code again.

The problem is that except does not catch it with any of HTTPError, urllib.error.HTTPError, urllib.HTTPError or urllib.error, unlike ValueError, TypeError and IndexError, with which everything works.

At the top of the code I import the exception classes: from urllib.error import URLError, HTTPError

Specific question: how do I catch this 403 error?

  • It is very strange that except fails for one particular error only. Did you double-check everything? Maybe you are catching the error in the wrong place, it happens. - approximatenumber
  • I'm new to this. I tried the different options listed above. In IDLE none of them is highlighted as a known exception, unlike IndexError and similar ones. On English-language sites I read something about Python versions and how urllib raises errors. I quote verbatim: NOTE: Python 2.x urllib version will receive 403 status, but unlike Python 2.x urllib2 and Python 3.x urllib, it will not get the exception. You can confirm that by the following code: print(urllib.urlopen(url).getcode())  # => 403 It is not clear to me what that last code example means, where to put it, or how it would help. - Amaroc
  • By the way, have you tried urllib2? from urllib2 import URLError, HTTPError. As a last resort, if nothing else works, simply catch the error with except Exception as e and handle it inside that block. - approximatenumber
  • Thanks! Could you explain except Exception as e in more detail: what is it and how does it work? As for urllib2, I do not have it installed, so it complains with an error. I have Anaconda Python 3.5. - Amaroc
  • This is what the documentation says: Handling Exceptions. urlopen raises URLError when it cannot handle a response (though as usual with Python APIs, built-in exceptions such as ValueError, TypeError etc. may also be raised). HTTPError is the subclass of URLError raised in the specific case of HTTP URLs. The exception classes are exported from the urllib.error module. - Amaroc

3 answers

Something like this:

    from urllib.request import urlopen
    import time

    # URL is the address of the page being parsed
    while True:
        try:
            with urlopen(URL) as f:
                print(f.read())
            # Completed without errors, exit the loop
            break
        except Exception as e:
            print(e)
            # Wait 30 seconds before repeating the request
            time.sleep(30)

From here you can go further: limit the number of retries, catch and handle different exceptions separately, and increase the delay between retries when errors repeat.
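A minimal sketch of those refinements, assuming Python 3. The function name `fetch_with_retries`, its parameters and their defaults are made up for illustration; the `opener` and `sleep` arguments exist only so the function can be tried without network access or real waiting.

```python
import time
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_with_retries(url, max_attempts=5, base_delay=1.0,
                       opener=urlopen, sleep=time.sleep):
    """Return the response body, retrying on HTTP 403 and on network errors.

    All names and defaults here are illustrative assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except HTTPError as e:
            # HTTPError is a subclass of URLError, so it has to come first.
            if e.code != 403:
                raise  # other HTTP statuses are not retried in this sketch
            error = e
        except URLError as e:
            # Connection-level failure (DNS lookup, refused connection, ...).
            error = e
        if attempt == max_attempts:
            raise error  # retries exhausted: re-raise the last error
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

A call such as `fetch_with_retries(URL, max_attempts=3, base_delay=30)` would wait 30 and then 60 seconds between attempts before giving up. Whether a 403 is worth retrying at all depends on the site; if the server refuses every request, no number of retries will help.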

  • Yes, that was already suggested: except Exception as e: print(e). But I do not understand the logic of this code at all. What does it do? What do I put into it? As I understand it, when absolutely any error occurs, print(e) runs? I added it, but urllib.error.HTTPError: HTTP Error 403: Forbidden still comes up (after print(e) I placed the actions I need to restart the function). - Amaroc
  • The except clause catches an exception. Exception is the base exception, which lets you catch any exception, and the body of the except block describes what to do when one is caught. In my example the error text is simply printed to the console and the program pauses for 30 seconds before trying the download again. - gil9red
  • Thanks for the detailed explanation, I get it now. But the thing is that I had already tried similar code, except Exception as e: followed by further instructions, and urllib.error.HTTPError was not caught by it either: the error still popped up, and neither the prints nor the later actions ran. Apparently it is not derived from the base exception. From the English-language comments I understood that this is how it is in urllib for Python 3+. I think the other two answers (with similar code) should work. - Amaroc
  • @Amaroc sorry, I misled you. BaseException is the root exception class. I confused it with C#, where the base class is Exception. - gil9red
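The class relationships discussed in these comments can be checked offline. The sketch below assumes Python 3.3 or later; the helper `classify` and the placeholder URL are hypothetical, and the `HTTPError` instance is built by hand so that no request is made.

```python
from email.message import Message
from io import BytesIO
from urllib.error import HTTPError, URLError

# Class relationships as defined by the standard library (Python 3.3+).
assert issubclass(HTTPError, URLError)
assert issubclass(URLError, OSError)
assert issubclass(OSError, Exception)
assert issubclass(Exception, BaseException)


def classify(exc):
    """Raise exc and report which handler caught it (illustrative helper)."""
    try:
        raise exc
    except Exception as e:
        return "caught by except Exception: " + type(e).__name__
    except BaseException as e:
        return "caught only by except BaseException: " + type(e).__name__


# Built by hand so that no network access is needed; the URL is a placeholder.
forbidden = HTTPError("http://example.invalid/", 403, "Forbidden",
                      Message(), BytesIO())

print(classify(forbidden))            # caught by except Exception: HTTPError
print(classify(KeyboardInterrupt()))  # caught only by except BaseException: KeyboardInterrupt
```

So `except Exception` does catch `HTTPError`. If the traceback still appears, a likely cause is that the call which raises it sits outside the `try` block, for example in a function called before or after it.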

If you are able to use urllib2, catch the urllib2.HTTPError exception. For example:

    import urllib2
    import time

    try:
        do_something()
    except urllib2.HTTPError as e:
        if e.code == 403:
            print(e)
            time.sleep(TIMEOUT)
            do_something()
        else:
            raise

Or catch the urllib.error.HTTPError exception:

    from urllib.error import HTTPError
    import time

    try:
        do_something()
    except HTTPError as e:
        if e.code == 403:
            print(e)
            time.sleep(TIMEOUT)
            do_something()
        else:
            raise

  • There is no urllib2 module in Python 3. - gil9red
  • @gil9red well, a missing module is hardly a problem (pip install urllib2). - approximatenumber
  • In any case, pip will not install urllib2 on Python 3, which is what the author has. - gil9red
  • @gil9red I missed the author's edited comment about Python 3.5. - approximatenumber
  • @approximatenumber thank you very much! I will test it tomorrow and mark the answers that worked as accepted. - Amaroc

HTTPError works:

    #!/usr/bin/env python3
    from urllib.request import urlopen
    from urllib.error import HTTPError

    try:
        urlopen('http://httpbin.org/status/403')
    except HTTPError as e:
        assert e.code == 403
    else:
        assert 0, 'never happens'

To retry the request up to max_attempts times if a 403 Forbidden HTTP status is received:

    for _ in range(max_attempts):
        try:
            response = urlopen(url)
        except HTTPError as e:
            if e.code == 403:
                last_error = e
                continue  # try again
            raise  # allow other errors to propagate up the stack
        else:  # success
            break
    else:  # no break: all attempts failed
        raise last_error  # raise last error