Python3 Urllib Http error 403 - except

Question

While parsing a site, my code throws a 403 error:

    raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden

I have seen the "solution" that sets a User-Agent header, but it does not help (I get the same error on this site even in the browser).

What I need is a bulletproof solution in code: when a 403 error occurs (and preferably other errors too), run the code again.

The problem is that except catches none of HTTPError, urllib.error.HTTPError, urllib.HTTPError, or urllib.error, unlike ValueError, TypeError, and IndexError, with which everything works.

At the beginning of the code I import the exception classes: from urllib.error import URLError, HTTPError

A specific question: how do I "catch" this 403 error?

It is very strange that except fails to work with one particular error.
In IDLE, none of these names is highlighted as a known exception, unlike IndexError and similar ones.
On English-language sites I read something about Python versions, urllib, and raise.
I will quote verbatim: "NOTE: Python 2.x urllib version will receive 403 status, but unlike Python 2.x urllib2 and Python 3.x urllib, it will not get the exception.
You can confirm that by the following code: print(urllib.urlopen(url).getcode())  # => 403". It is not clear what that last code example means, where to put it, or how it would help.
As a last resort, if nothing else works, I could catch the error simply with except Exception as e.
The documentation says this: "Handling Exceptions: urlopen raises URLError when it cannot handle a response (though as usual with Python APIs, built-in exceptions such as ValueError, TypeError etc. may also be raised).
The exception classes are exported from the urllib.error module."

gil9red · Accepted Answer · 2016-03-23T13:37:22

Something like this:

    from urllib.request import urlopen
    import time

    # URL is the address to fetch; it must be defined before this loop
    while True:
        try:
            with urlopen(URL) as f:
                print(f.read())
            # Completed without errors, leave the loop
            break
        except Exception as e:
            print(e)
            # Wait 30 seconds before retrying the request
            time.sleep(30)

From there you can go further: limit the number of retries, catch and handle different exceptions separately, and increase the delay between retries when errors keep repeating.
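The refinements listed above could be sketched roughly as follows. This is only an illustration, not the answer author's code: the function name fetch_with_retries, its parameters, and the default values are all assumptions made for the example. The opener and sleep arguments exist only so the function can be tried without real network access or real waiting.

```python
import time
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_with_retries(url, max_attempts=5, base_delay=1.0,
                       opener=urlopen, sleep=time.sleep):
    """Return the response body, retrying when the request fails.

    Hypothetical helper: every name and default here is an assumption.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            with opener(url) as response:
                return response.read()
        except HTTPError as e:
            # The server answered, but with an error status such as 403.
            print("HTTP error {}: {}".format(e.code, e.reason))
            last_error = e
        except URLError as e:
            # No usable answer at all (DNS failure, refused connection, ...).
            print("Connection problem: {}".format(e.reason))
            last_error = e
        if attempt < max_attempts - 1:
            # Double the pause after every failure: 1 s, 2 s, 4 s, ...
            sleep(base_delay * 2 ** attempt)
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

HTTPError has to be listed before URLError because it is a subclass of URLError; in the opposite order the second except clause would never run.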

Yes, I was already advised to use except Exception as e: print(e). But I absolutely cannot understand the logic of this code. What does it do? Do I understand it correctly that when absolutely any error occurs, print(e) will print it? I added it, but urllib.error.HTTPError: HTTP Error 403: Forbidden still comes up (after print(e) I put the actions I need to restart the function). - Amaroc
The except construct catches an exception. Exception is the base exception class, which lets you catch any exception. The body of the except block describes what to do when an exception is caught. In my example, the error text is simply printed to the console and the program pauses for 30 seconds before trying the download again. - gil9red
Thanks for the detailed explanation, I get it. But the thing is that I had already tried similar code, "except Exception as e:" followed by further instructions, and it had no effect on the urllib.error.HTTPError either: the error still popped up, and neither the prints nor the later actions ran. Apparently it is not derived from the base exception. From the English-language comments I understood that this is how it works in urllib for Python 3+. I think the other two answers (with similar code) should work. - Amaroc
@Amaroc sorry, I misled you. BaseException is the root exception class. I confused it with C#, where the base class is Exception. - gil9red
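For reference, the class hierarchy discussed in these comments can be checked directly. The sketch below raises the error by hand, with an example URL that is only an assumption, so it needs no network access. In Python 3, HTTPError derives from URLError, which derives from OSError and therefore from Exception, so except Exception should already catch it. BaseException sits one level higher and additionally covers KeyboardInterrupt and SystemExit.

```python
from urllib.error import HTTPError, URLError

# Walk the inheritance chain of HTTPError up to the root class.
print(issubclass(HTTPError, URLError))
print(issubclass(URLError, OSError))
print(issubclass(OSError, Exception))
print(issubclass(Exception, BaseException))

caught = None
try:
    # Raise the error by hand so that no request is actually made.
    raise HTTPError("http://example.invalid/", 403, "Forbidden", None, None)
except Exception as e:
    # A plain "except Exception" is enough to receive the HTTPError.
    caught = e
    print("caught:", type(e).__name__, e.code)
```

If an except Exception block appears not to fire, one thing worth checking is whether the failing call really sits inside the try block.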

Answer 2 · 2016-03-23T13:55:57

If you are able to use urllib2 (Python 2), catch urllib2.HTTPError. For example:

    import urllib2
    import time

    # do_something() and TIMEOUT are placeholders for your own code
    try:
        do_something()
    except urllib2.HTTPError as e:
        if e.code == 403:
            print(e)
            # Wait, then try once more
            time.sleep(TIMEOUT)
            do_something()
        else:
            raise

Or, on Python 3, catch urllib.error.HTTPError:

    from urllib.error import HTTPError
    import time

    # do_something() and TIMEOUT are placeholders for your own code
    try:
        do_something()
    except HTTPError as e:
        if e.code == 403:
            print(e)
            # Wait, then try once more
            time.sleep(TIMEOUT)
            do_something()
        else:
            raise

@gil9red well, a missing module is hardly a problem (pip install urllib2).
And in any case, pip will not install urllib2 on Python 3, which is what the author has.
I will test tomorrow and mark the answers that worked as accepted!

Answer 3 · 2016-03-23T14:50:12

HTTPError works:

    #!/usr/bin/env python3
    from urllib.request import urlopen
    from urllib.error import HTTPError

    try:
        urlopen('http://httpbin.org/status/403')
    except HTTPError as e:
        assert e.code == 403
    else:
        assert 0, 'never happens'

To repeat the request up to max_attempts times if a 403 Forbidden HTTP status is received:

    for _ in range(max_attempts):
        try:
            response = urlopen(url)
        except HTTPError as e:
            if e.code == 403:
                last_error = e
                continue  # try again
            raise  # allow other errors to propagate up the stack
        else:  # success
            break
    else:  # no break: all attempts failed
        raise last_error  # raise last error
