How to quickly check the validity of links

Question

I am preprocessing more than 10 million links in a Jupyter notebook with Python, and I would like to know the fastest way to check that a link is valid. Here is what I have tried so far:

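  import urllib.parse
  import urllib.request  # used by the second attempt further down

  # Assumed default: min_attributes was not defined in the original snippet.
  min_attributes = ('scheme', 'netloc')

  def is_valid(url, qualifying=None):
      # The URL counts as valid if every qualifying component is non-empty.
      qualifying = min_attributes if qualifying is None else qualifying
      token = urllib.parse.urlparse(url)
      return all(getattr(token, attr) for attr in qualifying)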

This splits the link into its components and runs quickly, but it accepts things like this:

  >>> is_valid('http://http://апревлупупц')
  True

  def is_valid(url):
      # The URL counts as valid only if it can actually be opened.
      try:
          urllib.request.urlopen(url)
          return True
      except Exception:
          return False

This opens every link and gives correct results, but it is very slow.

P.S. Django does not work in Jupyter, so its libraries are not an option either.

Why do you think that http://http://апревлупупц is incorrect?
I see a link to a site whose host name is "http", with the optional port number omitted. A site at http://http/ may very well exist on a local network, so the link you gave could work.
Quote from the WHATWG specification: “A URL-port string must be zero or more ASCII digits.” The word “zero” hints that an empty port after the colon is allowed.
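To make the point above concrete, here is a small sketch of how urllib.parse.urlparse splits the link from the question. It only illustrates the parsing and is not part of the original thread. The host comes out as "http" with an empty port, which is why a check based on non-empty scheme and netloc accepts it.

```python
from urllib.parse import urlparse

# The link from the question, split into its components.
parts = urlparse("http://http://апревлупупц")

print(parts.scheme)    # the URL scheme
print(parts.netloc)    # host plus the colon of the omitted port
print(parts.hostname)  # host name only
print(parts.port)      # None, because nothing follows the colon
print(parts.path)      # everything after the host
```

Both scheme and netloc are non-empty here, so the parser-based is_valid returns True even though the author did not intend this link to count as valid.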
Hmm, well, I meant that a link counts as correct if it can be opened. That matters especially when we need to know which product the customer was viewing: without a link that opens, it is impossible to find out.
Then the slow urllib.request.urlopen is the only option.
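If every link really has to be opened, the waiting time can at least be overlapped, since most of it is spent on network I/O. The following is a minimal sketch under that assumption, not something proposed in the thread: the names is_reachable and check_many, the HEAD method, the timeout and the worker count are all illustrative choices. Some servers reject HEAD requests, so such links would be reported as unreachable here.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def is_reachable(url, timeout=5):
    """Return True if the URL can be opened with a HEAD request (hypothetical helper)."""
    try:
        # HEAD asks for headers only, so no response body is downloaded.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except Exception:
        # Malformed URLs, DNS failures, timeouts and HTTP errors all end up here.
        return False


def check_many(urls, checker=is_reachable, max_workers=64):
    """Run checker on every URL in a thread pool and return {url: bool}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(checker, urls)))
```

Calling check_many(list_of_links) returns a dictionary mapping each link to True or False. For 10 million links it may be worth removing duplicates first and processing the list in chunks, so that partial results can be saved along the way.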
