I am preprocessing more than 10 million links in a Jupyter notebook in Python, and I would like to know the fastest way to check whether a link is valid. Here is what I have managed to try so far.
    import urllib.parse

    # Minimal set of URL components that must be non-empty for a link to count as valid
    min_attributes = ('scheme', 'netloc')

    def is_valid(url, qualifying=None):
        qualifying = min_attributes if qualifying is None else qualifying
        token = urllib.parse.urlparse(url)
        return all(getattr(token, qualifying_attr) for qualifying_attr in qualifying)

This parses the link into its parts and works quickly, but it accepts things like this:
    >>> is_valid('http://http://апревлупупц')
    True

The second approach:

    import urllib.request

    def is_valid(url):
        try:
            urllib.request.urlopen(url)
            return True
        except Exception:
            return False

This opens each link and works correctly, but it runs very slowly.
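If a purely syntactic check is enough, a stricter variant of the urlparse approach can filter out cases like the one above without any network traffic. The sketch below is only an illustration of that idea, not an established recipe: it assumes a policy that a valid link must use http or https and have a dotted ASCII hostname, so single-label hosts (the http://http/ case defended in the comment below) and non-IDNA-encoded Cyrillic domains are rejected by design. The name is_valid_strict and the hostname regex are my own, not from any library.

    import re
    from urllib.parse import urlparse

    # Assumed policy: only http/https, and the hostname must be dotted ASCII labels
    HOSTNAME_RE = re.compile(
        r'^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$',
        re.IGNORECASE,
    )

    def is_valid_strict(url):
        try:
            token = urlparse(url)
        except ValueError:  # e.g. a malformed IPv6 literal in the netloc
            return False
        if token.scheme not in ('http', 'https'):
            return False
        # token.hostname is None when the netloc is empty
        return bool(HOSTNAME_RE.match(token.hostname or ''))

    # is_valid_strict('http://http://апревлупупц')  -> False (the parsed hostname is just 'http')
    # is_valid_strict('https://example.com/page')   -> True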
P.S. Django does not work in Jupyter, so its libraries are not an option either.
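If reachability actually has to be verified, the slow urlopen variant can at least be parallelized with the standard library, since almost all of its time is spent waiting on the network. A minimal sketch, assuming a thread pool is acceptable inside the notebook; the 64 workers and the 5-second timeout are illustrative values only:

    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    def is_reachable(url, timeout=5):
        # True if the URL answers at all, False on any error or timeout
        try:
            urllib.request.urlopen(url, timeout=timeout)
            return True
        except Exception:
            return False

    def check_all(urls, workers=64):
        # Checks links concurrently; the result order matches the input order
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(is_reachable, urls))

Even parallelized, opening 10 million links will take a long time, so running a cheap syntactic filter such as the one above first keeps the network pass as small as possible.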
Is http://http://апревлупупц really incorrect? I see a link to a site whose host is http, with the optional port number omitted; the site http://http/ may very well exist on a local network, and the link you gave could work. - andreymal