There are about 80,000 product links stored in the database. I loop over each link, read the data, and pick out the characteristics I need for the product. This takes a lot of time.

How can I implement a simple multithreading scheme to speed up this extraction? I searched the Internet and found little.

    1 answer

    Try Pool:

        from multiprocessing import Pool

        def get_page(link):
            ...

        links = ("link1", "link2", ...)

        pool = Pool()
        pool.map(get_page, links)
        pool.close()
        pool.join()
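    A runnable version of that sketch might look like the following. It is a minimal example, assuming the pages are fetched over HTTP with the standard library's urllib.request; the example URLs and the body of get_page are placeholders, not part of the original answer. Note that multiprocessing needs the worker function defined at module level and an if __name__ == "__main__": guard to work on every platform:

        from multiprocessing import Pool
        from urllib.request import urlopen

        def get_page(link):
            # Fetch the page; the real extraction of product
            # characteristics would go here.
            with urlopen(link, timeout=30) as resp:
                return link, resp.read()

        # Placeholder URLs; in the question these come from the database.
        links = ["https://example.com/product/1", "https://example.com/product/2"]

        if __name__ == "__main__":
            with Pool() as pool:  # defaults to os.cpu_count() worker processes
                for link, body in pool.map(get_page, links):
                    print(link, len(body))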
    • You can use threads instead of processes for I/O-bound tasks. Instead of calling pool.close() yourself, it is better to write with ThreadPool() as pool: so the pool is cleaned up even if an exception is thrown. pool.join() is not needed here. For 80_000 links it is worth calling imap_unordered() and logging the errors. It may be more convenient to use concurrent.futures if failed links are re-submitted to the pool (submit(), as_completed()); see the sketches after these comments. - jfs
    • I.e., instead of multiprocessing you need multiprocessing.dummy. - Nikolay Dyomin
    • Right. Or ThreadPool explicitly (it's the same thing). - jfs
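    Putting the comments together: since fetching pages is I/O-bound, a thread pool fits better than a process pool, and imap_unordered() yields results as they finish instead of collecting all 80_000 at once. A minimal sketch, again assuming urllib for fetching and a guessed pool size of 20 threads (tune for your site and bandwidth):

        from multiprocessing.pool import ThreadPool  # thread-based Pool, same API
        from urllib.request import urlopen

        def get_page(link):
            # Catch per-link errors so one bad link does not kill the whole run.
            try:
                with urlopen(link, timeout=30) as resp:
                    return link, resp.read(), None
            except Exception as exc:
                return link, None, exc

        # Placeholder URLs; in the question these come from the database.
        links = ["https://example.com/product/1", "https://example.com/product/2"]

        # `with` closes the pool even if an exception escapes the loop;
        # imap_unordered() hands back each result as soon as it is ready.
        with ThreadPool(20) as pool:
            for link, body, exc in pool.imap_unordered(get_page, links):
                if exc is not None:
                    print("failed:", link, exc)  # log it; retry later if needed
                else:
                    ...  # parse the product characteristics out of `body`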
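    And the concurrent.futures variant jfs mentions, where failed links are re-submitted to the pool with submit() and collected with as_completed(). A sketch under the same assumptions, with a made-up retry limit of three attempts:

        from concurrent.futures import ThreadPoolExecutor, as_completed
        from urllib.request import urlopen

        MAX_ATTEMPTS = 3  # assumed retry budget, not from the original answer

        def get_page(link):
            with urlopen(link, timeout=30) as resp:
                return resp.read()

        # Placeholder URLs; in the question these come from the database.
        links = ["https://example.com/product/1", "https://example.com/product/2"]

        with ThreadPoolExecutor(max_workers=20) as executor:
            pending = {executor.submit(get_page, link): (link, 1) for link in links}
            while pending:
                # Iterate over a snapshot: retries submitted below are
                # picked up by the next pass of the while loop.
                for future in as_completed(list(pending)):
                    link, attempt = pending.pop(future)
                    try:
                        body = future.result()
                        ...  # parse the product characteristics out of `body`
                    except Exception as exc:
                        if attempt < MAX_ATTEMPTS:
                            retry = executor.submit(get_page, link)
                            pending[retry] = (link, attempt + 1)
                        else:
                            print("giving up on", link, exc)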