Here is the script, it passes all categories and all pages ... Yesterday I also read this article from a hacker, I decided to familiarize myself with a python, I added it to the existing one ...
#!/usr/bin/env python import urllib import re f = open('mails.txt','ab') url = 'http://otvet.mail.ru' u = urllib.urlopen(url) page = u.read() urlPattern = r'<li><a href="(.*?)" title="' urls = re.findall(r'<li><a href="(.*?)" title="',page, re.DOTALL|re.MULTILINE) for (categoryUrl) in urls: x = 1 while x <= 50: url = "http://otvet.mail.ru" + categoryUrl + "open/?pg=" + str(x) print(url) u = urllib.urlopen(url) page = u.read() emailPattern = "[0-9a-zA-Z_\-\.]+@[0-9a-zAZ\.]+ru" compiledPattern = re.compile(emailPattern) unic = uniq = list(set(compiledPattern.findall(page))) unic.remove("--Rating@Mail.ru") unic.remove("Rating@Mail.ru") x += 1 for (address) in unic: print(address) f.write(address + '\r\n') f.close
One line is clearly superfluous, look for yourself.