    import urllib
    import os
    import re
    import smtplib
    from email.mime.text import MIMEText
    import time

    usedEmails = []
    num = 0
    while True:
        u = urllib.urlopen("http://otvet.mail.ru/")
        page = u.read()
        emailPattern = r"[0-9a-zA-Z_\-\.]+@[0-9a-zAZ\.]+\.[a-zA-Z]+"
        compliedpattern = re.compile(emailPattern)
        for address in compliedpattern.findall(page):
            if not address in usedEmails:

How can I make it save the collected email addresses to a plain text file?

    2 answers

    Here is a script that walks through all categories and all pages ... Yesterday I also read that article from Hacker, decided to get acquainted with Python, and extended the existing script ...

        #!/usr/bin/env python
        # Python 2 script (urllib.urlopen moved to urllib.request in Python 3).
        import urllib
        import re

        f = open('mails.txt', 'ab')
        url = 'http://otvet.mail.ru'
        u = urllib.urlopen(url)
        page = u.read()
        urlPattern = r'<li><a href="(.*?)" title="'
        # Collect the category links from the front page.
        urls = re.findall(r'<li><a href="(.*?)" title="', page, re.DOTALL | re.MULTILINE)
        for categoryUrl in urls:
            x = 1
            while x <= 50:  # pages 1..50 of each category
                url = "http://otvet.mail.ru" + categoryUrl + "open/?pg=" + str(x)
                print(url)
                u = urllib.urlopen(url)
                page = u.read()
                emailPattern = r"[0-9a-zA-Z_\-\.]+@[0-9a-zA-Z\.]+ru"
                compiledPattern = re.compile(emailPattern)
                # Deduplicate the addresses found on this page.
                unic = uniq = list(set(compiledPattern.findall(page)))
                # Drop the site's own counter addresses, if they were matched.
                for own in ("--Rating@Mail.ru", "Rating@Mail.ru"):
                    if own in unic:
                        unic.remove(own)
                x += 1
                for address in unic:
                    print(address)
                    f.write(address + '\r\n')
        f.close()

    One line is clearly redundant; see if you can spot it yourself.

      I don't really know Python.

      Before the loop (for address in ...), open the file for writing.

      Then, at the end of your code, write address + "\n" to that file.

      A quick search turned up something like this:

          f = open(r'addr.txt', 'w')
          for address in compliedpattern.findall(page):
              if not address in usedEmails:
                  # Python joins strings with +, not with a dot.
                  f.write(address + "\n")
          f.close()
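A slightly fuller sketch of the same idea, for anyone who wants the file to survive repeated passes of the while loop. The names `extract_emails` and `append_new_addresses` are made up for illustration, and the default file name is taken from the snippet above. The pattern is the one from the question with the domain character class written out as `a-zA-Z`, which is an assumption about what was intended.

```python
import re

# Pattern from the question, with the domain class written out in full.
EMAIL_RE = re.compile(r"[0-9a-zA-Z_\-.]+@[0-9a-zA-Z.]+\.[a-zA-Z]+")


def extract_emails(text):
    """Return the addresses found in text, in order, without duplicates."""
    found = []
    for address in EMAIL_RE.findall(text):
        if address not in found:
            found.append(address)
    return found


def append_new_addresses(text, used_emails, path="addr.txt"):
    """Append addresses that are not yet in used_emails to path, one per line."""
    new = [a for a in extract_emails(text) if a not in used_emails]
    # Mode "a" appends, so earlier results are kept between calls.
    with open(path, "a") as f:
        for address in new:
            f.write(address + "\n")
    # Remember what has been written so it is not written again.
    used_emails.extend(new)
    return new
```

Inside the loop from the question this would be called as `append_new_addresses(page, usedEmails)`. Opening the file in append mode on each call, rather than once with 'w', means a crash or restart does not wipe what was already collected.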
      • It says the name urlopen is not found. What should I do? - afagorn
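One likely cause, assuming the script is being run under Python 3: there `urlopen` is no longer a function of the top-level `urllib` module but lives in `urllib.request`. A minimal sketch of an import that works on both versions follows. The helper name `fetch_page` and the UTF-8 default are assumptions for illustration, not something from the scripts above.

```python
try:
    # Python 3: urlopen lives in the urllib.request submodule.
    from urllib.request import urlopen
except ImportError:
    # Python 2: urlopen is a function of the top-level urllib module.
    from urllib import urlopen


def fetch_page(url, encoding="utf-8"):
    """Download url and return its body as text."""
    response = urlopen(url)
    try:
        raw = response.read()
    finally:
        response.close()
    # read() returns bytes on Python 3; decode before applying a str regex.
    # The encoding is an assumption; use the one the page actually declares.
    return raw.decode(encoding, errors="replace")
```

With this in place, `page = fetch_page("http://otvet.mail.ru/")` would replace the `urllib.urlopen(...)` and `u.read()` pair, and the regular expressions can be applied to `page` unchanged.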