Parsing email messages

Question

I am writing an email parser:

    class CustomSMTPServer(smtpd.SMTPServer):
        def process_message(self, peer, mailfrom, rcpttos, data):
            mailto = str(rcpttos[0])
            msg = email.message_from_string(data)
            # parser subject
            subject = msg.get('Subject')
            # parser body
            if msg.is_multipart():
                for part in msg.get_payload():
                    if part.get_content_maintype() == 'text' and part.get('Content-Disposition') == None:
                        msg_body = part.get_payload(decode=1)
                        print(msg_body)
            else:
                msg_body = msg.get_payload()
            send_email(mailfrom, mailto, subject, msg_body)
            return

    server = CustomSMTPServer(('192.168.1.35', 25), None)
    asyncore.loop()

Everything works fine, but only for Latin text. When the body of the message contains Russian text, this comes out instead:

 b'\xf1\xe0\xec\xee\xe5 \xf2\xee'

Please help me adapt the script to handle Russian text.

In general, you have perfectly valid byte strings; you just need to figure out the encoding.
The encoding should arrive in the message headers, and you should end up with something like .decode('latin-1').encode('utf-8'), for example.
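
As a minimal sketch of that idea, the snippet below decodes the bytes shown in the question using a charset name read from a Content-Type header. The header value text/plain; charset="windows-1251" and the UTF-8 fallback are assumptions made for illustration; the question does not show the actual headers of the message.

```python
from email.message import Message

# Raw body bytes exactly as printed in the question.
raw_body = b'\xf1\xe0\xec\xee\xe5 \xf2\xee'

# Hypothetical header: the real Content-Type is not shown in the question.
part = Message()
part['Content-Type'] = 'text/plain; charset="windows-1251"'

# Read the declared charset; fall back to UTF-8 if none is declared (an assumption).
charset = part.get_content_charset() or 'utf-8'

text = raw_body.decode(charset)      # bytes -> str
utf8_bytes = text.encode('utf-8')    # str -> UTF-8 bytes, if bytes are needed downstream
print(text)
```

In Python 3 the decoded str is usually all that is needed; the final .encode('utf-8') step only matters if the next consumer expects bytes.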

Anton Vorob'v · 125 · 8 bronze badges · Answer 1 · 2016-07-06T12:43:12

It was solved like this: print(bytes(msg_body).decode('cp1251'))

Anton Vorob'v

125 · 8 bronze badges

At a minimum, you should not hard-code the encoding: try the get_charset() and get_content_charset() methods (example). Also, bytes(msg_body) looks suspicious: if you want to get the text/plain parts of an email message, you can use the .walk() method or iterators, as shown in the example. - jfs
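
The approach suggested in this comment could look roughly like the sketch below. The helper name extract_text_parts, the cp1251 fallback, and the sample message are all assumptions for illustration and are not part of the original script.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_text_parts(msg, fallback='cp1251'):
    """Return the decoded text/* parts of msg that are not attachments.

    `fallback` is used only when a part declares no charset; cp1251 is an
    assumption based on the question, not a general rule.
    """
    texts = []
    for part in msg.walk():  # visits the message itself and every sub-part
        if part.get_content_maintype() != 'text':
            continue  # skip multipart containers and non-text parts
        if part.get('Content-Disposition') is not None:
            continue  # skip parts that carry a Content-Disposition header
        payload = part.get_payload(decode=True)  # undo base64/quoted-printable
        charset = part.get_content_charset() or fallback
        texts.append(payload.decode(charset, errors='replace'))
    return texts


# Hypothetical sample message with a Russian body encoded as windows-1251.
sample = MIMEMultipart()
sample['Subject'] = 'test'
sample.attach(MIMEText('самое то', 'plain', 'windows-1251'))

# Round-trip through a string, similar to what the SMTP handler parses.
parsed = email.message_from_string(sample.as_string())
print(extract_text_parts(parsed))
```

Because walk() also yields the top-level message, the same helper handles non-multipart messages, so the separate is_multipart() branch is not needed in this sketch.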
