I am writing an email parser:

```python
import asyncore  # note: smtpd and asyncore were removed in Python 3.12
import email
import smtpd


class CustomSMTPServer(smtpd.SMTPServer):
    def process_message(self, peer, mailfrom, rcpttos, data, **kwargs):
        mailto = str(rcpttos[0])
        # With decode_data=False (the default since Python 3.6), data is bytes
        msg = email.message_from_bytes(data)
        # parse the subject
        subject = msg.get('Subject')
        # parse the body
        msg_body = None
        if msg.is_multipart():
            for part in msg.get_payload():
                if part.get_content_maintype() == 'text' and part.get('Content-Disposition') is None:
                    msg_body = part.get_payload(decode=True)
                    print(msg_body)
        else:
            msg_body = msg.get_payload()
        # send_email() is the asker's own helper (not shown in the question)
        send_email(mailfrom, mailto, subject, msg_body)
        return


server = CustomSMTPServer(('192.168.1.35', 25), None)
asyncore.loop()
```

Everything works, but only for Latin text. When the message body contains Russian text, this is printed instead:

```
b'\xf1\xe0\xec\xee\xe5 \xf2\xee'
```

How can I adapt the script to handle Russian text?
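A quick way to see what those bytes are is to decode them with a few candidate codecs. This is a small sketch; the sample bytes come from the question, while `try_decode` is a hypothetical helper written only for this illustration.

```python
# The bytes printed by the question's script.
raw = b'\xf1\xe0\xec\xee\xe5 \xf2\xee'


def try_decode(data, encoding):
    """Return data decoded with `encoding`, or None if it is not valid in it."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


for enc in ("utf-8", "cp1251", "latin-1"):
    print(f"{enc:8} -> {try_decode(raw, enc)!r}")
```

UTF-8 rejects these bytes outright, while cp1251 yields readable Russian (`самое то`). Latin-1 never fails, because every byte maps to some character, so a clean decode alone does not prove the encoding is right; the charset declared in the message headers is the more reliable source.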

  • Try converting to Unicode with `.decode('utf-8')`. In general, these are perfectly valid strings; you just need to figure out the encoding. The encoding should be given in the message headers, and the conversion would then look something like `.decode('latin-1').encode('utf-8')`, for example. - FeroxTL
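A minimal sketch of taking the encoding from the headers, assuming a single-part text message whose `Content-Type` header carries a `charset` parameter (the raw message below is made up for illustration). `get_content_charset()` returns that parameter in lower case, and `get_payload(decode=True)` returns the body bytes with any transfer encoding (base64, quoted-printable) already undone.

```python
import email

# Hypothetical single-part message: Windows-1251 text sent as 8bit.
raw_message = (
    b"From: sender@example.com\r\n"
    b"To: rcpt@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=windows-1251\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "самое то".encode("cp1251")
    + b"\r\n"
)

msg = email.message_from_bytes(raw_message)

# The charset parameter of the Content-Type header, lower-cased.
charset = msg.get_content_charset()
# Body bytes with the Content-Transfer-Encoding already removed.
body_bytes = msg.get_payload(decode=True)
# Assumption: fall back to UTF-8 if the sender declared no charset.
body = body_bytes.decode(charset or "utf-8", errors="replace")
print(charset, body.strip())
```

Decoding directly with the declared charset produces a `str` in one step; a Latin-1 round trip would only give the right text if the header actually declared Latin-1.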

1 answer

It was solved like this: `print(bytes(msg_body).decode('cp1251'))`

  • At a minimum, you should not hard-code the encoding; try the `get_charset()` and `get_content_charset()` methods (example). Also, `bytes(msg_body)` looks suspicious: if you want the text/plain parts of an email message, you can use the `.walk()` method or iterators, as shown in the example. - jfs
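Along the lines of this comment, here is a sketch that walks every part of a (possibly multipart) message and decodes each `text/plain` body with the charset its own headers declare. The sample message and the `text_parts` helper are hypothetical; the UTF-8 fallback when no charset is declared is an assumption, and unlike the question's check, only parts explicitly marked as attachments are skipped.

```python
import email

# Hypothetical multipart message: a cp1251 text body plus a text attachment.
raw_message = (
    b"MIME-Version: 1.0\r\n"
    b"Subject: test\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=windows-1251\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "самое то".encode("cp1251")
    + b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b'Content-Disposition: attachment; filename="notes.txt"\r\n'
    b"\r\n"
    b"attached file\r\n"
    b"--XYZ--\r\n"
)


def text_parts(msg, fallback="utf-8"):
    """Yield the decoded text of every text/plain part that is not an attachment."""
    for part in msg.walk():  # walk() visits the container and every sub-part
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True)  # transfer encoding removed
        charset = part.get_content_charset(fallback)  # declared charset or fallback
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # The header names a charset Python does not know.
            text = payload.decode(fallback, errors="replace")
        yield text


msg = email.message_from_bytes(raw_message)
for text in text_parts(msg):
    print(text.strip())
```

On Python 3.6+, parsing with `policy=email.policy.default` gives an `EmailMessage` whose `get_body(preferencelist=('plain',))` and `get_content()` methods handle this charset lookup for you.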