How to get the file (Cyrillic in the name) from FTP?

Question

Python 3.5, ftplib

How does ftp.nlst() get the file names, and how can I bring them into a readable form?

 filelist = ftp.nlst()  # get the list of file names
 print(filelist[3])     # print the name of the 4th file; the output is gibberish

@jfs

 >>> print(ascii(filelist[6]))
 '\xc8\xed\xf1\xf2\xf0\xf3\xea\xf6\xe8\xff.pdf'

Thanks for the detailed answer!

To rule out problems with how your environment displays non-ASCII strings, show the output of print(ascii(filelist[3])); that will help with debugging.

Accepted Answer · 2016-05-13T17:38:40

Initially (RFC 765, RFC 959), FTP supported only 7-bit ASCII. RFC 2640 extends support to other encodings, and RFC 3659 specifies that commands such as MLST can return paths either in UTF-8 or as an arbitrary sequence of bytes, with some exceptions such as CRLF (newline, b'\x0d\x0a') and a single Telnet IAC (b'\xff').

On POSIX, a file name can be an arbitrary sequence of bytes, with the exception of b'\x2f' and b'\x00' (the slash and the null byte).

In Python 3, ftplib nominally uses latin-1 as its default encoding, which allows an arbitrary sequence of bytes to be decoded into a Unicode string.
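The reason latin-1 can decode anything is that it maps each byte value directly to the code point with the same number. A minimal sketch of that round trip (the variable names are only illustrative):

```python
# Every possible byte value, 0x00 through 0xff.
raw = bytes(range(256))

# latin-1 maps byte N to code point U+00NN, so decoding never fails.
text = raw.decode("latin-1")

# Encoding back with latin-1 restores the original bytes exactly.
restored = text.encode("latin-1")

print(len(text), restored == raw)  # 256 True
```

This is why a name that looks like gibberish after latin-1 decoding still carries the original bytes and can be re-decoded later.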

Q: How does ftp.nlst() get file names?

ftp.nlst() returns a list of text (Unicode) strings.

Q: And how can they be brought into a readable form?

If you know that the server uses a single encoding for file names in your case (probably utf-8, if the output of the FEAT command lists it) and there are no unrepresentable names (PEP 383), then the correct names can be obtained by re-decoding:

 filename = filename.encode(ftp.encoding).decode(your_encoding)

You can pass the 'surrogateescape' error handler to .decode() to support non-representable names. This lets you restore the original bytes without loss, while still letting you print the representable names in their normal form.
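As a rough sketch of the 'surrogateescape' approach, assuming the names came from ftplib decoded as latin-1: the helpers decode_name and original_bytes and the sample names are hypothetical, not part of ftplib.

```python
def decode_name(name, server_encoding, ftp_encoding="latin-1"):
    """Re-decode a name returned by ftplib using the server's real encoding.

    ftp_encoding stands in for ftp.encoding (assumed to be latin-1 here).
    Bytes that are invalid in server_encoding become lone surrogates
    instead of raising UnicodeDecodeError.
    """
    raw = name.encode(ftp_encoding)
    return raw.decode(server_encoding, errors="surrogateescape")


def original_bytes(decoded, server_encoding):
    """Recover the exact bytes the server sent from a decoded name."""
    return decoded.encode(server_encoding, errors="surrogateescape")


# Hypothetical names as ftplib would return them with latin-1 decoding.
good = b"\xd0\x98.pdf".decode("latin-1")  # valid UTF-8
bad = b"\xff\xfe.pdf".decode("latin-1")   # not valid UTF-8

print(ascii(decode_name(good, "utf-8")))  # '\u0418.pdf'
print(ascii(decode_name(bad, "utf-8")))   # '\udcff\udcfe.pdf'
print(original_bytes(decode_name(bad, "utf-8"), "utf-8"))  # b'\xff\xfe.pdf'
```

The lone surrogates cannot be printed or written as normal text, but they keep the byte values, so the name can still be turned back into the bytes the server expects.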

Judging by the result of ascii(), the FTP server in your case returns the names in the cp1251 encoding (the ANSI code page on Russian Windows):

 >>> print(ascii(filelist[6]))
 '\xc8\xed\xf1\xf2\xf0\xf3\xea\xf6\xe8\xff'

In this case, use your_encoding='cp1251' to get the original name:

 filename = filelist[6].encode(ftp.encoding).decode('cp1251') # -> 'Инструкция'

To avoid the needless re-encoding, you can set ftp.encoding = 'cp1251' before calling ftp.nlst(), provided you know that all file names are representable in that encoding. Then ftp.nlst() returns correctly decoded names right away.
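A minimal sketch of that approach, assuming a server that sends cp1251 names. The helper list_names and the host ftp.example.com are placeholders, and main() is not called automatically because it needs a real server. Setting the encoding explicitly also avoids depending on the library default, which was changed to UTF-8 in Python 3.9.

```python
import ftplib


def list_names(ftp, server_encoding="cp1251"):
    """Return the directory listing decoded with the server's encoding.

    The encoding is set before nlst() is called, so no re-decoding of
    the returned names is needed afterwards.
    """
    ftp.encoding = server_encoding
    return ftp.nlst()


def main():
    # ftp.example.com is a placeholder host; anonymous login is assumed.
    with ftplib.FTP("ftp.example.com") as ftp:
        ftp.login()
        for name in list_names(ftp):
            print(name)
```

Because ftplib also uses ftp.encoding when it sends commands, a name obtained this way can be passed back unchanged, for example in ftp.retrbinary('RETR ' + name, callback).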
