I want to unzip a file (*.zip) that contains files with Russian names. Here is the code:

    procedure TWorkThread.ExctractZip(aArchFile: string; aPath: string);
    var
      zZip: TZipFile;
    begin
      zZip := TZipFile.Create;
      try
        zZip.ExtractZipFile(aArchFile, aPath);
      finally
        FreeAndNil(zZip);
      end;
    end;

After unzipping, the file names come out garbled, for example: Çéè«¡Γα«½∞ αÑßΓ«αá¡δ.pdf

How can I fix this?

  • And are the file contents themselves intact, without corruption? Is this a Unicode version of Delphi? - kami
  • Yes, the contents are intact. - gregor
  • What was the archive created with, and which version of Delphi are you using? - Alekcvp
  • I have no way of knowing what created the archive or how. It simply exists. Delphi XE8 / Win 7 - gregor
  • What is the locale on the computer (Russian, English)? - androschuk

3 answers

Unfortunately, the ZIP format is not very strictly standardized (for example, Unicode was officially standardized only in 2007 - in the PKWARE 6.3.2 specification). The problem lies not only in the standard but also in the countless programs that store names however they please, which is hardly surprising given such a specification.

In particular, there are several ways of storing file names that contain characters outside the ASCII range:

  1. ANSI
  2. OEM
  3. CP437 (DOS Latin US)
  4. UTF-8
  5. UTF-8 with flag
  6. Extra field $7075 - PKWARE / Info-ZIP standard
  7. Another extra field used by a different program (I have forgotten its numeric code)

It is an awful hodgepodge. Worst of all, there is often no way to determine how the names were stored other than having a person specify it manually (the first four options in the list above are programmatically indistinguishable).

Try toggling the UTF8Support flag: if that helps, good; if not, you will need to find either another unpacker or another packer.
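To illustrate why the single-byte options cannot be told apart programmatically, here is a minimal Python sketch (Python rather than Delphi only because the byte-level behaviour is easy to show there). The helper name `decode_candidates` and the three code pages are assumptions chosen for the example, not part of any ZIP library.

```python
# Code pages picked for illustration: DOS Latin US, Russian OEM, Russian ANSI.
CANDIDATES = ("cp437", "cp866", "cp1251")


def decode_candidates(raw, encodings=CANDIDATES):
    """Decode the same raw name bytes under every encoding that accepts them."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # this encoding cannot represent the byte sequence
    return results


# The bytes a packer using the Russian OEM code page would store for this name.
raw_name = "тест.txt".encode("cp866")

for enc, text in decode_candidates(raw_name).items():
    print(enc, repr(text))
```

Every candidate decodes the bytes without an error, and only one of the results is readable Russian; nothing in the bytes themselves says which one the packer meant.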

  • The mojibake in the question indicates that the archive was created using the OEM code page (cp866) and unpacked using the cp437 encoding (the standard one for ZIP): 'Çéè«¡Γα«½∞ αÑßΓ«αá¡δ.pdf'.encode('cp437').decode('cp866') == 'АВКонтроль рестораны.pdf'. If you do not need to support ancient ZIP unpackers, then when creating an archive, store file names in UTF-8 with bit 11 set (the "UTF-8 with flag" option in the answer). - jfs
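The re-encoding trick from this comment can be turned into a small extraction routine. The following is only a sketch in Python under stated assumptions: the helper names `fix_name` and `extract_fixed` are made up for the example, the real encoding is assumed to be cp866, and there is no protection against hostile paths such as `..` inside the archive.

```python
import os
import shutil
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8


def fix_name(info, real_encoding="cp866"):
    """Undo the cp437 decoding that zipfile applies to names without bit 11."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # already decoded as UTF-8, nothing to repair
    return info.filename.encode("cp437").decode(real_encoding)


def extract_fixed(archive, dest_dir, real_encoding="cp866"):
    """Extract every member under its repaired name and return those names.

    Simplification: member paths are trusted as-is (no path-traversal checks).
    """
    names = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = fix_name(info, real_encoding)
            target = os.path.join(dest_dir, name)
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
            else:
                os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
                with zf.open(info) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            names.append(name)
    return names
```

A call such as `extract_fixed("archive.zip", "out")` (both paths hypothetical) would write the files under their Russian names. The encoding still has to be supplied by a person; the code only automates the conversion once it is known.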

You need to modify the standard zip.pas unit so that file names are converted using code page 866. Tested on Delphi 10.2.

    function TZipFile.TBytesToString(B: TBytes): string;
    var
      E: TEncoding;
    begin
      if CyrillicSupport then // condition: whether Cyrillic support is needed; can be removed
        E := TEncoding.GetEncoding(866)
      else if FUTF8Support then
        E := TEncoding.GetEncoding(65001)
      else
        E := TEncoding.GetEncoding(437);
      try
        Result := E.GetString(B);
      finally
        E.Free;
      end;
    end;

    function TZipFile.StringToTBytes(S: string): TBytes;
    var
      E: TEncoding;
    begin
      if CyrillicSupport then // condition: whether Cyrillic support is needed; can be removed
        E := TEncoding.GetEncoding(866)
      else if FUTF8Support then
        E := TEncoding.GetEncoding(65001)
      else
        E := TEncoding.GetEncoding(437);
      try
        Result := E.GetBytes(S);
      finally
        E.Free;
      end;
    end;

I am not a Delphi expert, but you can try the standard decode/encode approach with different encodings. Prepare a dictionary of Russian letters in advance, and on each attempt to determine the encoding, check the resulting name letter by letter against the dictionary. As soon as you find a match - voila. That applies if the problem is with the name. Also, in Python for example, there is the chardet library, but it detects the encoding of a file's contents, not of its name.
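The dictionary idea could look roughly like this in Python. It is a heuristic sketch, not a reliable detector: the names `russian_score` and `guess_encoding`, the candidate list, and the scoring rule (the share of non-ASCII characters that are Russian letters) are all assumptions made for the illustration.

```python
LOWER = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
RUSSIAN_LETTERS = set(LOWER + LOWER.upper())  # the "dictionary" of letters


def russian_score(text):
    """Share of non-ASCII characters in text that are Russian letters."""
    non_ascii = [ch for ch in text if ord(ch) > 127]
    if not non_ascii:
        return 1.0  # a pure ASCII name reads the same in every candidate
    return sum(ch in RUSSIAN_LETTERS for ch in non_ascii) / len(non_ascii)


def guess_encoding(raw, candidates=("utf-8", "cp866", "cp1251", "koi8-r")):
    """Return (encoding, decoded_name) with the best score, or None."""
    best = None
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding
        score = russian_score(text)
        if best is None or score > best[0]:
            best = (score, enc, text)
    return None if best is None else (best[1], best[2])


# Recover the raw bytes from the garbled name in the question, then guess.
raw = "Çéè«¡Γα«½∞ αÑßΓ«αá¡δ.pdf".encode("cp437")
print(guess_encoding(raw))
```

Short names or names mixing several alphabets can easily fool a score like this, so it is best treated as a suggestion to confirm by eye.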

    • Unfortunately, this is not an answer to the question (not a solution to the problem), but a comment. - Kromster
    • A comment implies that someone replies to it. My solution stands on its own regardless of any reaction and, most importantly, it solves the problem: you get mojibake, compare it against the dictionary, discard it, and move on to the next encoding. It seems to me that this is an answer, not a comment at all. You are a more experienced community member, so if you insist, I will delete the answer or move it to the comments. - Mae
    • I won't insist, it was just a remark. - Kromster