How to read garbled text (krakozyabry) “ЇаЁўҐв”?

Question

I have the string "ЇаЁўҐўҐ". How do I read it? That is, it needs to be converted from one encoding to another. How can this be done in C#?

Addendum, to answer the question of why this is needed and where such strings come from:
If you take old floppy disks with files created in the last century under MS-DOS, the file names look something like “aYo.txt” when viewed in Windows.


Answer 1 · 2016-01-18T15:14:52

Specify the encoding when reading the contents of the file. That is, to read (to "transcode" while reading) from code page 866, it is enough to pass the right Encoding:

    // Write a test file in code page 866, then read the same file back with the same encoding.
    File.WriteAllText(@"c:\temp\test.txt", "тест!", Encoding.GetEncoding(866));
    var text = File.ReadAllText(@"c:\temp\test.txt", Encoding.GetEncoding(866));

If you have a specific case where, for example, you received the already corrupted text as a string, it is enough to encode it back to bytes using the wrong encoding and then decode those bytes using the correct one:

    static void Main(string[] args)
    {
        string bad = "ЇаЁўҐв";
        string good = Convert(bad, 1251, 866);
    }

    // Encode with the code page that was wrongly used for reading,
    // then decode with the code page the text was actually written in.
    static string Convert(string source, int from, int to)
    {
        byte[] bytes = Encoding.GetEncoding(from).GetBytes(source);
        return Encoding.GetEncoding(to).GetString(bytes);
    }

Admittedly, this only works if decoding the bytes with the wrong encoding happens (by a happy coincidence!) to be reversible. Further down is an example where it is not.
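Whether a wrong decoding is reversible can be checked on the raw bytes. The following is only a sketch: `RoundTripCheck` and `IsReversible` are hypothetical names, and the `Encoding.RegisterProvider` call assumes a modern .NET runtime where code pages 866 and 1251 are not registered by default.

```csharp
using System;
using System.Linq;
using System.Text;

static class RoundTripCheck
{
    // Hypothetical helper: true if decoding `original` with the wrong encoding
    // and encoding the result back yields exactly the same bytes.
    public static bool IsReversible(byte[] original, Encoding wrong)
    {
        string garbled = wrong.GetString(original);
        byte[] restored = wrong.GetBytes(garbled);
        return original.SequenceEqual(restored);
    }

    public static void Demo()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before Encoding.GetEncoding(866) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] cp866Bytes = Encoding.GetEncoding(866).GetBytes("привет");

        // Misreading as 1251 maps every byte to a distinct character.
        Console.WriteLine(IsReversible(cp866Bytes, Encoding.GetEncoding(1251))); // True
        // Misreading as UTF-8 replaces invalid sequences with U+FFFD.
        Console.WriteLine(IsReversible(cp866Bytes, Encoding.UTF8));              // False
    }
}
```

If the check returns false, the string alone is not enough to recover the text, and the original bytes have to be read again.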

Concerning "recoding":

You are trying to fix the consequences, not the problem itself.

How does this problem arise:

You have an old file encoded in 866.
You read it into a string without specifying an encoding. The system does not find a BOM and falls back to a default encoding (UTF-8 in the case of File.ReadAllText; some other APIs use Encoding.Default).
You then try to "transcode" the string you have read.

Example:

    // Create an "old" file with its contents in code page 866
    File.WriteAllText("test.txt", "тест!", Encoding.GetEncoding(866));
    // Open it without specifying an encoding and see garbled text:
    Console.WriteLine(File.ReadAllText("test.txt"));

The solution you are trying to apply is "convert the string". That is, you hope the following code works:

    static void Main(string[] args)
    {
        // Create an "old" file with its contents in code page 866
        File.WriteAllText(@"c:\temp\test.txt", "тест!", Encoding.GetEncoding(866));
        // Open it without specifying an encoding and see garbled text:
        var text = File.ReadAllText(@"c:\temp\test.txt");
        Console.WriteLine(text);
        text = Convert(text, 866, 1251);
        Console.WriteLine(text);
    }

    static string Convert(string source, int from, int to)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(source);
        byte[] newBytes = Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding(from), bytes);
        string newStr = Encoding.GetEncoding(to).GetString(newBytes);
        return newStr;
    }

This solution has a weak point: it assumes that strings in .NET are just some set of bytes. That is, it assumes that however a string was read, it can be converted back into the same bytes it was read from. In fact, that is not so. The example above does not work.

If you guess the file's encoding wrong when reading, you will not even be able to write the text back unchanged:

    File.WriteAllText(@"c:\temp\test.txt", "тест!", Encoding.GetEncoding(866));
    var text = File.ReadAllText(@"c:\temp\test.txt");
    File.WriteAllText(@"c:\temp\test2.txt", text);

Suddenly, this code produces two different files, although there was no "transcoding".
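A byte-level sketch of why the two files differ, assuming the read falls back to UTF-8. `LossyDecodeDemo` and `DecodeAndReencode` are hypothetical names, and the `Encoding.RegisterProvider` call assumes a modern .NET runtime.

```csharp
using System;
using System.Linq;
using System.Text;

static class LossyDecodeDemo
{
    // Decode the bytes as UTF-8, then encode the resulting string back to UTF-8.
    // Byte sequences that are not valid UTF-8 are decoded as U+FFFD,
    // so the original byte values are lost at this point.
    public static byte[] DecodeAndReencode(byte[] original)
    {
        string decoded = Encoding.UTF8.GetString(original);
        return Encoding.UTF8.GetBytes(decoded);
    }

    public static void Demo()
    {
        // Assumption: needed on .NET Core / .NET 5+ for code page 866.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] original = Encoding.GetEncoding(866).GetBytes("тест!");
        byte[] rewritten = DecodeAndReencode(original);

        Console.WriteLine(BitConverter.ToString(original));   // E2-A5-E1-E2-21
        Console.WriteLine(BitConverter.ToString(rewritten));  // contains EF-BF-BD (U+FFFD)
        Console.WriteLine(original.SequenceEqual(rewritten)); // False
    }
}
```

Once a byte has been replaced by U+FFFD, no later conversion of the string can tell which byte it used to be.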

It is fine if decoding with the "wrong" code page happened to be lossless and can be reversed.
@PashaPash "File.ReadAllText("test.txt") ... Suddenly": the point is that you write in 866 and read as UTF-8.
@Stack and the problem you described in the question is exactly this.
Someone somewhere wrote a file in 866 (or another old encoding).
It was then read with the wrong encoding specified (or with none at all, so it was read as UTF-8), and the result was garbled text.
The gist of the answer: read in the correct encoding from the start, by specifying it when reading.

Answer 2 · 2016-01-18T14:00:21

    string Convert(string source, int from, int to)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(source);
        byte[] newBytes = Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding(from), bytes);
        string newStr = Encoding.GetEncoding(to).GetString(newBytes);
        return newStr;
    }

Usage:

    string str = "привет";
    string result = Convert(str, 866, 1251);     // => ЇаЁўҐв
    string result2 = Convert(result, 1251, 866); // => привет

You can get the bytes in the desired encoding directly with Encoding.GetEncoding(from).GetBytes(source) and do without the call to Encoding.Convert.

Answer 3 · 2016-04-15T13:22:06

There is a line "ЇаЁўҐўҐ". How to read it?

It is your "привет" ("hi"). 1251 and 866 are both single-byte, both support Cyrillic, and their code ranges map onto each other without loss when text is interpreted with the wrong one (866 as 1251, or 1251 as 866).

If you only need to read the text, you do not need to convert anything. It is enough to choose the correct code page for interpreting the text (as colleagues noted earlier, you have cp866) and specify it when reading from a byte array or from a stream.
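A minimal sketch of both cases, a byte array and a stream. `Cp866Reader`, `FromBytes` and `FromStream` are hypothetical names, and the `Encoding.RegisterProvider` call assumes a modern .NET runtime where code page 866 is not registered by default.

```csharp
using System.IO;
using System.Text;

static class Cp866Reader
{
    // Interpret a byte array as cp866 text.
    public static string FromBytes(byte[] data)
    {
        return Cp866().GetString(data);
    }

    // Interpret everything remaining in the stream as cp866 text.
    public static string FromStream(Stream stream)
    {
        using (var reader = new StreamReader(stream, Cp866()))
        {
            return reader.ReadToEnd();
        }
    }

    static Encoding Cp866()
    {
        // Assumption: needed on .NET Core / .NET 5+; registering more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(866);
    }
}
```

For example, `Cp866Reader.FromBytes(new byte[] { 0xAF, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2 })` gives "привет".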

The options are listed above. Just do not use Encoding.Convert, or you will get the same thing again, since Convert maps each character to its equivalent in the target encoding rather than replacing the characters.
