Hello! I have a problem converting a byte array into a stream: Russian letters are replaced with question marks.

I receive a message of totalByte bytes from a socket and write it into the byteMsg byte array. For further processing, I need to convert this message to a Stream.

If I convert the byte array to a string, everything is fine: Russian letters are displayed correctly.

When I convert it to a stream, Russian letters are replaced with question marks. To check, I convert the stream back to a string and write it to the log.

string text1 = Encoding.Default.GetString(byteMsg);
text1 = text1.Substring(0, totalByte);
Stream stream = new MemoryStream(byteMsg, 0, totalByte);
stream.Position = 0;
StreamReader reader = new StreamReader(stream);
string text2 = reader.ReadToEnd();
log.Debug("text 1 = " + text1);
log.Debug("text 2 = " + text2);

In the logs I see:

text 1 = русские буквы
text 2 = ??????? ?????

    1 answer

    You are using the StreamReader constructor without explicitly specifying the encoding. Here is a quote from MSDN:

    This constructor sets the UTF8Encoding as the encoding, initializes the BaseStream property using the stream parameter, and sets the size of the internal buffer to 1024 bytes.

    You decode the received data using Encoding.Default (which, as far as I know, is not recommended at all), and that property (again quoting MSDN):

    Gets the encoding for the current ANSI codepage of the operating system.

    And that mismatch, of course, is the result you are seeing.

    UPDATE

    @masuhorukov, there is no conversion from byte[] to Stream. A Stream is just a sequence of bytes, and you are putting those same bytes into it, so what kind of conversion could there be? A MemoryStream merely exposes the data as a Stream; it does not change the bytes.
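To illustrate the point above, here is a minimal sketch showing that wrapping a byte array in a MemoryStream leaves the bytes untouched. The class name StreamBytesDemo and the helper RoundTrip are hypothetical names made up for this example, and the three sample bytes are assumed to be "рус" encoded as Windows-1251.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical demo class, not part of the original question.
public static class StreamBytesDemo
{
    // Wraps the first `count` bytes in a MemoryStream and returns
    // exactly what the stream contains.
    public static byte[] RoundTrip(byte[] data, int count)
    {
        using (var stream = new MemoryStream(data, 0, count))
        {
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        // Assumed sample data: "рус" in Windows-1251.
        byte[] byteMsg = { 0xF0, 0xF3, 0xF1 };

        byte[] fromStream = RoundTrip(byteMsg, byteMsg.Length);

        // Prints True: no encoding is involved at this stage.
        Console.WriteLine(fromStream.SequenceEqual(byteMsg));
    }
}
```

No encoding takes part in this step, so any garbled characters have to come from whatever later turns the bytes into text.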

    Assume that the server uses the Windows-1251 encoding, which (again, presumably) matches your Encoding.Default (because you most likely wrote the server side the same way). When you simply decode the bytes into a string (the first two lines of your code), everything is fine, since the same encoding is used for encoding and decoding. However, you then put the byte array, which was encoded as Windows-1251, into a Stream and read it through a StreamReader, which uses UTF-8 by default. When you call ReadToEnd, the StreamReader decodes the bytes into a string as UTF-8. Do you see the mismatch now?
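One possible way to remove the mismatch is to pass the encoding to the StreamReader explicitly. The sketch below assumes the data really is Windows-1251, as guessed above; the class EncodingDemo and the helper ReadAll are hypothetical names for this example. The RegisterProvider call is needed on .NET Core and .NET 5+ to make code page 1251 available; on .NET Framework the code page is available without it.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of the original question.
public static class EncodingDemo
{
    // Reads the first `count` bytes through a Stream, decoding them
    // with the encoding the sender is assumed to have used.
    public static string ReadAll(byte[] data, int count, Encoding encoding)
    {
        using (var stream = new MemoryStream(data, 0, count))
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Makes legacy code pages such as 1251 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Assumption: the sender encoded the text as Windows-1251.
        Encoding cp1251 = Encoding.GetEncoding(1251);

        byte[] byteMsg = cp1251.GetBytes("русские буквы");
        int totalByte = byteMsg.Length;

        string text2 = ReadAll(byteMsg, totalByte, cp1251);

        // Prints True: the text survives the trip through the stream.
        Console.WriteLine(text2 == "русские буквы");
    }
}
```

The sketch compares strings rather than printing the Russian text, because the console's own output encoding could garble it independently of the reader.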

    • The thing is, when converting from a byte array to a stream, the encoding is not specified anywhere, and the stream already contains incorrect characters. The message is actually in protobuf format, and after deserializing it and parsing the fields I see the wrong characters. That is why I thought the problem was in the conversion from byte[] to Stream - masuhorukov
    • @masuhorukov, I have updated the answer; the reply did not fit into a comment - Donil