What is the name of this encoding?

\320\240\321\203\321\201\321\201\320\272\320\270\320\271\321\202\320\265\320\272\321\201\321\202 

Are there any services (or libraries, e.g. C/C++ libraries) that can transcode it into, for example, UTF-8?
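Decoding the escapes themselves takes only a few lines of C. Below is a minimal sketch. It assumes that the input consists only of groups of a backslash followed by exactly three octal digits, as in the string above, and that the resulting bytes are UTF-8, which is what the comments below conclude. The function name `decode_octal_escapes` is hypothetical. If the bytes later need to be converted to another encoding, POSIX `iconv` can handle that step.

```c
#include <stddef.h>

/* Hypothetical helper: decode text made only of "\ooo" groups (a backslash
   followed by exactly three octal digits) into raw bytes.
   Returns the number of bytes written to out, or -1 if the input is
   malformed or out (capacity cap) is too small.
   Assumption: whitespace is NOT skipped, so trim the input first. */
long decode_octal_escapes(const char *in, unsigned char *out, size_t cap)
{
    size_t n = 0;
    while (*in != '\0') {
        unsigned value = 0;
        if (in[0] != '\\')
            return -1;                       /* every group starts with '\' */
        for (int i = 1; i <= 3; i++) {
            if (in[i] < '0' || in[i] > '7')  /* also stops safely at '\0' */
                return -1;
            value = value * 8 + (unsigned)(in[i] - '0');
        }
        if (value > 0377 || n == cap)        /* must fit in one byte and in out */
            return -1;
        out[n++] = (unsigned char)value;
        in += 4;                             /* skip '\' and the 3 digits */
    }
    return (long)n;
}
```

For the string in the question this yields 24 bytes (two per Cyrillic letter). Writing them with `fwrite` to a UTF-8 terminal should display the decoded text.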

  • I bet this is an octal representation. - VladD
  • Here is a test program:
      using System; using System.Linq; using System.Text;
      string source = @"\320\240\321\203\321\201\321\201\320\272\320\270\320\271\321\202\320\265\320\272\321\201\321\202";
      var split = source.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries);
      var bytes = split.Select(s => (byte)Convert.ToInt32(s, 8));
      var result = Encoding.GetEncoding("UTF-8").GetString(bytes.ToArray());
    It produces the string "Русскийтекст" (Russian for "Russiantext"). So you have a UTF-8 encoded string whose bytes are written in octal. - VladD
  • @VladD, but this program seems to work only on correctly formatted input. On bad data it will probably throw an exception, but how do you tie that to a specific place (the format error) in the input? - avp
  • @avp: the possible problems are:
    1. The input format is invalid (the string does not break down into a sequence of numbers and backslashes). This surfaces as an exception from Convert.ToByte; to handle it, iterate over split manually (foreach (var p in split) { ... }). You can recover the byte's index by counting the corresponding number of backslashes in the source string. (This is relatively slow, but an error-recovery path is entitled to be slow.)
    2. The bytes do not form a valid encoding (invalid UTF-8): Encoding has overloads that substitute a special character for unrecognized input. - VladD
  • @VladD, even with all that validation, such a program in C# will most likely still be shorter than in C. For example, this one:
      /* Convert something like \040\377... (escaped 3 octal digits) to a char string.
         Input: const char *input, terminated by nil or isspace()
         Returns: output filled with the bytes encoded in input (without a trailing nil)
         *uend: address of the byte in input where decoding stopped */
      int binostr(const char *input, char *output, int omaxsize, char **uend);
    I couldn't get it under 20 lines. - avp
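avp's prototype can be filled in roughly as follows. This is a hypothetical implementation written for illustration, not avp's actual code. In particular, the return value (number of bytes written) and the behavior when omaxsize is reached are assumptions. Because *uend records where decoding stopped, this approach also answers avp's earlier question about locating a format error. If *uend points neither at a nil nor at whitespace, and fewer than omaxsize bytes were written, then the input is malformed at that address.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical implementation of the binostr() prototype (not avp's code).
   Decodes "\ooo" groups from input into output until '\0' or whitespace,
   a malformed group, or omaxsize bytes written. Returns the number of bytes
   written; *uend is set to the position in input where decoding stopped. */
int binostr(const char *input, char *output, int omaxsize, char **uend)
{
    int n = 0;
    const char *p = input;
    while (*p != '\0' && !isspace((unsigned char)*p) && n < omaxsize) {
        int value = 0, i;
        if (p[0] != '\\')
            break;                      /* not an escape: format error */
        for (i = 1; i <= 3; i++) {
            if (p[i] < '0' || p[i] > '7')
                break;                  /* also stops safely at '\0' */
            value = value * 8 + (p[i] - '0');
        }
        if (i <= 3 || value > 0377)
            break;                      /* fewer than 3 digits, or > 255 */
        output[n++] = (char)value;
        p += 4;                         /* skip '\' and the 3 digits */
    }
    if (uend != NULL)
        *uend = (char *)p;              /* char ** as in the prototype */
    return n;
}
```

The same convention is used by `strtol`: a caller that passes a non-null end pointer can report the exact character offset (`*uend - input`) instead of counting backslashes afterwards.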


1 answer

It looks like this “encoding” is called “Octal Escape Sequences”.

This term is used in the Bash Prompt HOWTO, here, and on many other pages that Google turns up.
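The name matches C string-literal syntax. Inside a C string literal, a backslash followed by one to three octal digits is an octal escape sequence and denotes exactly one byte. So the text from the question, pasted between double quotes, is already a valid C literal. The sketch below shows this by comparing it with the same bytes written as hexadecimal escapes. The identifiers `escaped`, `hex`, and `spellings_match` are illustrative only.

```c
#include <string.h>

/* The text from the question, used directly as a C string literal:
   each \ooo octal escape denotes one byte (24 bytes of UTF-8 here). */
const char escaped[] =
    "\320\240\321\203\321\201\321\201\320\272\320\270\320\271"
    "\321\202\320\265\320\272\321\201\321\202";

/* The same bytes spelled with hexadecimal escapes, for comparison. */
const char hex[] =
    "\xD0\xA0\xD1\x83\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8\xD0\xB9"
    "\xD1\x82\xD0\xB5\xD0\xBA\xD1\x81\xD1\x82";

/* Returns 1 if both spellings produce identical byte arrays. */
int spellings_match(void)
{
    return sizeof escaped == sizeof hex
        && memcmp(escaped, hex, sizeof hex) == 0;
}
```

On a UTF-8 terminal, `printf("%s\n", escaped);` should print the Cyrillic text directly, since the compiler has already turned the escapes into bytes.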

  • Yes, and the encoding (once the escapes are decoded) is UTF-8. - alexlz
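alexlz's observation that the decoded bytes are UTF-8 can be checked mechanically. Doing so also covers VladD's second kind of error and lets it be tied to a position, as avp asked. Below is a hand-written sketch. The name `utf8_first_invalid` is hypothetical, not a library function. It follows the well-formed UTF-8 byte sequence table in chapter 3 of the Unicode Standard. As an alternative, POSIX `iconv` fails with `EILSEQ` on invalid input and leaves its input pointer at the start of the offending sequence.

```c
#include <stddef.h>

/* Return the offset of the start of the first ill-formed UTF-8 sequence in
   s[0..len), or len if the whole buffer is well-formed. Rejects overlong
   forms, surrogates (U+D800..U+DFFF) and code points above U+10FFFF. */
size_t utf8_first_invalid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t need;                        /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
        if (b < 0x80) {                     /* ASCII */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            need = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 2;
            if (b == 0xE0) lo = 0xA0;       /* no overlong 3-byte forms */
            if (b == 0xED) hi = 0x9F;       /* no surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3;
            if (b == 0xF0) lo = 0x90;       /* no overlong 4-byte forms */
            if (b == 0xF4) hi = 0x8F;       /* nothing above U+10FFFF */
        } else {
            return i;                       /* 0x80..0xC1 or 0xF5..0xFF */
        }
        if (len - i <= need)
            return i;                       /* sequence truncated by end */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return i;
        for (size_t k = 2; k <= need; k++)
            if (s[i + k] < 0x80 || s[i + k] > 0xBF)
                return i;
        i += need + 1;
    }
    return len;
}
```

Running it on the decoded bytes should return 24 (the full length), which confirms that they are valid UTF-8. Each Cyrillic letter here is a two-byte sequence starting with 0xD0 or 0xD1.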