The byte sequence is decoded into characters incorrectly.

Question

There is an example where we first encode a string and then decode it. But the output is not the expected 'A я'; instead we get different results depending on the encoding, and none of them is correct. How can this be fixed?

    secret = 'A я'
    bit_list = ''
    byte_array = []
    for byte in secret.encode('cp1251'):
        print(byte)
        for bit in bin(byte)[2:].zfill(8):
            bit_list += bit
            if len(bit_list) == 8:
                byte = int(bit_list, 2)
                print(byte)
                byte_array.append(byte)
                bit_list = ''
    print(byte_array)
    for byte in byte_array:
        print(str(chr(byte)))

strawdog · Accepted Answer · 2018-11-12T18:55:52

Probably because your string is decoded incorrectly. The problem, as I understand it, arises with the lowercase letter 'я'. Its Unicode code point is U+044F:

    print(ord('я'))  # 1103

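In cp1251, 'я' is encoded as the single byte 255. The chr function, however, treats its argument as a Unicode code point, not as a cp1251 byte, so it returns U+00FF, which is the same character that byte 255 maps to in cp1252:

    chr(255)  # 'ÿ'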
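To make the difference concrete, here is a minimal sketch comparing the two interpretations of the same byte value. The variable names (raw, code, as_code_point, as_cp1251) are only illustrative and do not come from the question.

```python
raw = 'я'.encode('cp1251')   # a single byte: b'\xff'
code = raw[0]                # the integer 255

# chr() treats 255 as the Unicode code point U+00FF.
as_code_point = chr(code)

# bytes.decode() treats 255 as a byte in the cp1251 code page.
as_cp1251 = bytes([code]).decode('cp1251')

print(code, repr(as_code_point), repr(as_cp1251))  # 255 'ÿ' 'я'
```

The integer is the same in both cases; only the decoding step knows which code page the byte came from.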

Therefore, I would advise you to do explicit decoding:

    secret = "A я"
    a = bytearray()
    b = bytearray()
    a.extend(secret.encode('utf8'))
    b.extend(secret.encode('cp1251'))
    print(a)
    print(b)
    print(a.decode('utf8'))
    print(b.decode('cp1251'))

Output:

    bytearray(b'A \xd1\x8f')
    bytearray(b'A \xff')
    A я
    A я

In general, it is also not clear to me why you fill byte_array in a loop at all, since you already have each byte from for byte in secret.encode('cp1251').
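If the round trip through individual bits is actually needed, one possible sketch is shown below. It assumes the goal is simply to get the original string back; the names encoded, bits, restored and decoded are hypothetical. The only essential change from the question's code is that the rebuilt bytes are decoded with the same codec that encoded them, instead of passing each byte to chr.

```python
secret = 'A я'
encoded = secret.encode('cp1251')

# Bytes -> one string of bits, 8 bits per byte.
bits = ''.join(format(byte, '08b') for byte in encoded)

# Bits -> bytes again, consuming 8 bits at a time.
restored = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

# Decode the whole byte sequence with the codec used for encoding.
decoded = restored.decode('cp1251')
print(decoded)  # A я
```

Decoding the whole sequence at once also matters for multi-byte encodings such as utf8, where a single character spans several bytes and cannot be recovered byte by byte.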
