Python 2.7 reads Unicode strings

Question

I am trying to read rows from an .xls file and write them into a dictionary (in the code, the list `streets`). When I access the dictionary, the strings come out as Unicode escape codes instead of text. I tried to specify the encoding explicitly:

```python
import xlrd

streets = []
excel_file = xlrd.open_workbook('rivals.xls', encoding_override="cp866")
sheet = excel_file.sheet_by_index(0)
row_number = sheet.nrows
if row_number > 0:
    for row in range(0, row_number):
        streets.append(str(sheet.row(row)[0]).replace('text:', '').replace("'", ''))
print streets
```

It does not help, and other encodings do not work either: I tried utf-16le, cp1251 and cp1252. Nothing helps. How can I get normal Russian text stored instead of this incomprehensible garbage? I later insert the values from the dictionary into a web page, and this breaks it. Python 2.7 on Linux.

Added code.

I also tried using decode:

```python
if row_number > 0:
    for row in range(0, row_number):
        streets.append(str(sheet.row(row)[0]).replace('text:', '').replace("'", '').decode('unicode-escape'))
```

That did not work either.

Here is my .xls file: https://dropmefiles.com/NwGmc
From it I want to build a dictionary like ['Vladivostok, 100th anniversary of Vladivostok Avenue, 153', 'Vladivostok, Aleutskaya Street, 4', etc.]

I have now noticed something: before using decode, the data contained double backslashes; after it, single backslashes.
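
A minimal illustration of the double versus single backslash effect, written to run under both Python 2.7 and Python 3. It assumes, consistent with the `.replace('text:','')` in the code above, that `str()` of a cell produced text containing literal backslash escapes such as `\u0412` (six ASCII characters each, no Cyrillic yet). Decoding with the `unicode_escape` codec turns those sequences into real letters; after that, the escapes you still see when printing a list come only from how the list is displayed. The sample string is hypothetical.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# Hypothetical sample: the word "Владивосток" written as literal backslash
# escapes, i.e. the kind of text str(cell) produced in the question.
escaped = "\\u0412\\u043b\\u0430\\u0434\\u0438\\u0432\\u043e\\u0441\\u0442\\u043e\\u043a"

# Each escape is six ASCII characters: backslash, 'u' and four hex digits.
print(len(escaped))  # 66

# encode('ascii') gives a byte string in both Python 2 and 3; the
# unicode_escape codec then turns each \uXXXX sequence into one character.
decoded = escaped.encode("ascii").decode("unicode_escape")
print(len(decoded))  # 11
print(decoded)       # Владивосток

# repr() is what you see when a whole list is printed: Python 2 shows
# [u'\u0412\u043b...'] (single backslashes), Python 3 shows ['Владивосток'].
print(repr([decoded]))
```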

Then I tried this:

```python
streets = []
excel_file = xlrd.open_workbook('rivals.xls', encoding_override="UTF-8")
sheet = excel_file.sheet_by_index(0)
row_number = sheet.nrows
if row_number > 0:
    for row in range(0, row_number):
        streets.append(sheet.cell_value(row, 0) + " " + sheet.cell_value(row, 1))
```

That did not help either.

The code that uses the dictionary:

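```python
from time import sleep

from selenium.webdriver.common.keys import Keys

# addr_from is the address input field (a Selenium WebElement) located
# elsewhere in the script; streets is the list built from the .xls file.
def add_address_from():
    if len(streets) > 0:
        street = streets[0]
        addr_from.clear()
        addr_from.send_keys(street)
        sleep(2)
        addr_from.send_keys(Keys.ARROW_DOWN)
        sleep(2)
        addr_from.send_keys(Keys.ENTER)
        del streets[0]
        sleep(1)
```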

The result is still the same: `['u\\u0412\\u043b\\u0430\\u0434\\u0438\\u0432\\u043e\\u0441\\u0442\\u043e\\u043a, 100-\\u043b\\u0435\\u0442\\u0438\\u044f \\u0412\\u043b\\u0430\\u0434\\u0438\\u0432\\u043e\\u0441\\u0442\\u043e\\u043a\\u0430 \\u043f\\u0440\\u043e\\u0441\\u043f\\u0435\\u043a\\u0442, 153'`

Could you upload a sample Excel file to a file-sharing service and show an example of the output you expect?

Accepted Answer · 2018-12-13T07:39:27

In short, everything already works for you; you just need to read and print the values a little differently:

```python
import sys

import xlrd

try:
    excel_file = xlrd.open_workbook('rivals.xls', encoding_override="UTF-8")
except UnicodeDecodeError:
    print "Unicode Exception"
    sys.exit(1)
except Exception:
    print "Exception"
    sys.exit(1)

sheet = excel_file.sheet_by_index(0)
row_number = sheet.nrows
for row in range(0, row_number):
    # cell_value returns the cell text itself (a unicode string for text cells)
    print sheet.cell_value(row, 0) + " " + sheet.cell_value(row, 1)
```

The `sheet.cell_value` method (which reads a single cell) returns the value as expected, that is, decoded (according to the encoding passed as the second argument when opening the file), whereas the `sheet.row` method gives you the whole Excel row without decoding. I did not look inside the library code, but apparently, since the Excel row also contains service characters, this confuses the decoder and it returns the string as is.
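
A sketch of the approach recommended above, assuming (as in the code above) that the address is split across columns 0 and 1. In xlrd, `Sheet.row(rowx)` returns a list of `Cell` objects, so `str()` on one gives the object's text representation (the `text:u'...'` form that the question strips with `replace`) rather than the cell text; the text itself is in the cell's `.value`, or can be read with `Sheet.cell_value(rowx, colx)`. The function below only relies on `nrows` and `cell_value`, so it is exercised here with a stand-in object; `build_streets`, `FakeSheet` and the sample data are hypothetical.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def build_streets(sheet):
    """Collect 'column0 column1' strings from every row of an xlrd-like sheet.

    Only sheet.nrows and sheet.cell_value(row, col) are used, both of which
    an xlrd Sheet provides. The result is a list of text strings.
    """
    streets = []
    for row in range(sheet.nrows):
        # Use the cell values directly; wrapping a Cell in str() is what
        # produced the text:u'\u0412...' form in the question.
        first = sheet.cell_value(row, 0)
        second = sheet.cell_value(row, 1)
        # format() also copes with numeric cells; strip() drops the space
        # left over when the second column is empty.
        streets.append("{} {}".format(first, second).strip())
    return streets


class FakeSheet(object):
    """Hypothetical stand-in for an xlrd Sheet, for demonstration only."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, rowx, colx):
        return self._rows[rowx][colx]


sheet = FakeSheet([
    ["Владивосток,", "Алеутская улица, 4"],
])
streets = build_streets(sheet)
for street in streets:
    print(street)  # Владивосток, Алеутская улица, 4
```

With a real workbook, the same function would be called as `build_streets(xlrd.open_workbook('rivals.xls').sheet_by_index(0))`.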

That did not help; now the output just has single backslashes instead of double ones.

If I print each value with print, as you said, it displays correctly, but when I add the values to the dictionary and then print the dictionary, the Unicode escape codes appear again.

@AlexanderGninenko That is normal: this is how strings are supposed to look when you print a whole dictionary.
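
If you want the whole list to be readable when printed (for example while debugging), a minimal sketch that runs under both Python 2.7 and Python 3 is to print the elements one by one, or to serialize the list with `json.dumps(..., ensure_ascii=False)`. The strings themselves do not need to change: passing one element, such as `streets[0]`, to `send_keys` passes the string, not its printed representation. The sample list is hypothetical.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import json

# Hypothetical sample data standing in for the list read from the .xls file.
streets = ["Владивосток, Алеутская улица, 4"]

# Printing the list shows repr() of each element: in Python 2 that means
# escapes like u'\u0412...', even though the strings themselves are fine.
print(streets)

# Option 1: print each element; print writes the string itself.
for street in streets:
    print(street)

# Option 2: one readable line for the whole list.
readable = json.dumps(streets, ensure_ascii=False)
print(readable)  # ["Владивосток, Алеутская улица, 4"]
```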
