I am trying to read rows from an xls file and store them in a dictionary (in the code below it is the list streets). When I access the stored values, I get the strings as escaped unicode. I tried to specify the encoding directly:

    streets = []
    excel_file = xlrd.open_workbook('rivals.xls', encoding_override="cp866")
    sheet = excel_file.sheet_by_index(0)
    row_number = sheet.nrows
    if row_number > 0:
        for row in range(0, row_number):
            streets.append(str(sheet.row(row)[0]).replace('text:', '').replace("'", ''))
    print streets

It does not help, and it does not work with other encodings either. I tried utf-16le, cp1251 and cp1252; nothing helps. How can I get normal Russian text stored instead of this unreadable garbage? I later use the values of the dictionary to fill in a web page, and the escaped text breaks it. Python version 2.7 on Linux.

Added code.

I also tried going through decode:

    if row_number > 0:
        for row in range(0, row_number):
            streets.append(str(sheet.row(row)[0]).replace('text:', '').replace("'", '').decode('unicode-escape'))

That did not work either.

Here is my xls file (https://dropmefiles.com/NwGmc). I want to build a dictionary from it, like ['Vladivostok, 100th anniversary of Vladivostok Avenue, 153', 'Vladivostok, Aleutskaya Street, 4', etc.]

I have just noticed something: before I used decode, the data came back with double backslashes; after decode, it comes back with single ones.

Then I tried this:

    streets = []
    excel_file = xlrd.open_workbook('rivals.xls', encoding_override="UTF-8")
    sheet = excel_file.sheet_by_index(0)
    row_number = sheet.nrows
    if row_number > 0:
        for row in range(0, row_number):
            streets.append(sheet.cell_value(row, 0) + " " + sheet.cell_value(row, 1))

That did not help either.

Dictionary code:

    def add_address_from():
        if len(streets) > 0:
            street = streets[0]
            addr_from.clear()
            addr_from.send_keys(street)
            sleep(2)
            addr_from.send_keys(Keys.ARROW_DOWN)
            sleep(2)
            addr_from.send_keys(Keys.ENTER)
            del streets[0]
            sleep(1)
  • Have you tried cp866? - Jazzis
  • I tried it; nothing changed. It still looks like this: ['u\\u0412\\u043b\\u0430\\u0434\\u0438\\u0432\\u043e\\u0441\\u0442\\u043e\\u043a, 100-\\u043b\\u0435\\u0442\\u0438\\u044f \\u0412\\u043b\\u0430\\u0434\\u0438\\u0432\\u043e\\u0441\\u0442\\u043e\\u043a\\u0430 \\u043f\\u0440\\u043e\\u0441\\u043f\\u0435\\u043a\\u0442, 153' - Alexander Gninenko
  • Can you upload a sample Excel file to any file sharing service and show what you want to get as output? - MaxU
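
For reference, the escaped output quoted in the comments above still carries the original text. The following is a minimal sketch, assuming the list element really consists of a stray "u" followed by literal backslash escapes, as the quoted output suggests. The variable names are made up for illustration, and the snippet is written so that it should run on both Python 2.7 and Python 3:

```python
import codecs

# What str(cell) plus the two replace() calls appear to leave behind:
# a stray "u" followed by literal backslash escapes (bytes literal,
# so every backslash here is a real character in the data).
raw = b'u\\u0412\\u043b\\u0430\\u0434\\u0438\\u0432\\u043e\\u0441\\u0442\\u043e\\u043a'

# The unicode_escape codec turns each \uXXXX sequence into one character.
decoded = codecs.decode(raw, 'unicode_escape')

# Drop the leftover "u" that came from the u'...' prefix.
city = decoded[1:]

print(len(raw))
print(len(city))
```

This would also be consistent with the observation that decode('unicode-escape') changed double backslashes into single ones: after decoding, the element holds real Cyrillic characters, and Python 2 merely escapes them again when it displays the list.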

1 answer

In short, everything works fine for you if you read the values cell by cell and print them one at a time:

    import xlrd

    try:
        excel_file = xlrd.open_workbook('rivals.xls', encoding_override="UTF-8")
    except UnicodeDecodeError:
        print "Unicode Exception"
        exit()
    except:
        print "Exception"
        exit()

    sheet = excel_file.sheet_by_index(0)
    row_number = sheet.nrows
    for row in range(0, row_number):
        print sheet.cell_value(row, 0) + " " + sheet.cell_value(row, 1)

The sheet.cell_value method (which reads a single cell) returns the value as expected, that is, decoded (using the encoding given as the second argument of the file open function), whereas the sheet.row method returns the whole Excel row without decoding. I did not look inside the code, but apparently the Excel row also contains service characters, which confuses the converter, so it returns the string as is.
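
A minimal sketch of collecting the values with cell_value instead of str(sheet.row(...)). The names `collect_streets` and `FakeSheet` are hypothetical: `FakeSheet` only stands in for an xlrd sheet so that the snippet runs without the library, and it mimics just the two members used in this thread, `nrows` and `cell_value(row, col)`. The sample row and its split across two columns are assumptions, not the real contents of rivals.xls:

```python
class FakeSheet(object):
    """Stand-in for an xlrd sheet (hypothetical, for illustration only)."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, rowx, colx):
        return self._rows[rowx][colx]


def collect_streets(sheet):
    """Join columns 0 and 1 of every row, keeping the values as text."""
    streets = []
    for row in range(sheet.nrows):
        streets.append(sheet.cell_value(row, 0) + u" " + sheet.cell_value(row, 1))
    return streets


# Made-up sample row; the column split is an assumption.
sheet = FakeSheet([
    [u"\u0412\u043b\u0430\u0434\u0438\u0432\u043e\u0441\u0442\u043e\u043a,",
     u"100-\u043b\u0435\u0442\u0438\u044f "
     u"\u0412\u043b\u0430\u0434\u0438\u0432\u043e\u0441\u0442\u043e\u043a\u0430 "
     u"\u043f\u0440\u043e\u0441\u043f\u0435\u043a\u0442, 153"],
])
streets = collect_streets(sheet)
print(len(streets))

# With a real workbook the sheet would come from xlrd instead, e.g.:
#   sheet = xlrd.open_workbook('rivals.xls').sheet_by_index(0)
```

Because the values are never passed through str(), no 'text:' prefix or quotes need to be stripped afterwards.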

  • It did not help; now it just outputs single backslashes instead of double ones - Alexander Gninenko
  • I rewrote the answer - Alexander Chernin
  • It did not help. I added the code to the question - Alexander Gninenko
  • If I print each value with print, as you said, it displays normally, but when I add the values to the dictionary and then print the dictionary, it shows the unicode escapes again - Alexander Gninenko
  • @AlexanderGninenko this is normal; that is how a dictionary is supposed to look when it is printed. Explain why this does not suit you - andreymal
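
The point of the last comment can be sketched as follows: printing a whole container shows an escaped display form of each element, while the elements themselves stay intact and can be sent to a web page as they are. The snippet imitates the escaped view with the `unicode_escape` codec so that it behaves the same on Python 2.7 and Python 3; the variable names and the sample values are illustrative only:

```python
streets = [
    u"\u0412\u043b\u0430\u0434\u0438\u0432\u043e\u0441\u0442\u043e\u043a, 153",
    u"\u0412\u043b\u0430\u0434\u0438\u0432\u043e\u0441\u0442\u043e\u043a, 4",
]

# Roughly how Python 2 displays a non-ASCII element inside a printed list:
escaped_view = streets[0].encode("unicode_escape").decode("ascii")
print(escaped_view)

# The elements themselves contain no backslashes at all.
page_text = u"; ".join(streets)
print(u"\\" in page_text)

# Bytes suitable for writing into a UTF-8 page or file.
page_bytes = page_text.encode("utf-8")
print(len(page_bytes) > len(page_text))
```

So the escapes only appear when the container itself is printed; passing an individual element such as streets[0] on should carry the real text.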