I want to save a specific set of data to a CSV file. The file must be encoded in cp1251.

Pseudocode (irrelevant parts removed):

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    import csv

    # Placeholders in %...% stand for parts omitted from the question.
    numbers = %fetch_dataset_from_mongo%
    report_filename = '/tmp/report.csv'

    with open(report_filename, 'w', encoding='cp1251', newline='') as csvfile:
        csv_file = csv.writer(csvfile, delimiter=';')
        for number in numbers:
            try:
                csv_file.writerow([number, %string_with_cyrillic%, %another_string_with_cyrillic%])
            except Exception as msg:
                # A row with a character that cp1251 cannot represent ends up here
                print(number, ': ', msg)
                continue

The code works, but rows containing characters that cannot be converted are skipped, and I would rather not skip them. As I understand it, the encoding is set only when the file is opened for writing. I tried various combinations of encode() and decode() (Python 3 has no str.decode()), with no success. How can I make it ignore invalid characters and correctly transcode UTF-8 strings to cp1251?

Python 3.5.3

1 answer

Besides the encoding argument, open() (like str.encode and bytes.decode) also accepts an errors argument that specifies what to do with problematic characters in the text:

  • replace: replace the problematic character with a placeholder (when decoding, this is �, U+FFFD; when encoding, as with cp1251 here, it is simply ?);

  • ignore: skip the problematic character.


 with open(report_filename, 'w', encoding='cp1251', errors='replace', newline='') as csvfile: 

Read more in the documentation: https://docs.python.org/3/library/functions.html#open
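As a rough end-to-end illustration of the errors argument, here is a minimal sketch that writes a few rows to a temporary cp1251 file and reads them back. The sample rows and the helper names write_report and read_report are hypothetical; they stand in for the data and code omitted from the question.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for the data fetched from MongoDB.
rows = [
    (1, 'Привет', 'мир'),
    (2, 'стрелка →', 'знак ×'),  # '→' and '×' have no cp1251 representation
]


def write_report(path, rows, errors='replace'):
    """Write rows to a cp1251 CSV file; `errors` decides the fate of unencodable characters."""
    with open(path, 'w', encoding='cp1251', errors=errors, newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=';')
        writer.writerows(rows)


def read_report(path):
    """Read the cp1251 CSV file back as a list of rows (lists of strings)."""
    with open(path, encoding='cp1251', newline='') as csvfile:
        return list(csv.reader(csvfile, delimiter=';'))


# Use a temporary file so the sketch does not touch /tmp/report.csv.
fd, report_path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
try:
    write_report(report_path, rows)
    result = read_report(report_path)
finally:
    os.remove(report_path)

print(result)
```

With errors='replace' every row is written and the two unencodable characters come back as ?. Passing errors='ignore' to the hypothetical write_report would drop them instead.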

Example with str.encode for clarity:

    >>> # The test string reads: "These →×← characters are not in cp1251"
    >>> print('Этих →×← символов нет в cp1251'.encode('cp1251', errors='replace').decode('cp1251'))
    Этих ??? символов нет в cp1251
    >>> print('Этих →×← символов нет в cp1251'.encode('cp1251', errors='ignore').decode('cp1251'))
    Этих  символов нет в cp1251
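If it helps to know which characters would be replaced or dropped before choosing between replace and ignore, a small check along these lines could be used. This is only a sketch; the function name unencodable_chars is made up for illustration and is not part of the standard library.

```python
def unencodable_chars(text, encoding='cp1251'):
    """Return, in order, the characters of `text` that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)  # strict mode raises on unencodable characters
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


sample = 'Этих →×← символов нет в cp1251'
print(unencodable_chars(sample))  # ['→', '×', '←']
```

Logging the result for each row keeps the diagnostic output from the original loop while still letting every row through to the file.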