Hello. I have an RTF template file and use Python 3 to replace placeholders in it. Polish letters are not displayed correctly in the generated document, although the interpreter prints them correctly. Thanks in advance for your help. Link to the file: http://dropmefiles.com/GV7Va

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    with open("test.rtf") as file_in:
        text = file_in.read()

    TEL_TAG = r'łacińskim'
    text = text.replace("TEL_TAG", TEL_TAG)

    with open("generated.rtf", "w") as file_out:
        file_out.write(text)

    print(TEL_TAG)

  • Create a minimal example of the input rtf file. State explicitly the result you want and what your code produces instead. For an example of an rtf with different characters, look at the answer to: Read Cyrillic from the Python 3 rtf file. Without that, one can only guess: either your source code is not saved in the declared encoding, or the locale encoding used by open() differs from the encoding in which the unescaped non-ASCII characters are written in the rtf (if rtf allows that at all; if it does not, TEL_TAG should be encoded using \'xx or \uN escapes). - jfs
  • I solved the problem this way: Polish words and characters are written into the RTF file in escaped form, that is, in a form the file can read correctly. For example, in the code above, the variable TEL_TAG = r'\u322\'3faci\u324\'3fskim' makes łacińskim appear correctly in the document. But I still do not understand how to write Polish words to the file directly. Thank you for responding. - Serhii Yaroshevkyi
  • I was interested in a universal way to save rtf files so that Polish letters are read correctly and the formatting of the document is preserved. So the next question is: how, and in what encoding, should the rtf file be saved so that the inserted values are displayed correctly and the formatting is preserved? - Serhii Yaroshevkyi
  • Excuse me. Corrected. - Serhii Yaroshevkyi

1 answer

Here is an example rtf file that shows two ways how the word 'łacińskim' can be written:

 {\rtf1\ansi\ansicpg852\uc0 [\'88aci\'e4skim] [łacińskim] } 

The first word is written using escaped \'xx sequences, where xx is a byte in hexadecimal that represents the character in the cp852 encoding.
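To illustrate where those bytes come from, here is a minimal sketch that produces the \'xx form from an ordinary Python string. The helper name hex_escape is made up for this example, and it assumes the text contains no backslashes or braces, which RTF would require escaping separately.

```python
def hex_escape(text, codepage='cp852'):
    """Replace non-ASCII characters with RTF \\'xx escapes (hypothetical helper).

    Assumes the document declares the same code page (here \\ansicpg852)
    and that text contains no backslash or brace characters.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # One \'xx escape per byte of the character in the code page.
            out.extend("\\'{:02x}".format(b) for b in ch.encode(codepage))
    return ''.join(out)


print(hex_escape('łacińskim'))  # \'88aci\'e4skim
```

The result matches the first word in the example file above; a character that cp852 cannot represent would raise UnicodeEncodeError.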

The second word is written directly in the cp852 encoding. When saving the rtf file, make sure that cp852 is actually used. For example, on my system, if I copy this text and paste it into a text file, it is saved as utf-8, so to get the desired file an additional command is needed:

    $ iconv --from utf-8 --to cp852 polish.rtf | sponge polish.rtf
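If iconv or sponge is not available, the same conversion can be sketched in Python. The function name reencode is an assumption made for this example; it rewrites the file in place, so try it on a copy first.

```python
from pathlib import Path


def reencode(path, src='utf-8', dst='cp852'):
    """Rewrite a file in place, converting it from src to dst encoding.

    Hypothetical helper: works on bytes so that line endings are left
    untouched; raises UnicodeEncodeError if a character has no dst form.
    """
    p = Path(path)
    p.write_bytes(p.read_bytes().decode(src).encode(dst))


if __name__ == '__main__':
    import sys
    # Usage: python reencode.py polish.rtf
    for name in sys.argv[1:]:
        reencode(name)
```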

When generating a file using Python, it is enough to explicitly specify the encoding:

    #!/usr/bin/env python3
    from pathlib import Path

    # Read and write the rtf in the code page it declares (\ansicpg852).
    text = Path('test.rtf').read_text(encoding='cp852')
    text = text.replace('LATIN_PLACEHOLDER', 'łacińskim')
    Path('generated.rtf').write_text(text, encoding='cp852')

Note that the Python source itself is saved in utf-8, the default for Python 3, so nothing needs to be changed there.
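For a template whose code page is unknown, or one that cannot represent some characters, the \uN escapes mentioned in the comments avoid the dependency on the file encoding. Below is a minimal sketch; the name rtf_escape is made up for this example. It assumes the default \uc1 setting, where one fallback character (here "?") follows each \uN; in a document that sets \uc0, like the example above, the "?" would be shown literally and should be dropped.

```python
def rtf_escape(text):
    """Escape text for insertion into RTF using \\uN (hypothetical helper).

    Assumes the default \\uc1, so each \\uN is followed by one fallback
    character. Line breaks are not handled here.
    """
    out = []
    for ch in text:
        if ch in '\\{}':
            # RTF control characters must be backslash-escaped.
            out.append('\\' + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes signed 16-bit UTF-16 code units.
            data = ch.encode('utf-16-le')
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], 'little')
                if unit > 32767:
                    unit -= 65536
                out.append('\\u{}?'.format(unit))
    return ''.join(out)


print(rtf_escape('łacińskim'))  # \u322?aci\u324?skim
```

The output is pure ASCII, so the generated file can then be written with any ASCII-compatible encoding without changing how the Polish letters are displayed.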