Good evening HashCode Chan ^^

I've run into the following problem:
I need to write UTF-8 encoded strings to a text file.
The program creates an XML file and writes each line like this:

WriteLn(XmlFile, ansitoutf8('<field name="Field1">привет</field>')); 

However, the result is not UTF-8, and opening the file in a browser gives an error:

 XML Parsing Error: not well-formed
 Location: file:///G:/adverts2.xml
 Line %line_number%, Column 22:
 <field name="Field1">       </field>
 ---------------------^

What am I doing wrong?

  • Did you include the XML declaration? <?xml version="1.0" encoding="UTF-8"?> - Yura Ivanov
  • Yes, I did, and the browser shows that the page is encoded as UTF-8. - teanPCh
  • The page may be declared as UTF-8, but the text in the file is, for some reason, not UTF-8. - Dex
  • I checked: even without the encoding specified, the browser displays the file correctly, so autodetection works. I generated the file on D7. The hex editor shows two-byte Russian letters, so everything is fine. By the way, which browser gives this error? - Yura Ivanov
  • Firefox 9.xx - teANYCH

2 answers

Thanks, everybody, you're all free to go.

 str.SaveToFile('G:\Test.xml', TEncoding.UTF8); // where str is a variable of type TStringList
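
For context, a fuller version of that one-liner might look like the sketch below. It assumes Delphi 2009 or later (where `TEncoding` exists and `string` is Unicode) and a source file saved in an encoding the compiler reads correctly. The file name `Test.xml` and the `<fields>` root element are only illustrative.

```pascal
program SaveUtf8Demo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('<?xml version="1.0" encoding="UTF-8"?>');
    Lines.Add('<fields>');  // hypothetical root element, not from the question
    Lines.Add('  <field name="Field1">привет</field>');
    Lines.Add('</fields>');
    // Passing TEncoding.UTF8 makes SaveToFile convert the lines to UTF-8 bytes.
    Lines.SaveToFile('Test.xml', TEncoding.UTF8);
  finally
    Lines.Free;
  end;
end.
```

Note that `SaveToFile` with `TEncoding.UTF8` normally also writes the encoding's preamble (the UTF-8 BOM) at the start of the file.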
  • Was it really that hard to find this on Google yourself beforehand? - Dex

Open the file in a hex editor and see how the broken text is actually encoded. I suspect that a byte order mark is being inserted (http://en.wikipedia.org/wiki/Byte_order_mark). Look at the first byte: the browser did not find it in its character table and substituted a code for it, apparently 0xFFFF, which is not a normal character for UTF-8. "uffffufffdufffdufffdufffdufffdufffd" is a bad sequence for UTF-8 in any case, or else the browser has mangled it.
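
If no hex editor is at hand, a few lines of Delphi can do the same check. This is only a sketch: `HexHead` is a hypothetical helper name, and the path is the one from the error message in the question.

```pascal
program DumpHeadDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the first MaxBytes bytes of the file
// as space-separated hex pairs.
function HexHead(const FileName: string; MaxBytes: Integer): string;
var
  Fs: TFileStream;
  Buf: array of Byte;
  Count, I: Integer;
begin
  Result := '';
  if MaxBytes <= 0 then
    Exit;
  Fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Buf, MaxBytes);
    Count := Fs.Read(Buf[0], MaxBytes);  // may return fewer bytes than requested
    for I := 0 to Count - 1 do
      Result := Result + IntToHex(Buf[I], 2) + ' ';
  finally
    Fs.Free;
  end;
end;

begin
  // A UTF-8 BOM appears as "EF BB BF" at the very start of the dump.
  Writeln(HexHead('G:\adverts2.xml', 64));
end.
```

In correct UTF-8, each Russian letter takes two bytes; for example, "привет" is `D0 BF D1 80 D0 B8 D0 B2 D0 B5 D1 82`. Anything else in that position points to a wrong or double conversion.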

WriteLn may simply not work with UTF-8 strings. I understand that you are used to it and that it is convenient, but try rewriting the output with TFileStream (you don't have to convert the whole project; a small demo for yourself is enough).
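
Such a demo could be as small as the sketch below. It assumes Delphi 2009 or later for `TEncoding`. `WriteUtf8Line` is a hypothetical helper and `Test.xml` is an illustrative file name. The bytes are written exactly as encoded, with no BOM and no extra conversion.

```pascal
program StreamUtf8Demo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: encodes S plus a line break as UTF-8 and writes the raw bytes.
procedure WriteUtf8Line(Stream: TStream; const S: string);
var
  Bytes: TBytes;
begin
  Bytes := TEncoding.UTF8.GetBytes(S + sLineBreak);
  if Length(Bytes) > 0 then
    Stream.WriteBuffer(Bytes[0], Length(Bytes));
end;

var
  Fs: TFileStream;
begin
  Fs := TFileStream.Create('Test.xml', fmCreate);
  try
    WriteUtf8Line(Fs, '<?xml version="1.0" encoding="UTF-8"?>');
    WriteUtf8Line(Fs, '<field name="Field1">привет</field>');
  finally
    Fs.Free;
  end;
end.
```

Because the stream receives bytes rather than strings, nothing between the encoder and the file can re-encode the text.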

Also check that you are not writing the variable's address instead of its contents :)

  • The AnsiToUtf8() function only works properly with the first 128 ASCII characters. So I approached the problem differently and used the TStringList SaveToFile method, which lets you choose the encoding and handles all characters correctly. - teanYCH
  • The function works correctly: it converts characters with codes 128 and above according to the current system encoding. But if you use it incorrectly... And I did not say to look in Notepad. You need to look in a hex editor, then it will be clear. A hex editor, at least, does not distort anything. - KoVadim