I send the POST data via AJAX and save it in an array ($arr):

 user_id_from => 42
 user_id_to => 43
 date_time => 2012-08-19 23:58:51
 subject => Почему?
 message => так..
 obj_identifier => o19

Then I call json_encode($arr). Here is what it gives me in the response received by the client script:

 ... "\u041f\u043e\u0447\u0435\u043c\u0443?","\u0442\u0430\u043a.." ... 

If I use Latin characters instead of Cyrillic, everything is OK. The page encoding is UTF-8 without BOM. If I decode this string and, for example, output it on the page from which the AJAX request is sent:

 var_dump("<h1>json:</h1><pre>",json_decode($arr),"</pre>"); 

then everything displays fine. However, I cannot decode it on the receiving side, because the client script retrieves it as req.responseText:

 var jData = JSON.parse(req.responseText); 

There is probably a solution, but I don't know it :( I would be very grateful for help!

  • What version of PHP? - FLK
  • That's exactly why I use good old XML for server responses - it saves traffic...) Either rewrite the code that produces the response, or write your own function to decode the escaped Unicode. - Indifferently
  • PHP Version 5.3.3 - srgg67
  • It looks like responseText is expected to be UTF-8, while in fact it is Unicode escaped according to RFC 4627 (JSON) - essentially a mix of UTF-16 escapes and ASCII. If you ignore surrogate pairs, the decoding algorithm is straightforward: each "character" is either 1 byte or 6 bytes in the form \uHHHH, i.e. a 16-bit code unit. Walk the string, read a "character" (getting a 16-bit integer), and encode it into UTF-8 (1 to 3 bytes) directly into the result string. Surrogate pairs are more trouble - read about them yourself; the gist is that such a pair encodes a code point beyond the 16-bit range (up to U+10FFFF). That's all, the limit has been reached. - avp
  • At the time I asked myself the same question: [Cyrillic in json_encode][1]. [1]: hashcode.ru/questions/37998/… - ling
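The manual decoding avp describes can be sketched in JavaScript. This is illustration only - JSON.parse already performs it, surrogate pairs included - and the function name decodeEscapes is made up for this example:

```javascript
// Minimal sketch of the decoding avp describes: walk the string and
// turn \uHHHH escapes back into characters. Because JS strings are
// sequences of UTF-16 code units, emitting each escape's 16-bit value
// keeps surrogate pairs adjacent, so astral characters survive intact.
function decodeEscapes(s) {
  return s.replace(/\\u([0-9a-fA-F]{4})/g, function (m, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

console.log(decodeEscapes("\\u041f\\u043e\\u0447\\u0435\\u043c\\u0443?")); // Почему?
```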


2 Answers

  • Actually, the escaping in this case is done out of conservatism and backward-compatibility considerations; Jon Skeet commented on this question well here.
  • In my personal opinion, it is not a bad idea. Moreover, judging by RFC 4627, this is valid JSON, which means it must be parsed.

  • Another interesting discussion on the topic can be found here.

  • @Indifferently JSON with unescaped Unicode characters is also valid JSON, so the remark about saving traffic compared to XML is not very clear.
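To see that the escaped form really is parsed as valid JSON, note that JSON.parse handles the \uHHHH escapes from the question with no extra work (a minimal check using the question's own field names):

```javascript
// json_encode() output with escaped Cyrillic is valid JSON, and
// JSON.parse() restores the original characters:
var json = '{"subject":"\\u041f\\u043e\\u0447\\u0435\\u043c\\u0443?",' +
           '"message":"\\u0442\\u0430\\u043a.."}';
var jData = JSON.parse(json);
console.log(jData.subject); // Почему?
console.log(jData.message); // так..
```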

    Thank you all. After wading through countless threads on character-conversion problems with JSON, I did not find a definitive solution. But I did find the cause in my particular case: among the several included (require) files there were some encoded as UTF-8 with BOM (I note that I use DW, which has a peculiarity: if you switch to it from a project with a different encoding, things can quietly go wrong). After removing the BOM everything worked fine. I hope this experience will be useful to colleagues.
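    The failure mode described here is easy to reproduce on the client side: a UTF-8 BOM leaking from an included file lands in front of the JSON body, and JSON.parse rejects it. A sketch (the stripping regex is a defensive workaround, not part of the original fix, which was removing the BOM from the source files):

```javascript
// A BOM (U+FEFF) leaking into the response body breaks JSON.parse,
// because U+FEFF is not JSON whitespace:
var clean = '{"subject":"\\u041f\\u043e\\u0447\\u0435\\u043c\\u0443?"}';
var withBom = "\uFEFF" + clean;

JSON.parse(clean); // parses fine
try {
  JSON.parse(withBom);
} catch (e) {
  console.log("parse failed: " + e.name); // SyntaxError
}

// Client-side workaround: strip a leading BOM before parsing.
var stripped = withBom.replace(/^\uFEFF/, "");
console.log(JSON.parse(stripped).subject); // Почему?
```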

    • @srgg67, good detective work. It is roughly clear: PHP, having detected an encoding mismatch, decided to "bring everything to a common denominator", but the other parts of the system were not ready for that. - avp
    • And by the way, how does it determine the encoding of the output once a file "with BOM" gets in there? Or does it simply fail to detect it and misbehave? - srgg67
    • Judging by the text in your question ("\u041f\u043e\u0447\u0435\u043c\u0443?", ...), the input encoding was determined as UTF-8 and the text was successfully converted to Unicode, which the JSON output then represents as escaped characters. Although you did not show the beginning of the string. The BOM in UTF-8 is encoded as the bytes 0xEF, 0xBB, 0xBF and corresponds to the Unicode character 'ZERO WIDTH NO-BREAK SPACE' (\uFEFF in JSON). I can't add anything more sensible. - avp
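The byte values avp quotes can be checked directly, e.g. in Node.js (a quick verification, not part of the original discussion):

```javascript
// U+FEFF (ZERO WIDTH NO-BREAK SPACE) encodes in UTF-8 as EF BB BF -
// exactly the three bytes a BOM-carrying include file prepends:
var bom = Buffer.from("\uFEFF", "utf8");
console.log(bom.toString("hex")); // efbbbf
```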