I send the POST data via AJAX and save it in an array ($arr):

 user_id_from => 42
 user_id_to => 43
 date_time => 2012-08-19 23:58:51
 subject => Почему?
 message => так..
 obj_identifier => o19

Then I call json_encode($arr). Here is what it gives me in the response received by the client script:

 ... "\u041f\u043e\u0447\u0435\u043c\u0443?","\u0442\u0430\u043a.." ... 

If I use Latin characters instead of Cyrillic, everything is OK. The page encoding is UTF-8 without BOM. If I decode this string and, for example, output it on the page from which the AJAX request is sent:

 var_dump("<h1>json:</h1><pre>",json_decode($arr),"</pre>"); 

then everything displays fine. However, I cannot decode it on the receiving side, because the client script retrieves it as req.responseText:

 var jData = JSON.parse(req.responseText); 

There is probably a solution, but I don't know it :( I would be very grateful for help!

  • What version of PHP? - FLK
  • That's exactly why I use good old XML for server responses - it saves traffic...) Either rewrite the code that produces the response, or write your own function to decode the escaped Unicode. - Indifferently
  • PHP Version 5.3.3 - srgg67
  • It looks like responseText is expected to be UTF-8, while in fact it is Unicode escaped according to RFC 4627 (JSON) - essentially a mix of UTF-16 escapes and ASCII. If you ignore surrogate pairs, the decoding algorithm is straightforward: each "character" is either 1 byte or 6 bytes in the form \uHHHH, i.e. a 16-bit code unit. Walk the string, read a "character" (getting a 16-bit integer), and encode it into UTF-8 (1 to 3 bytes) directly into the result string. Surrogate pairs are more trouble - read about them yourself; the gist is that such a pair encodes a code point beyond the 16-bit range (up to U+10FFFF). That's all, the limit has been reached. - avp
  • At the time I asked myself the same question: [Cyrillic in json_encode][1]. [1]: hashcode.ru/questions/37998/… - ling
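The manual decoding avp describes can be sketched in JavaScript. This is illustration only - JSON.parse already performs it, surrogate pairs included - and the function name decodeEscapes is made up for this example:

```javascript
// Minimal sketch of the decoding avp describes: walk the string and
// turn \uHHHH escapes back into characters. Because JS strings are
// sequences of UTF-16 code units, emitting each escape's 16-bit value
// keeps surrogate pairs adjacent, so astral characters survive intact.
function decodeEscapes(s) {
  return s.replace(/\\u([0-9a-fA-F]{4})/g, function (m, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

console.log(decodeEscapes("\\u041f\\u043e\\u0447\\u0435\\u043c\\u0443?")); // Почему?
```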


2 Answers

  • Actually, the escaping in this case is done out of conservatism and backward-compatibility considerations; Jon Skeet commented on this question well here.
  • In my personal opinion, it is not a bad idea. Moreover, judging by RFC 4627, this is valid JSON, which means it must be parsed.

  • Another interesting discussion on the topic can be found here.

  • @Indifferently JSON with unescaped Unicode characters is also valid JSON, so the remark about saving traffic compared to XML is not very clear.
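To see that the escaped form really is parsed as valid JSON, note that JSON.parse handles the \uHHHH escapes from the question with no extra work (a minimal check using the question's own field names):

```javascript
// json_encode() output with escaped Cyrillic is valid JSON, and
// JSON.parse() restores the original characters:
var json = '{"subject":"\\u041f\\u043e\\u0447\\u0435\\u043c\\u0443?",' +
           '"message":"\\u0442\\u0430\\u043a.."}';
var jData = JSON.parse(json);
console.log(jData.subject); // Почему?
console.log(jData.message); // так..
```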

    Thank you all. After wading through countless threads on character-conversion problems with JSON, I did not find a definitive solution. But I did find the cause in my particular case: among the several included (require) files there were some encoded as UTF-8 with BOM (I note that I use DW, which has a peculiarity: if you switch to it from a project with a different encoding, things can quietly go wrong). After removing the BOM everything worked fine. I hope this experience will be useful to colleagues.
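    The failure mode described here is easy to reproduce on the client side: a UTF-8 BOM leaking from an included file lands in front of the JSON body, and JSON.parse rejects it. A sketch (the stripping regex is a defensive workaround, not part of the original fix, which was removing the BOM from the source files):

```javascript
// A BOM (U+FEFF) leaking into the response body breaks JSON.parse,
// because U+FEFF is not JSON whitespace:
var clean = '{"subject":"\\u041f\\u043e\\u0447\\u0435\\u043c\\u0443?"}';
var withBom = "\uFEFF" + clean;

JSON.parse(clean); // parses fine
try {
  JSON.parse(withBom);
} catch (e) {
  console.log("parse failed: " + e.name); // SyntaxError
}

// Client-side workaround: strip a leading BOM before parsing.
var stripped = withBom.replace(/^\uFEFF/, "");
console.log(JSON.parse(stripped).subject); // Почему?
```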

    • @srgg67, good detective work. It is roughly clear: PHP, having detected an encoding mismatch, decided to "bring everything to a common denominator", but the other parts of the system were not ready for that. - avp
    • And by the way, how does it determine the encoding of the output once a file "with BOM" gets in there? Or does it simply fail to detect it and misbehave? - srgg67
    • Judging by the text in your question ("\u041f\u043e\u0447\u0435\u043c\u0443?", ...), the input encoding was determined as UTF-8 and the text was successfully converted to Unicode, which the JSON output then represents as escaped characters. Although you did not show the beginning of the string. The BOM in UTF-8 is encoded as the bytes 0xEF, 0xBB, 0xBF and corresponds to the Unicode character 'ZERO WIDTH NO-BREAK SPACE' (\uFEFF in JSON). I can't add anything more sensible. - avp
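The byte values avp quotes can be checked directly, e.g. in Node.js (a quick verification, not part of the original discussion):

```javascript
// U+FEFF (ZERO WIDTH NO-BREAK SPACE) encodes in UTF-8 as EF BB BF -
// exactly the three bytes a BOM-carrying include file prepends:
var bom = Buffer.from("\uFEFF", "utf8");
console.log(bom.toString("hex")); // efbbbf
```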