How to save special characters (°, $, ¥, powers of a number) in JSON?

When parsing data (using Python and Scrapy), many special characters are not displayed in human-readable form.

For example, for the temperature value 1021 °C, the output I get is:

 >>> response.xpath('path').extract()
 u'1021 \xb0C'
 >>> print(response.xpath('path').extract())
 1021 °C

And when I build the JSON object, I get:

  [{ ..."temp": "1021 \u00b0C", ... }] 

What I need is:

 [{ ..."temp": "1021 °C", ... }] 

How can I get human-readable output in JSON, so that the characters are displayed as is?

  • By the JSON standard such characters are prohibited; they must be encoded. Of course, you can walk through the JSON text yourself and convert the \u sequences, but then there is no guarantee that the functions that have to work with this JSON will accept it. - Mike
  • If you convert this JSON into an array with any standard function, everything will be decoded in the array. And if the JSON itself is stored in the database, what difference does it make how it looks there? - Mike
  • And what are you going to save in the database? How do you convert the JSON to the database format? - Mike
  • Do you parse the JSON before loading it into the database? Which library function do you use? With JSON decoding, any function will of course turn the \u sequences into the characters themselves. - Mike
  • @Mike, do not mislead the person. JSON of course supports these characters. JSON is a text format defined in terms of Unicode. Escape sequences such as \uxxxx allow arbitrary JSON text to be transmitted using ASCII encoding, but it is of course not necessary to use them (apart from certain exceptions). - jfs
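The point under discussion in the comments can be checked directly. Below is a minimal sketch, assuming Python 3 and the standard json module, showing that the escaped form and the literal form are two spellings of the same JSON string and decode to the same value:

```python
import json

# Two JSON texts for the same value: one uses a \u escape,
# the other contains the degree sign itself.
escaped = '"1021 \\u00b0C"'
literal = '"1021 °C"'

decoded_escaped = json.loads(escaped)
decoded_literal = json.loads(literal)

# Both parse to the same Python string.
print(decoded_escaped == decoded_literal)
print(decoded_escaped)
```

So the choice between the two forms only affects how the JSON text looks, not the data a conforming parser reads from it.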

1 answer

If you use the standard json library and do not want to see escaped characters in the output, you can pass the additional parameter ensure_ascii=False to the dumps function (it is also available in the dump function):

 >>> import json
 >>> print(json.dumps('1021 °C', ensure_ascii=False))
 "1021 °C"
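The same parameter can be used when writing to a file with dump. The following is only a sketch, assuming Python 3; the items list and the output path are made up for illustration. Note that the file has to be opened with an encoding that can represent the characters, such as UTF-8:

```python
import json
import os
import tempfile

# Hypothetical scraped items, used only for this example.
items = [{"temp": "1021 °C", "price": "5 ¥"}]

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "items.json")

# ensure_ascii=False keeps non-ASCII characters as is, so the file
# is opened explicitly as UTF-8.
with open(path, "w", encoding="utf-8") as f:
    json.dump(items, f, ensure_ascii=False)

# Read the raw text back to see what was actually written.
with open(path, encoding="utf-8") as f:
    text = f.read()

print(text)
```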

As far as I understand, JSON permits arbitrary Unicode characters in strings. Here is what I found in the specification:

All Unicode characters may be placed within the quotation marks, except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

In other words:

Any Unicode character can be placed inside the quotes ("), with the exception of the characters that must be escaped: the quote ("), the backslash (\) and the control characters (from U+0000 to U+001F).
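As an illustration of this rule, here is a small sketch (assuming Python 3 and the standard json module; the test string is made up) showing that with ensure_ascii=False the quote, the backslash and control characters are still escaped, while the degree sign is written as is:

```python
import json

# Contains a quote, a backslash, a newline, the control character
# U+0001 and the non-ASCII degree sign.
value = 'a"b\\c\n\x01°'

encoded = json.dumps(value, ensure_ascii=False)
print(encoded)

# The escaped text still decodes back to the original string.
print(json.loads(encoded) == value)
```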

However, before writing JSON this way, check that the library that will parse it handles non-ASCII Unicode characters correctly.
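One way to run such a check is a round trip: serialize without escapes, encode to bytes, then parse with the consumer's library and compare. In the sketch below Python's own json module stands in for the consumer (an assumption, replace it with the parser you actually use), and the sample data is invented:

```python
import json

# Invented sample data with several non-ASCII characters.
original = {"temp": "1021 °C", "price": "100 ¥", "area": "5 m²"}

# Serialize without \u escapes and encode as UTF-8, the way the text
# would be stored or sent.
payload = json.dumps(original, ensure_ascii=False).encode("utf-8")

# Stand-in for the consumer: decode the bytes and parse the JSON text.
restored = json.loads(payload.decode("utf-8"))

print(restored == original)
```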