I receive a file that contains a JSON array. The file can be large (up to 200 MB). I need to validate the array against the JSON format (and ideally reformat it as well) and then write the file to storage. I don't want to load all 200 MB into memory.

How can I do this in parts, i.e. transfer the data from one file to another, without loading the entire file into memory?

For example:

[ [ 1234, 1234, 1234, 465 ], [ 1234, 1234, 1234, 1234 ], [ 1234, 1234, 1234, 1234 ], [ 1234, 1234, 1234, 1234 ] ] 

I need to get:

 1234,1234,1234,465 1234,1234,1234,1234 1234,1234,1234,1234 1234,1234,1234,1234 
  • Can you explain what "write the file to storage" means? - MaxU
  • @MaxU For now it is just a folder on the local computer. - Vetos

1 answer

json.tool copes quite well with JSON validation and formatting. There should be no problems with files up to 200 MB ...

Usage:

 python -m json.tool json_file.json > formatted_json_file.json 
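For reference, the same check and formatting can be done from Python code rather than the command line. A minimal sketch (the function name and paths are illustrative, not part of json.tool):

```python
import json

def validate_and_format(src_path, dst_path, indent=4):
    """Programmatic equivalent of `python -m json.tool`:
    json.load() validates the input (raising ValueError on
    malformed JSON) and json.dump() rewrites it formatted.
    Note: like json.tool, this loads the whole file into memory."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)  # raises ValueError if the JSON is invalid
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(data, dst, indent=indent)
```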

P.S. If you still need to read the file in parts, I suggest taking a look at json-streamer.
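If you would rather avoid third-party libraries, the top-level array can also be consumed item by item with just the standard library, using json.JSONDecoder.raw_decode on a refillable buffer. A sketch under the assumption that the top level of the file is a JSON array (chunk size and names are illustrative):

```python
import io
import json

def iter_array_items(f, chunk_size=65536):
    """Yield the top-level items of a JSON array one at a time,
    reading the file in chunks (stdlib only, no third-party parser).
    Raises ValueError if the input is not valid JSON."""
    decoder = json.JSONDecoder()
    buf = f.read(chunk_size).lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]
    while True:
        buf = buf.lstrip()
        if not buf:
            more = f.read(chunk_size)
            if not more:
                raise ValueError("unexpected end of input")
            buf = more
            continue
        if buf[0] == "]":
            return                    # end of the array
        if buf[0] == ",":
            buf = buf[1:]
            continue
        try:
            item, end = decoder.raw_decode(buf)
            # A value ending exactly at the buffer edge may be cut off
            # (e.g. "12" out of "1234"), so only trust it if more follows.
            complete = end < len(buf)
        except ValueError:
            complete = False
        if not complete:
            more = f.read(chunk_size)
            if more:
                buf += more           # refill the buffer and retry
                continue
            item, end = decoder.raw_decode(buf)  # last value in the file
        yield item
        buf = buf[end:]

# Demo with an in-memory file; for a real 200 MB file you would pass
# an open file object instead of io.StringIO.
src = io.StringIO("[ [1234, 1234, 1234, 465], [1234, 1234, 1234, 1234] ]")
for row in iter_array_items(src, chunk_size=16):
    print(",".join(str(n) for n in row))
```

Only one inner array is held in memory at a time, and a malformed file still fails with a ValueError, so the copy doubles as validation.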

  • Thanks. It seems json.tool still loads the entire file into memory. - Vetos
  • I think json-streamer will work for me. Failing that, there are also solutions based on yajl. - Vetos