For subsequent data analysis I need to store JSON API responses, each weighing about 2.7 MB when saved as text. There are 200-250 thousand such responses. The question is how to store and read them; please suggest the approach you consider best.
My current solution is to use the gzip module (Python 3.6) to write one large compressed file with a JSON response on each line (each line ends with a newline "\n"). Lines are written and read iteratively, so the whole file never has to be loaded into RAM. Per response, the 2.7 MB shrinks to about 150 KB. The question is: is there a more convenient solution in terms of read/write speed?
import gzip
import json

from requests import get

# Append each API response as one JSON document per line
# of a single gzip-compressed file.
with gzip.open('large_data.json.gz', 'wb') as outfile:
    while True:
        response = get(url).json()
        outfile.write(json.dumps(response).encode('utf-8') + b'\n')
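For completeness, a minimal reading sketch for the format described above (one JSON document per line in a gzip file, matching the file name from the writing snippet; the process_record call is a hypothetical placeholder for whatever per-record analysis is done):

import gzip
import json

# Stream the compressed file line by line without loading it
# entirely into RAM; each line is one JSON document.
with gzip.open('large_data.json.gz', 'rb') as infile:
    for line in infile:
        record = json.loads(line)  # json.loads accepts bytes in Python 3.6+
        # process_record(record)   # hypothetical analysis step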