Hello. I have run into the following problem: where should I store a fairly large data table in Python, and how do I get fast access to it?

The source data is a table with 4 million rows and 10 columns.

My initial plan was to dump the whole thing into a .pkl file like this:

import pickle

# Read the tab-separated file line by line and collect the rows in a list
infile = open('big.txt', 'r')
data = []
while True:
    row = infile.readline()
    if row == '':              # end of file
        print('export done')
        break
    data.append(row.split('\t'))
infile.close()

# Dump the whole list into a pickle file
outfile = open('big_dump.pkl', 'wb')
pickle.dump(data, outfile)
outfile.close()
print('dump done')

However, this code exhausts the RAM, apparently because it keeps the entire data list in memory. Which direction should I look in, and which storage library should I choose?

I have thought about PyTables, but there is not much documentation for it in Russian. The requirement is that, after the data are saved, I need to be able to retrieve them from the file by key.

    1 Answer

    Which storage library should I choose?

    PostgreSQL, MySQL, Oracle, MS SQL Server, MariaDB, or, at the very least, SQLite. It may also turn out that a NoSQL approach suits you better, something like MongoDB or Cassandra.
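    If SQLite is enough, here is a minimal sketch using only the standard library. It assumes big.txt is tab-separated with 10 columns and that the first column is the lookup key; the table and column names below are placeholders. Rows are inserted in batches so the whole table never sits in RAM, and an index makes lookups by key fast.

        import csv
        import sqlite3

        conn = sqlite3.connect('big.db')
        # 10 placeholder columns: row_key plus c1..c9
        cols = ', '.join(['row_key TEXT'] + [f'c{i} TEXT' for i in range(1, 10)])
        conn.execute(f'CREATE TABLE IF NOT EXISTS big ({cols})')

        with open('big.txt', newline='') as infile:
            reader = csv.reader(infile, delimiter='\t')
            batch = []
            for row in reader:
                batch.append(row)                  # row is a list of 10 fields
                if len(batch) >= 50_000:           # flush every 50 000 rows
                    conn.executemany(
                        'INSERT INTO big VALUES (?,?,?,?,?,?,?,?,?,?)', batch)
                    batch.clear()
            if batch:
                conn.executemany(
                    'INSERT INTO big VALUES (?,?,?,?,?,?,?,?,?,?)', batch)

        conn.execute('CREATE INDEX IF NOT EXISTS idx_key ON big (row_key)')
        conn.commit()

        # Later: fetch a single row by key without loading the table
        print(conn.execute('SELECT * FROM big WHERE row_key = ?', ('some_key',)).fetchone())
        conn.close()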

    • Question: which will work faster, MongoDB or PyTables with its HDF5 goodies? - Oladyshek
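    For the PyTables/HDF5 side of that comparison, a rough sketch of keyed access through pandas' HDFStore (a wrapper around PyTables, so the tables package must be installed) is shown below. The file names and placeholder column names are assumptions, as is the assumption that the table fits in memory as a DataFrame during the initial write.

        import pandas as pd

        # Assumed layout: tab-separated, 10 columns, no header row
        names = ['row_key'] + [f'c{i}' for i in range(1, 10)]
        df = pd.read_csv('big.txt', sep='\t', header=None, names=names)

        # format='table' plus data_columns makes row_key queryable on disk
        with pd.HDFStore('big.h5') as store:
            store.put('big', df, format='table', data_columns=['row_key'])

        # Later: read back only the matching rows, selected by key
        with pd.HDFStore('big.h5') as store:
            rows = store.select('big', where="row_key == 'some_value'")
        print(rows)

    If even the DataFrame is too large for memory, the same store can be filled incrementally with store.append on chunks produced by pd.read_csv(..., chunksize=...), with an appropriate min_itemsize for the string columns.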