Hello. I have run into the following problem: where should I store a fairly large data table in Python, and how do I get fast access to it?

The source data is a table with 4 million rows and 10 columns.

My initial plan was to dump the whole thing into a .pkl file like this:

import pickle

# Read the tab-separated file line by line and collect the rows in a list
infile = open('big.txt', 'r')
data = []
while True:
    row = infile.readline()
    if row == '':              # end of file
        print('export done')
        break
    data.append(row.split('\t'))
infile.close()

# Dump the whole list into a pickle file
outfile = open('big_dump.pkl', 'wb')
pickle.dump(data, outfile)
outfile.close()
print('dump done')

However, this code exhausts the RAM, apparently because it keeps the entire data list in memory. Which direction should I look in, and which storage library should I choose?

I have thought about PyTables, but there is not much documentation for it in Russian. The requirement is that, after the data are saved, I need to be able to retrieve them from the file by key.

    1 Answer

    Which storage library should I choose?

    PostgreSQL, MySQL, Oracle, MS SQL Server, MariaDB, or, at the very least, SQLite. It may also turn out that a NoSQL approach suits you better, something like MongoDB or Cassandra.
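    If SQLite is enough, here is a minimal sketch using only the standard library. It assumes big.txt is tab-separated with 10 columns and that the first column is the lookup key; the table and column names below are placeholders. Rows are inserted in batches so the whole table never sits in RAM, and an index makes lookups by key fast.

        import csv
        import sqlite3

        conn = sqlite3.connect('big.db')
        # 10 placeholder columns: row_key plus c1..c9
        cols = ', '.join(['row_key TEXT'] + [f'c{i} TEXT' for i in range(1, 10)])
        conn.execute(f'CREATE TABLE IF NOT EXISTS big ({cols})')

        with open('big.txt', newline='') as infile:
            reader = csv.reader(infile, delimiter='\t')
            batch = []
            for row in reader:
                batch.append(row)                  # row is a list of 10 fields
                if len(batch) >= 50_000:           # flush every 50 000 rows
                    conn.executemany(
                        'INSERT INTO big VALUES (?,?,?,?,?,?,?,?,?,?)', batch)
                    batch.clear()
            if batch:
                conn.executemany(
                    'INSERT INTO big VALUES (?,?,?,?,?,?,?,?,?,?)', batch)

        conn.execute('CREATE INDEX IF NOT EXISTS idx_key ON big (row_key)')
        conn.commit()

        # Later: fetch a single row by key without loading the table
        print(conn.execute('SELECT * FROM big WHERE row_key = ?', ('some_key',)).fetchone())
        conn.close()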

    • Question: which will work faster, MongoDB or PyTables with its HDF5 goodies? - Oladyshek
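    For the PyTables/HDF5 side of that comparison, a rough sketch of keyed access through pandas' HDFStore (a wrapper around PyTables, so the tables package must be installed) is shown below. The file names and placeholder column names are assumptions, as is the assumption that the table fits in memory as a DataFrame during the initial write.

        import pandas as pd

        # Assumed layout: tab-separated, 10 columns, no header row
        names = ['row_key'] + [f'c{i}' for i in range(1, 10)]
        df = pd.read_csv('big.txt', sep='\t', header=None, names=names)

        # format='table' plus data_columns makes row_key queryable on disk
        with pd.HDFStore('big.h5') as store:
            store.put('big', df, format='table', data_columns=['row_key'])

        # Later: read back only the matching rows, selected by key
        with pd.HDFStore('big.h5') as store:
            rows = store.select('big', where="row_key == 'some_value'")
        print(rows)

    If even the DataFrame is too large for memory, the same store can be filled incrementally with store.append on chunks produced by pd.read_csv(..., chunksize=...), with an appropriate min_itemsize for the string columns.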