Serialization in Python. Why so many modules?

Question

I am not familiar with serialization. Here is what I have understood so far: there are packages for human-readable formats, such as json; there are packages for storing Python objects, such as pickle; and there are cross-language packages, such as protobuf. So three modules should be enough, right?
When I searched PyPI for 'serialization', I got 4+ pages of modules. For example, protobuf alone has 3 modules... So I have 2 questions.
1. Why is dill, for example, not merged into pickle?
2. Is it possible to get by with 3-4 standards for all occasions? If not, what is the reason for such a variety of modules?

Related question: How is __repr__ different from __str__?
Because many different situations come up in practice: no one size fits all.

MaxU · Accepted Answer · 2017-11-12T13:00:32

In my opinion, there is no point in fighting evolution and natural selection. Good, convenient, efficient modules will push out the ones that lose to them.

An example from my working life:

About 2 years ago I chose what seemed to me the ideal module for fast and convenient (de)serialization of data in Pandas: HDF5 (PyTables). A little later a new library appeared, Feather Format (Apache Arrow), which is much faster than HDF5 but lacks features such as reading from disk by index. Now, depending on the task, I choose either HDF5 (if I need to process data sets that do not fit in memory) or Feather (if I need to quickly read or write a full data set that fits in memory).

It is possible that index-based access will be added to Feather Format in the future, or that another, even better library will appear, and I will gladly switch to it.

Now I am learning Apache Spark, and the situation there is exactly the same: there is no single standard, and something new constantly appears. Some of these new formats survive and supplant established "standard" formats, while others simply die out.

If there were one unshakable format, there would be no development or improvement.
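The same kind of trade-off shows up among the modules mentioned in the question. Below is a minimal sketch using only the standard library; the `record` data and the helper names `try_json` and `try_pickle` are made up for illustration. It shows that json rejects some built-in Python types, while pickle handles them but cannot serialize a lambda, which is the sort of gap that third-party modules such as dill aim to fill.

```python
import json
import pickle
from datetime import date

# Hypothetical sample data: contains a set and a date, which JSON has no native types for.
record = {"name": "sensor-1", "tags": {"a", "b"}, "installed": date(2017, 11, 12)}


def try_json(obj):
    """Return the JSON text, or a short message if the object is not JSON-serializable."""
    try:
        return json.dumps(obj)
    except TypeError as exc:
        return f"json failed: {exc}"


def try_pickle(obj):
    """Return True if pickle can serialize the object, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# json: readable text that other languages can parse, but a limited set of types.
json_result = try_json(record)

# pickle: binary and Python-only, but it round-trips sets and dates unchanged.
restored = pickle.loads(pickle.dumps(record))

# pickle stores functions by reference to their importable name, so a lambda fails.
can_pickle_lambda = try_pickle(lambda x: x + 1)

print(json_result)
print(restored == record)
print(can_pickle_lambda)
```

Neither module is wrong here; each one picks a different point between portability, readability, and coverage of Python objects.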

Sergey Gornostaev · Answer 2 · 2017-11-12T11:55:21

Because the modules on PyPI are written and published not by a single organization but by many independent programmers from around the world. Each of them has their own vision of any given area, including serialization. When a module from PyPI seems inconvenient or does not fully meet my needs, I write my own and publish it.

Well, this is not GitHub, after all... shouldn't there be some kind of approval by the "elders"?
Moreover, there are many long-obsolete modules that cannot be run in a modern environment.
The same situation exists in the public repositories of all commonly used languages.
