I am not familiar with serialization. Here is what I understood: there are packages that are convenient for JSON, packages that are convenient for storage within Python (pickle), and cross-language packages (protobuf). So three modules should be enough?
When I searched PyPI for 'serialization', I got 4+ pages of modules. For example, protobuf alone has 3 modules... So, 2 questions.
1. Why, for example, isn't dill merged into pickle?
2. Is it possible to get by with 3-4 standards for all occasions? If not, what is the reason for such a variety of modules?
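The trade-offs listed in the question can be seen with the standard library alone. The following is a minimal sketch (the sample record is made up for illustration): json is text-based and readable from other languages but only handles basic types, pickle round-trips Python objects exactly but is Python-only, and plain pickle cannot serialize a lambda, which is one of the gaps the third-party dill package aims to fill.

```python
import json
import pickle

# Made-up sample data with Python-specific types: a tuple and a set.
record = {"name": "sensor-1", "coords": (1.5, 2.0), "tags": {"a", "b"}}

# json: text-based and portable across languages, but only basic types.
try:
    json.dumps(record)
    json_failed = False
except TypeError:
    json_failed = True  # sets are not JSON serializable
print("json failed on set:", json_failed)  # True

# Converting the set to a sorted list makes it JSON-compatible, but the tuple
# comes back as a list: type information is lost on the round trip.
json_text = json.dumps({**record, "tags": sorted(record["tags"])})
print(json.loads(json_text)["coords"])  # [1.5, 2.0]

# pickle: Python-only binary format that round-trips tuples and sets exactly.
restored = pickle.loads(pickle.dumps(record))
print(restored == record)  # True

# pickle stores functions by reference (module + name), so a lambda fails;
# extending pickle to cases like this is what packages such as dill target.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print("lambda pickled:", lambda_pickled)  # False
```

Each format wins on a different axis (portability, fidelity to Python types, coverage of unusual objects), which is part of why no single module covers every case.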

  • Related question: How is __repr__ different from __str__? (see the chart) - jfs
  • A general question in the same vein: why are there so many words? Because many different situations come up in life: one size does not fit all. - jfs

2 answers

In my opinion, there is no point in fighting evolution and natural selection. Good, convenient, efficient modules will push out the ones that lose to them.

An example from my working life:

About two years ago I chose what seemed to me the ideal module for fast and convenient (de)serialization of data in Pandas: HDF5 (PyTables). A little later a new library appeared, Feather Format (Apache Arrow), which is much faster than HDF5 but lacks features such as reading from disk by index. Now, depending on the task, I choose either HDF5 (if I need to process data sets that do not fit in memory) or Feather (if I need to quickly read/write a full data set that fits in memory).
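The HDF5-vs-Feather choice above largely comes down to whether a row can be read by index without loading the whole file. The sketch below is a stdlib-only analogue of that idea, not HDF5 or Feather themselves: it assumes a hypothetical fixed-size binary record layout written with struct, so read_by_index() can fetch one record via seek() while read_all() loads everything at once.

```python
import os
import struct
import tempfile

# Hypothetical record layout for illustration: (int64 id, float64 value), little-endian.
RECORD = struct.Struct("<qd")


def write_records(path, rows):
    """Write (id, value) pairs as fixed-size binary records."""
    with open(path, "wb") as f:
        for row_id, value in rows:
            f.write(RECORD.pack(row_id, value))


def read_all(path):
    """Load the whole file at once (the 'fits in memory' case)."""
    with open(path, "rb") as f:
        data = f.read()
    return list(RECORD.iter_unpack(data))


def read_by_index(path, index):
    """Fetch one record by position without reading the rest of the file."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        chunk = f.read(RECORD.size)
    if len(chunk) != RECORD.size:
        raise IndexError(index)
    return RECORD.unpack(chunk)


if __name__ == "__main__":
    rows = [(i, i * 0.5) for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        write_records(path, rows)
        print(read_by_index(path, 42))  # (42, 21.0)
        print(len(read_all(path)))  # 1000
```

Real formats add compression, schemas, and indexes on arbitrary columns; the sketch only shows why index-based reads matter for data that does not fit in memory, while a single bulk read is simpler and faster when it does.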

It is possible that index-based access will be added to Feather Format in the future, or that an even better library will appear, and I will gladly switch to it.

I am now learning Apache Spark, and the situation there is exactly the same: there is no single standard, and something new keeps appearing. Some of these new formats survive and supplant established "standard" formats, while others simply die out...

If there were one unshakable format, there would be no development or improvement.

    Because the modules on PyPI are written and published not by a single organization but by many independent programmers from around the world. Each of them has their own vision of any given area, including serialization. When a module on PyPI seems inconvenient or does not fully meet my needs, I write my own and publish it.


    • So no one moderates the 100,000+ PyPI modules? Well, it's not GitHub, after all... shouldn't there be some approval from the "elders"? - Vasyl Kolomiets
    • Moreover, there are many long-obsolete modules that cannot run in a modern environment. There are just as many modules that were broken from the start, and some that are completely meaningless. The situation is the same in all public repositories for all commonly used languages. - Sergey Gornostaev
    • Sorry, I can't mark both answers as accepted. They complement each other... - Vasyl Kolomiets