The bottom line is that we need to keep such a document in ourselves and work with it (search by parameters, editing, etc.).
If the structure of the document does not change during the operation of the application, i.e. your process is as follows - you have agreed on the structure of the document coming in the request and in case of changing its structure, release a new version of the product that will work with the new version of the request. If so, then this property is suitable for you and RDBMS. If the data format may be different with each request, it is unlikely that a RDBMS would be a good option. RDBMS gives you more control over data integrity - it controls torture to write data of an incorrect type to the table cell, or incorrect data in which part of the data is missing. With regards to NoSQL DBMS, the document-oriented type is more suitable for you, since in essence, your request is a document in JSON format, which you most likely write in unchanged form in the database. NoSQL such as MongoDB works directly with data in JSON format (in fact, BSON is used), which allows you to greatly simplify the operation of writing data to the database and focus on developing tools for analyzing this data. What is essentially required from your project. Based on the above, if you choose an RDBMS, you will need to spend additional resources on developing code that will transform your data into a relational view, as well as on developing a relational representation of your data — a set of tables and relationships between them.
but something terrible turned out, about 200 tables and creepy joins in queries
With regards to the tables, I formulated a comparison above; in the case of a NoSQL database storing data in JSON, you will not need any serious additional effort to write data. In the case of a denormalized structure with a doc.-or. DBMS you have two solutions to the problem:
- Full denormalization - then each document will store all its constituent data within itself and obtaining and changing them will be a simple task. But you will encounter problems updating common entities for the entire set of documents, since each will contain copies of it.
- All external entities stored in separate documents and refer to them, the approach is similar to the approach of the RDBMS and in fact is an attempt to normalize the data. In this approach, the same difficulties will arise with "creepy joins in queries", but this will not be solved by means of the SQL language, but by combining the necessary collections of representing documents directly in the code.
In essence, the mix of these approaches will be optimal, you will see where it is easier for you to store data as aggregates and where to use links to common entities.
We also encountered the problem that the document itself and the entities associated with it often change (some fields and links to other entities are added and removed).
This problem, which is also related to the previous one, can be easily solved with the help of graph DBs, which are created for the purpose of easy management of many entities interconnected in a complex way. There are also mixed models that combine graph and document-oriented approach - OrientDB. This approach can in your case radically simplify the solution of the current problem. But it is important to understand that if your system acquires new functionality, this approach may not be optimal for the new functionality, and in this case you should add another system with the appropriate type to the system - multivariate persistence. I think OrientDB or a similar system combining graph and doc.-oriented approach will free you from the many problems associated with the inconsistency of your data and DBMS structure.
Therefore, question number 1: What would you advise, in principle, to store such data and work with them?
I would advise you OrientDB or equivalent. Yes, you have to learn a new technology. But it is extremely important to understand that the biggest problems appear when the project is already launched, settled and it is necessary to actively develop it. Due to the unsuccessful selection of the model at the beginning, as the complexity of the project grows, it becomes overwhelmed with problems like a snowball, to introduce / delete / modify a new field or table you will have to put an incredible amount of effort, just to modify queries that will affect this change . My conclusion is to choose the storage system that will be most natural for your task and focus on solving the problem - data analysis.
Also, unfortunately, there is the task of storing the history of changes for each document, including entities associated with it, it is necessary that the user can view the history of document changes (what, by whom and when changed), if desired, roll them back, and also see the list of all documents and their changes within the project (roughly speaking, this is the name of group changes)
Many NoSQL DBMSs support the built-in auditing mechanism - MongoDB, OrientDB, both with the purchase of a commercial license. But you can search for a solution from the community or write it yourself.
Summing up:
Based on the problem you described, the DBMS is suitable for you - a mixture of graph and document-oriented (optimally, storing data in JSON). Because you have all the data originally presented in JSON and have many links to each other. You gave an example of such a DBMS, you can look in the direction of alternatives.
Before choosing, I advise you to read a book (not large in size) - "NoSQL. New methodology for developing non-relational databases." Translation into Russian is somewhat crooked, but enough to understand what NoSQL DB can give you.