The problem is formulated a bit incorrectly. You ask:

"How can I store such data in order to search quickly and smoothly by an arbitrary number of participants?"

but at the same time:

"I need to store this data in a database (the specific engine does not matter right now - it will most likely change later - assume there are no joins and no indexing of array fields) and search over it."
The answer will depend on which database you choose. The most "painless" option is to keep everything in JSON and let a filter do the rest. But that only covers the simplest case, because you almost certainly have other data that would be inconvenient to store in such a form and painful to work with.
With a NoSQL store you get almost complete freedom over the data and over how you select it, but you have to watch data consistency yourself. In return you can use any filter that picks out exactly the records you need. However, if later in the project you end up with an entity that has a large number of relationships, following them can turn into torture.
With a relational database things are more complicated, because the expressiveness of the filter is constrained by relational algebra, but consistency and index-backed selection work fine.
Your example filter can be implemented with something like this: http://sqlfiddle.com/#!9/b891f
SELECT Conference.conf_id, COUNT(Conference.conf_id) AS cnt
FROM Conference
JOIN Participants
  ON Conference.conf_id = Participants.conf_id
 AND (Participants.key, Participants.val) IN ( ('a', '79219998878'), ('b', '79219998877') )
GROUP BY (Conference.conf_id)
HAVING cnt = 2
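For reference, a minimal sketch of the tables behind that query (the fiddle's exact DDL may differ; note that key is a reserved word in MySQL, so it has to be backquoted in the DDL):

CREATE TABLE Conference (
  conf_id INT PRIMARY KEY
);

CREATE TABLE Participants (
  conf_id INT NOT NULL,           -- which conference the row belongs to
  `key`   VARCHAR(10) NOT NULL,   -- participant slot, e.g. 'a', 'b'
  val     VARCHAR(20) NOT NULL,   -- phone number
  FOREIGN KEY (conf_id) REFERENCES Conference(conf_id)
);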
If you need something more complicated, the constraints of relational algebra will most likely not let you do it. But there is a way out of this situation, as you yourself suggested: store the structure as JSON and put the condition on the selection.
SELECT * FROM Conf
WHERE JSON_CONTAINS(json_field, "79219998877", "$.a")
  AND JSON_CONTAINS(json_field, "79219998878", "$.b")
-- or via virtual columns
-- WHERE json_field->"$.a" = "79219998877" AND json_field->"$.b" = "79219998878"
This way you get a hybrid of both worlds, but it is only available in MySQL 5.7+. There you can even build indexes on those virtual (generated) columns.
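A minimal sketch of that, assuming the same Conf table and json_field column as above (the generated columns a and b are illustrative names):

ALTER TABLE Conf
  ADD COLUMN a VARCHAR(20) GENERATED ALWAYS AS (JSON_UNQUOTE(json_field->"$.a")) VIRTUAL,
  ADD COLUMN b VARCHAR(20) GENERATED ALWAYS AS (JSON_UNQUOTE(json_field->"$.b")) VIRTUAL,
  ADD INDEX idx_a_b (a, b);

-- this search can now use the index
SELECT * FROM Conf WHERE a = '79219998877' AND b = '79219998878';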
UPD: I was reminded about Postgres, thanks. Yes, it has similar functionality with jsonb, and you can also build queries over such a field; see the @> and <@ operators.
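A rough Postgres equivalent, assuming a jsonb column named json_field on a conf table:

-- a GIN index supports the @> containment operator
CREATE INDEX conf_json_idx ON conf USING GIN (json_field);

SELECT * FROM conf
WHERE json_field @> '{"a": "79219998877", "b": "79219998878"}';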
UPD: after reading the comments it became more or less clear to me what was meant:
At the moment the project runs on a relational server; in the future a switch to a column-oriented store is planned (NoSQL is not only JSON-like storage), which means preparing the data in a "flat" form for simple queries, and in which I cannot search by a part of the associative array (only by the full value). Hence the question described above, which comes down not to finding workarounds, but to whether all of this can somehow be represented in a pre-prepared form.
In that case, the "workaround" solution is your own implementation of an index over the array in the documents. But writing such an indexer yourself is a very difficult task.
You cannot embed this indexer inside your database, and an implementation in PHP / Ruby / Python would most likely lose in performance even to a full scan in the database. So you would have to write it in a "systems" language and communicate with it via IPC / a socket.
This is how I see it being used: when a document is created or deleted, you pass that document to the indexer and it updates the index accordingly. When you need a selection, you query the indexer, it quickly returns the IDs of the elements that satisfy the query, and with those IDs you go to the database and fetch the records. But then why reinvent this wheel when you can take something like MongoDB and use it in exactly the same way?
The following is an example for Mongo.
It is better to store the documents in this form:
{ "_id" : ObjectId("571923c7e4b08c60be5228a4"), "id" : 1, "participants" : [ { "key" : "a", "value" : "79219998878" }, { "key" : "b", "value" : "79219998877" }, { "key" : "c", "value" : "79219998879" } ] } { "_id" : ObjectId("571923f0e4b08c60be5228a9"), "id" : 2, "participants" : [ { "key" : "a", "value" : "79219998877" }, { "key" : "b", "value" : "79219998878" } ] } { "_id" : ObjectId("57193370e4b08c60be522acb"), "id" : 3, "participants" : [ { "key" : "a", "value" : "79219998877" }, { "key" : "c", "value" : "79219998879" } ] } { "_id" : ObjectId("571933c2e4b08c60be522ad4"), "id" : 4, "participants" : [ { "key" : "a", "value" : "79219998878" }, { "key" : "b", "value" : "79219998877" }, { "key" : "d", "value" : "79219998873" } ] }
Create an index:
db.participants.createIndex({ "participants.key" : 1 , "participants.value" : 1})
And search like this:
db.participants.find( { "participants" : { "$all" : [
    { "$elemMatch" : { "key" : "a", "value" : "79219998878" } },
    { "$elemMatch" : { "key" : "b", "value" : "79219998877" } }
] } } ).pretty()
Output:
{ "_id" : ObjectId("571923c7e4b08c60be5228a4"), "id" : 1, "participants" : [ { "key" : "a", "value" : "79219998878" }, { "key" : "b", "value" : "79219998877" }, { "key" : "c", "value" : "79219998879" } ] } { "_id" : ObjectId("571933c2e4b08c60be522ad4"), "id" : 4, "participants" : [ { "key" : "a", "value" : "79219998878" }, { "key" : "b", "value" : "79219998877" }, { "key" : "d", "value" : "79219998873" } ] }
If you run explain(), you will see that the index is used.
"winning plan": { "inputStage" : { // INDEX SCAN!!! "stage" : "IXSCAN", "keyPattern" : { "participants.key" : 1, "participants.value" : 1 }, "indexName" : "participants.key_1_participants.value_1", "isMultiKey" : true, "direction" : "forward", "indexBounds" : { "participants.key" : [ "[\"a\", \"a\"]" ], "participants.value" : [ "[\"79219998878\", \"79219998878\"]" ] } }
And you simply use it as an external indexer.
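Concretely, the IDs returned by Mongo (1 and 4 in the output above) are then used for an ordinary lookup in the main store, something along these lines:

-- hypothetical lookup in the primary (relational) store by the IDs the indexer returned
SELECT * FROM Conference WHERE conf_id IN (1, 4);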
Yes, there is overhead: for the sake of this indexer you have to run a whole MongoDB instance. But you can also send it not the whole document, only the ID and the array of participants. I suspect other NoSQL databases also have indexers with the required functionality and may be "lighter" than Mongo, so you could use one of them instead.
If you really want to, you can dig into Mongo's source code, understand how such an indexer works, and rewrite it yourself. But in my opinion, the overhead of running Mongo is cheaper than reinventing a wheel on steroids.