The storage architecture of related data in the database

Question

The site is supposed to post ads. All ads are divided into various categories (roughly as Avito).
To create a form is filled with the fields "name", "description" (typical for all ads) and the choice of categories. After selecting a certain category, an additional form with data is loaded (for example, for cars - color, mileage, etc., for real estate - the floor, house type, number of apartments, etc.).
Supposedly in the database there will be a common table for all Adverts ads:

id | name | description | category | type

in which the name, description of the advertisement and category id are stored, corresponding to the category in the table selected by the user and the type field to be divided into more detailed tables with data on the announcements. Those. for each form of the announcement there is a table, for example:
Adverts

 23 | Продам машину | "описание" | 4 | auto

AdvertAuto

 id | color| mileage | model 1 | red | 280000 | honda

The question arises, how to dock the data? It is clear that in the case of a single announcement, it is enough to simply press the desired table, but what to do when the selection is performed on all the ads?
It becomes necessary to join many tables at once, depending on the value of the type field.
Actually questions:
1) Is this architecture correct and are there any better solutions?
2) If this architecture is correct, then how to select multiple values? I saw requests using CASE - WHEN - THEN or with JOIN UNION, but how suitable are these requests for living on the combat server?

@u_mulder is very very slow and unsupported, nobody will thank you for such an architecture.
I suggest krown_loki to leave everything as it is and cache reference data, for example in Redis.
We read from the cache, if we don’t get there from the base and throw it into the cache, the next request will get it from the cache, simply and conveniently.
@krown_loki is too general a question, give a detailed answer to it for a long time.
If there are more than 5M records, the nosql database would go well, as we used to say solr, but there is a lot of things.
And in general, you can also use SQL - if you use a database cluster, if you optimize queries - imposing them as much as possible on the index.
It depends on what you like more - hemorrhoids with adminstvo, or hemorrhoids with low-popular NoSql solutions ..

Accepted Answer · 2019-01-18T09:17:56

There are several ways.

Entity – attribute – value model.
EAV allows you to add any necessary attributes at any time without changing the database schema. But working with such a model is quite difficult.
Create a separate column for each property. At the same time, everything is stored in one table.
This approach is often used to store different types of data. It is quite viable if there is not very much data (hundreds of thousands of lines are few).
In this case, in columns that do not belong to the entity being described in this row, null values will be stored. At first glance, this may seem inefficient, but many DBMSs support so-called sparse columns, as a result, everything becomes efficient.
I am not an expert on mysql, but the search shows that there is support for such columns in this database.
Use json or xml to store additional data.
I think there is nothing special to explain here. MySql supports both ways: json , xml .

In the labels you have two DBMS: mysql and postgres, from which it can be concluded that the choice of database has not yet been made. Then why not consider other options, namely: NoSQL?

1) I understand in the case of EAV - everything is rather cumbersome and difficult, and also unproductive.
2) In the case of storage in a single table, it is quite simple to organize a search filter by attributes, but how much will it cost because of the presence of Null columns?
3) This method removes a bunch of unnecessary columns, but how to organize a search filter ads for such a structure?
(as far as I know this is possible only in Postgresql in jsonb format columns, but I have never worked with this and therefore it is difficult for me to decide on a solution).
Regarding NoSQL, it should also be used for such individual cases, the whole project can not be hung on NoSQL?
Theoretically, you can do everything on nosql, but in practice it will probably be better to use both this and that.

Ivan Dudarev Ivan Dudarev 101 3 · Answer 2 · 2019-01-18T09:11:30

1) True to a little less than completely, otherwise you would not have problems with joins.

2.2) It will work, but it will be difficult to maintain, much less develop.

You can organize it like this:

categories

 id | name

properties

 id | category_id | name

adverts

 id | name | description | category_id

property_values

 property_id | advert_id | value

Ivan Dudarev

101 3

I understand this is the same EAV? - krown_loki
Yes, he is the most ... - Ivan Dudarev
How with such a storage architecture to make a request to display information on a single entity? I mean, because of the large number of attributes, the result of the requests will not fit into one line, how can we disassemble it so that it can be displayed as a "product card" in html? You mentioned the problem with JOINs, however, with EAV in your example, the table of categories, attributes, and attribute values joins the same way. - krown_loki
Why do you fundamentally display information on a single entity in one query? You can already collect an entity in code, just like that. No, you can make 10 joins of course, but they are better not to do at all. - Alexander Goncharov
@GoncharovAleksandr in principle, because It is possible to view a list of ads with minimal information (name, short description + photo), and extended information on one entity (ad), which includes all the information (name, short description + attributes of a particular ad) - krown_loki

|

Gleb Starukhin Gleb Starukhin one · Answer 3 · 2019-01-18T20:30:42

Your decision is acceptable and, I think, will allow you to achieve good performance. The only feature that you have to use dynamic SQL to create queries with one JOIN. The order of requests is:

select type from adverts where id =: id
the result of the query is stored in a variable, say $ additional_table
compile dynamic sql query "select a. *, b. * from adverts a join $ additional_table b on a.id = b.advert_id where a.id =: id"

The latter will display ad headings and additional information in one line. Time for parsing and execution should also be minimal for obvious reasons. In due course all variants of dynamic request (variants of values $ additional_table) will appear in a cache and it is not necessary to sort anything at all, only to fulfill.

When selecting all the ads I see no reason to display the details. There will be 40+ columns to the right of the main data, most of which are empty. If you still want to, then you need to select all types of requested ads, and attach the corresponding tables with the LEFT JOIN method, i.e. repeat steps 1-3, but in bulk.

PS When using EAV, the product card can be obtained not as a set of columns, but as a set of rows, which, in my opinion, is more convenient when generating HTML.

PPS If the type field is identical to the name of the table (which is implied above), then the second form can be generated when creating an ad by polling the data dictionary.

The storage architecture of related data in the database

3 answers 3

More articles: