Relational Database Design

Question

For example, there is a table Car . Fields - id , type , name . Type can be many different. And for each Car by type may be additional data. And may not be.

Suppose there is a type big . For it you need to specify another year . And for type light specify shop and address . It turns out that Car must be in the same table, and all have id , type and name . But in certain situations, they should also have additional. data.

Option 1 - add all required columns to the Car table. It turns out, if we consider our example - id , type , name , year , shop , address . But for type big the shop and address fields are always empty. And for type light year will always be empty. And if for each type there will be ~ 10 - 100 add. fields, records with ~ 100 fields will be obtained, ~ 90 of which are empty.

Option 2 - the main info is stored in Car . For more create CarBig , and CarLight . But in this case you cannot select all the necessary data at once with one request, since Initially, you do not know which table to merge with.

Option 3 - design as in option 2. For requests to do a left join for all additional tables. But if we assume there is a type middle , which doesn’t need a separate table at all, then in such a case we will have to create a table as well. CarMiddle . And it will always be empty. And if we have type middle1...100 , then for all we will have to create tables. Well, to unite with all, respectively, too.

Those. the idea is that there is general data and there are additional data that exist only for certain records. And the question is how best to organize it all. Are there any other options? And if not, which one is the most acceptable?

Option 2. You also have a type field, it can be seen initially from which table to merge.
Option 3. No need to create an additional table where it is not needed.
@Sergey, yes, in the type field you can see which table will be merged, but in this case you need an additional query.
(as an example, select by id 1. And in this case you need to merge with the table in the same query. It seems like it is impossible ..)

Answer 1 · 2016-04-16T20:34:44

Collided with the same architecture. The essence was as follows: there is a market of scales, there are different scales (automobile, railway, medical). Depending on the subcategory of the scales, there are different properties (for example, for the medical, the “cup type” field was added). The decision was the following: all properties (division price, weight limit, bowl type, calibration type, etc.) were brought into a separate directory type entity. Further, from the subcategory (automobile, medical) there is a one-to-many relationship to the essence of the "property of the subcategory" (there are 2 keys in it - the subcategory ID and the property ID). In turn, the properties of the subcategory are connected one-to-many with the essence of "product data", which still contains the goods from the product and the field "value" (the value of this property itself, which we take from the directory of all properties)

Answer 2 · 2016-04-17T03:50:00

Maybe you should look in the direction of the vertical storage of attributes. For each data type, create a property sheet. I will show an example for strings.

Simplified option:

 StringProps(CarId, Type, Value)

Then the query will look something like this:

 SELECT C.Id, C.Type, MAX(CASE WHEN S.Type='shop' THEN S.Value ENS)shop, MAX(CASE WHEN S.Type='address' THEN S.Value ENS)address, MAX(CASE WHEN S.Type='year' THEN S.Value ENS)year FROM Car C LEFT JOIN StringProps S ON S.CarId = C.Id GROUP BY C.Id, C.Type

Or if you need a specific property, then:

 SELECT C.Id, C.Type, S.Value shop FROM Car C LEFT JOIN StringProps S ON S.CarId = C.Id AND S.Type = 'shop'

There is a slightly more complicated option, but more reliable. Create two tables, properties and property types.

 StringProps(CarId, TypeId, Value) StringDesc(Id, CarType, Name)

Then the query will look like this:

 SELECT C.Id, C.Type, MAX(CASE WHEN SD.Name='shop' THEN S.Value ENS)shop, MAX(CASE WHEN SD.Name='address' THEN S.Value ENS)address, MAX(CASE WHEN SD.Name='year' THEN S.Value ENS)year FROM Car C LEFT JOIN StringProps S ON S.CarId = C.Id LEFT JOIN StringPDesc SD ON SD.CarType = C.Type AND SD.Id = S.TypeId GROUP BY C.Id, C.Type SELECT C.Id, C.Type, S.Value shop FROM Car C LEFT JOIN StringProps S ON S.CarId = C.Id LEFT JOIN StringPDesc SD ON SD.CarType = C.Type AND SD.Id = S.TypeId AND SD.Name='shop'

The meaning of the second approach is that we will not be able to stuff a rubbish in the properties. And we can only those properties that are listed in StringDesc. Of course, if you create the necessary foreign keys.

You can read in more detail A. A. Tenzer DB - object storage .

EAV en.wikipedia.org/wiki/… If this is not the same, but very similar.
Although the implementation of the complexity or how much does not cancel.
If you write StringProps (CarId, Type, Value), then there will be StringProps (CarId, AttributeId, Value) and you can immediately see what the point is.

AlexDmitriev AlexDmitriev 171 four · Answer 3 · 2016-04-19T18:01:49

If you do not need to make a selection by additional fields, and the data itself is then processed by an application, you can collect this additional data in, for example, a JSON object and save it in the meta field of the Car table.

 id, type, name, meta 1, CarBig, CarName, { attr1: value1, attr2: value2}

The meta field can be either TEXT type or JSON type for MySQL 5.7.8 or higher. In the second case, you can sample the meta field.

Answer 4 · 2016-04-18T12:35:31

In a single query, you can combine as many tables as you want. Somewhere already gave such a hint.

  select c.*, b.*, l.* --c.id, c.name, c.type, b.year, l.shop, l.address from Car c left join CarBig b on b.car_id = c.id left join CarLight l on l.car_id = c.id where c.id = 1

getting the result of such a query, look at c.type :

if it is big , then we use the b.* : b.year fields b.* l.* not needed (and, in a good way, there are all the fields == null)

if c.type is ligh then we use l.* : l.shop , l.address part b.* ignored (and also all the fields from this part == null).

All that is possible is collected, and then, based on the type obtained, only the necessary one is used.

It works well with not a very large number of additional. tables and do not plan to add new tables for new characteristics.
If the list of characteristics is large and will be replenished, then pay attention to other answers.

Relational Database Design

4 answers 4

More articles: