I'm at a crossroads and can't decide which way to go; please advise.

There is a database with a products table and several of its properties stored in separate tables, specifically:

good (~2k rows), catalog (~50 rows), form (4 rows), type (~100 rows), color (10 rows), sostav (5 rows), sizes (~200 rows), width (~50 rows)

[DB schema diagram]

Accordingly, each of these 11 tables (counting the many-to-many link tables) has its own set of columns; about 150 columns in total across all tables.

The question is which approach is best:

  1. Six separate queries, each with only the necessary JOINs and columns (one for the product listing and five for the filter lists), plus extra JOINs and conditions when filters and/or sorting are applied
  2. A stored procedure around one query with every JOIN and every column, with all of that mess then processed in PHP (far from optimal, in my opinion, but you never know)
  3. One query into a temporary table and further selects from it (still slow, since the full query itself takes 2 to 20 seconds), and, if I'm not mistaken, a temporary table is created separately for each open connection, so 20 simultaneous users means 20 temporary tables of ~300k rows each
  4. Create a table holding the result of the full JOIN of everything, with a composite PK over the PKs of all source tables and indexes on the required columns, and refresh it periodically, say once an hour / every 5 hours / once a day (see the sketch after this list)
  5. Option 4, but refreshing the data via triggers (I don't know yet whether that is even possible in a trigger). Product data is not updated often, but the triggers would fire many times per change: adding one product linked to 5 sizes means at least 6 trigger calls, plus types, plus colors... (a manager can change/add 10-50 products per day)
  6. Some other, saner option?
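
For context, the periodic refresh from option 4 can live entirely inside MySQL via the event scheduler. A minimal sketch, assuming a cache table and a rebuild procedure like the one sketched in the answer below (all names here are hypothetical):

```sql
-- Hourly cache rebuild via the MySQL event scheduler.
-- Requires: SET GLOBAL event_scheduler = ON;
-- rebuild_goods_cache() is a hypothetical stored procedure that
-- re-fills the cache table from the full JOIN (sketched further down).
CREATE EVENT refresh_goods_cache_hourly
ON SCHEDULE EVERY 1 HOUR
DO CALL rebuild_goods_cache();
```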

In any case, here is the minimum set of JOINs that must be present in every query:

 goods - the products themselves
 catalogs - product categories (to check whether a category is enabled/disabled)
 size_type_1 - sizes (to check stock levels)

Also, every query will compute an "in stock / out of stock" flag, plus a HAVING clause on that computed value for the filters (only currently available data should be shown).
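
To make that concrete, a product-listing query with just the minimum JOIN set might look roughly like this; all table and column names are guesses based on the list above, not the real schema:

```sql
-- Hypothetical listing query with the minimal JOIN set:
-- the product, its category (enabled check) and its sizes (stock check).
SELECT g.id,
       g.name,
       SUM(s.stock) > 0 AS in_stock   -- the "in stock / out of stock" flag
FROM good g
JOIN catalog c     ON c.id = g.catalog_id AND c.enabled = 1
JOIN size_type_1 s ON s.good_id = g.id
GROUP BY g.id, g.name
HAVING in_stock = 1;   -- filters must show only actually available goods
```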

That's the problem I'm facing. I've described it in as much detail as I could. I really hope for your help :)

  • What's stopping you from running them one by one and comparing the execution times? - Jean-Claude
  • Make a stored procedure, run it from cron every 5 minutes, and that's it. There should be no problems - Vfvtnjd
  • Do you want the filters to be "live"? That is, if a user has already picked one filter, e.g. a size, should the color filter show only the colors available in those sizes? If not, then the filters themselves are just a SELECT from the reference tables, and they can even be cached separately and served ready-made. If they are "live", it is of course harder: you have to select based on the chosen items. But even that can be done without joining everything, if the names are already cached and the main query returns only the ids, which are in goods - Mike
  • @Jean-Claude, the problem is that the data may change. Everything can be fine on the current data, and then problems appear later. - RAMe0
  • @Vfvtnjd so you are in favor of option 4, if I understood correctly? - RAMe0

1 answer

Thanks to all.

I settled on an intermediate variant: a cache table containing the IDs from all the related tables, obtained through a full JOIN of all the tables.

On this table I created redundant keys and a composite index over the most frequently used fields. The index turned out to be "redundant": for a table of ~10 MB, the index takes ~36 MB. But such an index is needed: I compared InnoDB with the index against MEMORY without one, and even entirely in memory every query took more than 3 seconds, versus <0.5 seconds with the indexed table.
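
In code, that cache table might look roughly like this; a sketch only, since the real column set follows the schema diagram and every name here is assumed:

```sql
-- Hypothetical DDL for the ID-only cache table: a composite PK over
-- the IDs of the joined tables, plus a composite index over the most
-- frequently filtered fields (form/sostav/width IDs omitted for brevity).
CREATE TABLE goods_cache (
    good_id    INT NOT NULL,
    catalog_id INT NOT NULL,
    type_id    INT NOT NULL,
    color_id   INT NOT NULL,
    size_id    INT NOT NULL,
    in_stock   TINYINT NOT NULL DEFAULT 0,
    PRIMARY KEY (good_id, catalog_id, type_id, color_id, size_id),
    -- The "redundant" composite index: ~36 MB next to a ~10 MB table,
    -- but it is what cuts queries from >3 s to well under 0.5 s.
    KEY idx_filters (catalog_id, type_id, color_id, size_id, in_stock)
) ENGINE=InnoDB;
```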

As a result, the average execution time for queries of almost any complexity, with only the necessary JOINs, came out to ~0.003 seconds (a full selection with every JOIN and no conditions still takes ~0.7 seconds, but that case will never be used), versus the 2-20 seconds of the options from the opening post.
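
A typical filtered selection against that cache might look like the following; hypothetical names again, with the reference tables joined back only for display names, as suggested in the comments:

```sql
-- Hypothetical filter query against the cache: the heavy JOIN work is
-- already materialized, so only cheap lookups by ID remain.
SELECT DISTINCT gc.good_id, g.name, col.name AS color
FROM goods_cache gc
JOIN good  g   ON g.id   = gc.good_id
JOIN color col ON col.id = gc.color_id
WHERE gc.catalog_id = 3          -- selected category
  AND gc.size_id IN (41, 42)     -- selected sizes
  AND gc.in_stock = 1            -- only available goods
ORDER BY g.name
LIMIT 30;
```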

The only downside is that the table has to be refreshed by hand or on a schedule. I settled on the by-hand option: the cache table is rebuilt only after the product stock levels are updated, because even if a manager adds a product, users will not find it until its stock is updated anyway.
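
That by-hand rebuild can be wrapped in a stored procedure that PHP calls right after the stock update finishes; a sketch under the same assumed names:

```sql
-- Hypothetical rebuild procedure, to be called from PHP
-- (e.g. $pdo->exec('CALL rebuild_goods_cache()')) after stock updates.
DELIMITER $$
CREATE PROCEDURE rebuild_goods_cache()
BEGIN
    TRUNCATE goods_cache;
    INSERT INTO goods_cache (good_id, catalog_id, type_id, color_id, size_id, in_stock)
    SELECT g.id, g.catalog_id, gt.type_id, gc.color_id, s.id, s.stock > 0
    FROM good g
    JOIN good_type   gt ON gt.good_id = g.id
    JOIN good_color  gc ON gc.good_id = g.id
    JOIN size_type_1 s  ON s.good_id  = g.id;
END$$
DELIMITER ;
```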

Here is the final model, for anyone interested:

[diagram: the final data model]