There is a table with the goods (item) and a table with the colors of the goods (item_color). Links "one product to multiple colors." Tables are large, item is 1.5 million lines, item_color is 2.5 million. Type is InnoDB.

On the page of the online store you need to display filters by color, size of goods. Total filters - 6 pieces. To do this, I use similar queries:

SELECT DISTINCT item_color.id_color FROM item_color JOIN item ON item.id = item_color.id_item AND item.enabled=1 AND item.id_category BETWEEN 10100000 AND 10199999 

Server: 2 cores, 4GB of RAM. The request takes 6-8 seconds.

Item Table Indices:

 PRIMARY KEY (`id`), KEY `c` (`id_category`), KEY `e` (`enabled`), KEY `ce` (`id_category`,`enabled`), KEY `iec` (`id`,`enabled`,`id_category`), KEY `cei` (`id_category`,`enabled`,`id`) 

Item_color table indexes:

 PRIMARY KEY (`id`), KEY `ic` (`id_item`,`id_color`), KEY `ci` (`id_color`,`id_item`), KEY `i` (`id_item`) 

Explain:

enter image description here

Tell me, please, how to optimize the request? Maybe I chose the wrong approach to building filters?

  • First, delete the index e . Indexes with such selectivity only slow down the work. other indices are generally the same would not hurt to revise. for example, why do you need both ic and ci in the second table. obviously enough ci . And by the way, what is your id field in that table? In link tables, the own id field is not needed in 95% of cases. Here ci in this table is suitable for the role of the primary key - Mike
  • And yes, the request is somehow artificial, "give me the unique colors of the goods in categories in the range such that". Simply distinct strongly spoils request. And the applicability of the query results to zero - Mike
  • Specifically for this query, the ideal would probably be ce with fields in reverse. But this is for this particular, very artificial request. - Mike

3 answers 3

As I understand it, you need to get a list of colors in the category of published products to display the list of colors in the filter. I think with such a quantity of goods it really takes a lot of time, regardless of the indexes.

Try to split the query into several:

  1. Get the id of the published items in the category ( select id from item where ... ).
  2. By product id, get unique colors ( select distinct id_color from iitem_color where id_item in (...) ).

Try to figure out the speed of these requests first. If it is faster, then you can already think about optimizing the indexes under the join, or leave the retrieval of data with several requests.

If it fails faster, then you can cache the result for the category, for example, for a day. Then only once a day the visitor will wait 8 seconds (for one category). Or you can update the cache on cron, so that no one waited.

Or, in extreme cases, simply in the filter output a list of all available colors, regardless of whether there are such goods or not.

  • Thanks for the advice, divided the request into two. The first is performed in 0.1-0.2 seconds depending on the number of products in the category. The second query with IN (ids) is executed in 0.5 seconds (with 40k products in the category) and 4 seconds (with 250k products in the category). This is in principle tolerable, but there are 5 such filters. The total execution time will be around 20-25 seconds. Now I will try to experiment with indices. If it does not work out, then as a solution: for "heavy" categories I will simply display all colors, all sizes, etc. in the filter. - librown
  • Try all the same for heavy categories to implement caching. It will be a good compromise. - cronfy
  • @librown if the answer helped you, please accept it. - cronfy Nov.

I didn’t work with MySql, so I’ll give general tips:

To begin to evaluate the selectivity of the index. I see a lot of indexes.

It is possible that indexes are not needed for some columns and it is easier to scan the entire table.

Try to make

 item.enabled=1 AND item.id_category BETWEEN 10100000 AND 10199999 

in WHERE and peel in the order of the greatest probability of falling into a condition (What would once again not to see other conditions)

Further, what I see is a weak PC. It is likely that there is a bottleneck in the PC configuration, since I do not see anything criminal in the request itself.

    The plan shows that the index e is selected for the first table. And on it all 106113 active records get over. Specify the query using FORCE INDEX (USE INDEX) to use another key (I would put the "from" - by category)

    Composite keys with an enabled field, as I think, make sense only with a large proportion of disabled entries. But in this case, it is better to remove them from the operational tables / DB and transfer them to the archive.