There is a query:

    SELECT articles.id, articles.name, articles.date_add,
           GROUP_CONCAT(DISTINCT category.name
                        ORDER BY category.type DESC, category.name
                        SEPARATOR ", ") AS catname
    FROM articles
    INNER JOIN articles_categories ON articles.id = articles_categories.id_articles
    LEFT JOIN category ON category.id = articles_categories.id_categories
    GROUP BY articles.id
    ORDER BY articles.date_add

Its query plan: [screenshot of the EXPLAIN output, not reproduced here]

What indexes should be created? I can't figure it out. The sort slows the query down considerably, yet the query is useless without it.

  • Add a cat_id field to your articles table; by the look of it you have tags rather than categories - Serge Esmanovich
  • What for? An article may fall into several categories. The articles_categories table was created precisely so as not to store ids as a comma-separated list. I think it works out much like tags - ipbortnikov
  • Categories carry a different meaning than tags, and the links are built differently; besides, a site can have both categories and tags at the same time - Serge Esmanovich
  • Still, the task stays the same: one article, many categories. And the problem is the sort (ORDER BY articles.date_add); remove it and the execution time drops severalfold. How can I speed it up with an index? - ipbortnikov
  • @ipbortnikov Try a composite index on articles (id, date_add) (sketched just below these comments). Also, it is unclear why articles_categories (1) is joined rigidly while category (2) is joined with LEFT. Can you have a row in (1) with no matching row in (2)? - Mike
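
For reference, a minimal sketch of the composite index suggested in the last comment (assuming MySQL and the table names from the question; the index name is made up):

    -- Composite index as suggested by Mike; the name is hypothetical
    ALTER TABLE articles ADD INDEX idx_articles_id_date (id, date_add);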

1 answer

An index on the date_add field will not help here, because the sort happens after the grouping: the grouped result is a new in-memory table, and that table has no index on the date.
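
A quick way to confirm this in MySQL (a generic check, not from the original post) is to prepend EXPLAIN to the query:

    EXPLAIN
    SELECT articles.id, articles.name, articles.date_add,
           GROUP_CONCAT(DISTINCT category.name
                        ORDER BY category.type DESC, category.name
                        SEPARATOR ", ") AS catname
    FROM articles
    INNER JOIN articles_categories ON articles.id = articles_categories.id_articles
    LEFT JOIN category ON category.id = articles_categories.id_categories
    GROUP BY articles.id
    ORDER BY articles.date_add;
    -- "Using temporary; Using filesort" in the Extra column means the ORDER BY
    -- runs against a temporary table that no index on articles can serve.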

In your case you need to look for other ways to improve performance. For example:

  1. Get rid of DISTINCT in GROUP_CONCAT. In your case it is redundant, and it can noticeably hurt query performance (see the first sketch after this list).
  2. Do not select all the data from the tables at once. If they hold several million rows, you hardly need all of them in one go. Add a condition to page the data and reduce the amount fetched.
  3. Remove the sort from the query. Sorting in an external script can be much faster.
  4. Remove the grouping from the query. You can group the data in an external script as well. Without the grouping, indexes can be used and the query runs very quickly.
  5. Remove the categories from the query, and the grouping along with them. The categories can be attached in an external script. This will significantly improve query performance.
  6. If nothing helps or nothing fits, you can build a caching table and read the data from it instead (see the second sketch after this list).
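
A sketch of points 1 and 2 combined, under the assumption that each article is linked to a category at most once (so DISTINCT is safe to drop); the page size is an arbitrary example:

    SELECT articles.id, articles.name, articles.date_add,
           GROUP_CONCAT(category.name
                        ORDER BY category.type DESC, category.name
                        SEPARATOR ", ") AS catname
    FROM articles
    INNER JOIN articles_categories ON articles.id = articles_categories.id_articles
    LEFT JOIN category ON category.id = articles_categories.id_categories
    GROUP BY articles.id
    ORDER BY articles.date_add
    LIMIT 0, 20;  -- fetch one page of 20 rows instead of everything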
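
And a minimal sketch of the caching table from point 6; the table name, column types, and refresh strategy are assumptions, not part of the original answer:

    -- Precompute the grouped result once; refresh it on a schedule or whenever
    -- articles or their category links change.
    CREATE TABLE articles_catnames (
        id       INT PRIMARY KEY,
        name     VARCHAR(255),
        date_add DATETIME,
        catname  TEXT,
        INDEX idx_date_add (date_add)  -- lets ORDER BY date_add use an index
    );

    INSERT INTO articles_catnames
    SELECT articles.id, articles.name, articles.date_add,
           GROUP_CONCAT(category.name
                        ORDER BY category.type DESC, category.name
                        SEPARATOR ", ")
    FROM articles
    INNER JOIN articles_categories ON articles.id = articles_categories.id_articles
    LEFT JOIN category ON category.id = articles_categories.id_categories
    GROUP BY articles.id;

    -- Reads become a plain index-ordered scan:
    SELECT id, name, date_add, catname
    FROM articles_catnames
    ORDER BY date_add;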

The optimization problem has to be tackled as a whole: simply adding an index may not help.

  • Point 1: the DISTINCT inside GROUP_CONCAT is not a DISTINCT over the whole result set; it is fast. Points 3 and 4: the external script is usually some kind of PHP, which will be slower at doing the same work, and if MySQL is on a separate server (as it is on some hosting), it will also push bulk data over the network - Mike
  • Re 3 and 4: at one point, as an experiment, the data were unloaded from the database into memory and processed there. That turned out much faster than querying the database every time, a sort of cache. There was not that much data, yet MySQL was already struggling with it. The author of the post gave no details about the environment, so points 3 and 4 are worth trying. - slava
  • Point 4: even PHP copes without problems with merging several consecutive rows into one. - slava
  • They will not come out consecutive unless the query sorts them - Mike
  • That is the point: once you drop the category grouping, the sort runs very quickly off the index. Sort by date and id, and the rows to be merged arrive consecutively (see the sketch below). Everything works very quickly. - slava
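
A sketch of the ungrouped query being described, assuming an index on articles (date_add, id) exists; the application-side merge is only outlined in the comments:

    SELECT articles.id, articles.name, articles.date_add,
           category.name AS catname
    FROM articles
    INNER JOIN articles_categories ON articles.id = articles_categories.id_articles
    LEFT JOIN category ON category.id = articles_categories.id_categories
    ORDER BY articles.date_add, articles.id;
    -- No GROUP BY: one row per (article, category) pair, but rows for the same
    -- article are consecutive, so the external script can concatenate catname
    -- values while streaming the result, instead of MySQL building and sorting
    -- a temporary table.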