There are two tables: customers (clients) and customer orders (orders). I want a query that returns the clients who placed an order this year, together with the date of each client's last order.

SELECT clients.*, orders.date FROM clients LEFT JOIN orders USING(client_id) WHERE orders.date >= '2015-01-01' GROUP BY client_id ORDER BY orders.date DESC; 

The tables have indexes. That would be fine, but with 100K clients and 1M orders the query is far too slow. The best I have come up with is storing the last order date in the client row, but can the query itself be optimized?

  • SELECT * FROM clients WHERE client_id IN (SELECT client_id FROM orders WHERE date >= '2014-01-01' GROUP BY client_id); runs much faster, but does not show the last order date. - ArchDemon
  • Have you tried adding indexes? I would try adding one on orders.date, client_id. - KoVadim
  • There are already indexes on client_id and date. - ArchDemon
  • Author, don't listen to them. Here is something for future reference: citforum.ru/database/digest/dig_2412.shtml Take it from someone who works with the database of one of the largest companies in the Russian Federation. - IvanZakirov

5 answers

 SELECT o.m_d, c.* FROM (SELECT client_id, MAX(date) AS m_d FROM orders WHERE date >= '2014-01-01' GROUP BY client_id) AS o INNER JOIN clients AS c USING(client_id); 

This runs very quickly and returns exactly what is needed. Thank you, @IvanZakirov.

    A JOIN increases query processing time when there are many rows (though it depends on the conditions).

     SELECT c.*, MAX(o.date) AS last_date
     -- the maximum date is also the latest one (ideally take the time into account too, in case several clients placed orders on the same day)
     FROM clients c, orders o
     WHERE c.id = o.client_id
     GROUP BY o.client_id
     ORDER BY last_date DESC

    Or another option, by the maximum order ID, which will also be the last one:

     SELECT c.name, MAX(o.id) AS last_order_id
     -- the maximum order ID also belongs to the last order
     FROM clients c, orders o
     WHERE c.id = o.client_id
     GROUP BY c.name
     ORDER BY last_order_id DESC
    • Yes, in fact my query and yours take about the same time, 9 to 11 seconds. That is a lot. - ArchDemon
    • Try to process the orders table first and pull out the maximum there: SELECT c.name FROM clients c, (SELECT MAX(id) AS last_id, client_id FROM orders GROUP BY client_id) o WHERE c.id = o.client_id - IvanZakirov
    • The first option selects extra fields; see ONLY_FULL_GROUP_BY. The second option groups by the client name, but does it say anywhere that this field uniquely identifies the client? - artoodetoo
    • You are picking a fight in the wrong place and by the wrong measure. I'm not interested. I gave the author information and so did you. He will use whichever he thinks is right. And all your advice is some kind of nonsense (in the past, nonsense :)), especially the expression ".... the field uniquely identifies the client ...". A uniquely identifying field is the field used by the primary key, a unique non-repeating value. Do you even read what you write? What does grouping have to do with a uniquely identifying field? The best option is written in the comments. And what I wrote was only a recommendation, not based on the first post of the question. Knowing the database ... - IvanZakirov
    • ... knowing the database, the query could well be different and built differently. I repeat once again: I did not write a ready-made solution, but a recommendation on how to approach getting the information out of the database. And I am not going to argue with you anymore. Here there are two tables, with a primary key in one and a link to it from the other via a secondary key. No composite indexes, DISTINCTs, etc. are needed for the simplest query. And the guy got his bearings right and wrote that it started working quickly for him while still using a JOIN, even though I said it was not needed. But what I wrote was a recommendation on the approach. - IvanZakirov

    Add a reference to the last order in the client row and see the difference. It really lightens the load on the server and reduces the response time.
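
    A minimal sketch of this idea, storing the last order date rather than an order ID. The column name last_order_date and its DATE type are assumptions, not something stated in the question, and the column still has to be kept current on every new order (in application code or a trigger), which is not shown here:

```sql
-- Assumption: hypothetical column last_order_date of type DATE.
ALTER TABLE clients ADD COLUMN last_order_date DATE NULL;

-- One-time backfill from the existing orders.
UPDATE clients AS c
JOIN (
    SELECT client_id, MAX(`date`) AS max_date
    FROM orders
    GROUP BY client_id
) AS o ON o.client_id = c.client_id
SET c.last_order_date = o.max_date;

-- The original question then needs no join at all.
SELECT *
FROM clients
WHERE last_order_date >= '2015-01-01'
ORDER BY last_order_date DESC;
```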

    Addition:

    Solution (if the server you have cannot currently handle the queries): move the old data that this important query does not need into an archive table, and always work with the operational table (current year). For all other data consumers (who need everything), create a view that combines the old and the operational data. In other words, provide a common interface that hides the physical database structure without affecting client functionality: such a client talks to the view, while whoever wants the current data from the small table reads it directly.
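
    A rough sketch of that layout, assuming the cut-off is the start of 2015. The names orders_archive and orders_all are hypothetical, and on a live system the copy and delete steps would need to run in a transaction or in batches:

```sql
-- Hypothetical archive table with the same structure and indexes as orders.
CREATE TABLE orders_archive LIKE orders;

-- Move everything older than the current year out of the operational table.
INSERT INTO orders_archive
SELECT * FROM orders WHERE `date` < '2015-01-01';

DELETE FROM orders WHERE `date` < '2015-01-01';

-- Hypothetical view for consumers that need the full history.
CREATE VIEW orders_all AS
SELECT * FROM orders
UNION ALL
SELECT * FROM orders_archive;
```

    The "last order this year" query keeps reading the small orders table, while reports over all years read orders_all.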

    • That is how it is done now. But I don't like adding extra fields when the information can be derived anyway. - ArchDemon 4:27 pm
    • @ArchDemon Well then, there is no point in storing orders for all the years in one table. ) - garik
    • @garik don't be silly. Databases exist to store information, and nobody splits data across tables just because there are many records. You only need the right approach. Creating extra fields for information that can be pulled out anyway is unjustified denormalization. A normal server pulls the desired row out of a table with millions of records in about 0.005 seconds. - IvanZakirov
    • @IvanZakirov nobody splits, are you sure? We are talking about large tables here, not the pretty showcase database structures you probably read about in books. I am immediately reminded of the joke about two fleas, where one asks: I wonder if there is life on other dogs. ) With your answer you will pass the interview. - garik
    • @garik with such a confident refutation of my words, yours will not be the best database. I work at one of the largest firms in the Russian Federation, where I do programming and information processing on an Oracle database. And I know what information is and how to store and process it, and not only from books. I also have work experience at two entities, including an ERP system built entirely on the database. You can keep refuting as much as you like. Advise people to split tables into several when there are a lot of records ... and you will wear yourself out putting them back together when it is time to process taxes or GP. - IvanZakirov

    Regardless of speed, you can't write like this:

     SELECT * … GROUP BY client_id 

    Under the SQL rules for grouping, the SELECT clause may contain only fields from the GROUP BY clause plus aggregate functions over other fields. MySQL with some settings permits liberties, but this (a) can break when the query is moved to another server and (b) gives an indeterminate result.
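
    To illustrate the rule, here is a small sketch against the orders table from the question. Note that the SET statement replaces all other SQL modes for the current session, which is acceptable for an experiment but is an assumption about how you want to test:

```sql
-- Enable strict grouping for this session only (replaces other session modes).
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Valid: client_id is in GROUP BY, `date` appears only inside an aggregate.
SELECT client_id, MAX(`date`) AS max_date
FROM orders
GROUP BY client_id;

-- Expected to be rejected under ONLY_FULL_GROUP_BY: `date` is neither
-- grouped nor aggregated, so the server cannot know which value to return.
SELECT client_id, `date`
FROM orders
GROUP BY client_id;
```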

    I think this task can be solved with a subquery, with good performance:

     SELECT c.* FROM `clients` AS c INNER JOIN ( SELECT DISTINCT `client_id` FROM `orders` WHERE `date` >= '2015-01-01' ) AS co USING(`client_id`) 

    (Here, instead of SELECT DISTINCT ... in the subquery, you can use ... GROUP BY client_id, which gives the same result.)

    https://dev.mysql.com/doc/refman/5.0/en/sql-mode.html#sqlmode_only_full_group_by

    Edited: I re-read the question carefully and realized that the date needs to be added to the subquery. Once an aggregate function appears, there is no getting away from GROUP BY :)

     SELECT co.`max_date`, c.* FROM `clients` AS c INNER JOIN ( SELECT `client_id`, MAX(`date`) AS `max_date` FROM `orders` WHERE `date` >= '2015-01-01' GROUP BY `client_id` ) AS co USING(`client_id`) 
    • Strangely enough, in another query where I changed GROUP BY to DISTINCT, it became 2 to 3 times slower. I don't know why, but I decided not to risk it. - ArchDemon
    • It's your call. Checking this case rather than some other one is easy, isn't it? Now you have several options: try them, choose, analyze with EXPLAIN. - artoodetoo
    • Of course the speed will drop. With GROUP BY, the server uses the indexed fields to select only the needed records out of the millions and returns only those. If you replace GROUP BY with DISTINCT, the server will select all the millions of records and then go over the selected ones, comparing all columns to find duplicates .. do you see how many passes over the data that is? - IvanZakirov
    • Not everything is what it seems! I'll tell you more: as a rule, GROUP BY leads to a full scan. Just try it, measure it, look at the query plan. Theory is nothing without practice. - artoodetoo
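
    The comparison discussed in these comments can be checked directly rather than argued about. A minimal sketch, using the two subquery variants from this answer:

```sql
-- Plan for the DISTINCT variant of the subquery.
EXPLAIN
SELECT DISTINCT client_id
FROM orders
WHERE `date` >= '2015-01-01';

-- Plan for the GROUP BY variant of the same subquery.
EXPLAIN
SELECT client_id
FROM orders
WHERE `date` >= '2015-01-01'
GROUP BY client_id;
```

    In the output, compare the key, rows and Extra columns; values such as "Using temporary" or "Using index for group-by" in Extra show how each variant is actually executed on your data.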

    And which table does the client_id field in GROUP BY belong to? If it is clients.client_id, then it is better to use the client field from the orders table. Accordingly, the orders table should have an index on orders.client_id, orders.date (most likely a composite one is best).

    It is important that all selection operations take place on the same table.
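
    A sketch of such a composite index; the index name idx_orders_client_date is hypothetical. Because the index contains both columns, a query that reads only client_id and date from orders can in principle be answered from the index alone, though whether the optimizer chooses it should be confirmed with EXPLAIN:

```sql
-- Hypothetical index name; the column order matches the answer's suggestion.
CREATE INDEX idx_orders_client_date ON orders (client_id, `date`);

-- A query that touches only the two indexed columns of orders.
SELECT client_id, MAX(`date`) AS max_date
FROM orders
WHERE `date` >= '2015-01-01'
GROUP BY client_id;
```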

     SELECT clients.*, orders.date FROM clients LEFT JOIN orders USING(index_in_table_orders) WHERE orders.date >= '2015-01-01' GROUP BY orders.client_id ORDER BY orders.date DESC; 
    • client_id is in both tables. Otherwise I could not join them with USING(). The problem is not that I take the field from the wrong table (MySQL works that out for itself), but that MySQL first joins the tables and only then applies the filter. And that takes a long time. - ArchDemon
    • If this field were in both tables, there would be an error "column client_id is ambiguous". Specify it explicitly in the query: GROUP BY orders.client_id. This is needed precisely so that MySQL performs all the selection and grouping on the orders table, and only then does the JOIN. - Rjazhenka
    • There is no warning. Specifying the table explicitly does not improve the speed. - ArchDemon
    • @user1778019, you too, don't talk nonsense. The server does not care which table the data in the SELECT list comes from, as long as the tables are joined correctly. It will prepare the data from both tables either way and output only the columns listed in SELECT. Going by the words "the indexing is all fine", it is enough to apply the selection criteria to the tables up front, in subqueries, and the rows will be pulled out of millions in seconds or even less. Provided the hardware allows it, of course ;) - IvanZakirov
    • @user1778019 oh yes, the part about the composite index in your answer is also complete nonsense. An index on a field that is already indexed, and combined with the date field as well .... God help everyone who is waiting for results from such queries and those who have to maintain them later. - IvanZakirov