I can't speed up a query

Question

There are two tables: customers and their orders (clients and orders in the queries below). I want a query that returns the customers who placed an order this year, together with the date of their last order.

SELECT clients.*, orders.date FROM clients LEFT JOIN orders USING(client_id) WHERE orders.date >= '2015-01-01' GROUP BY client_id ORDER BY orders.date DESC;

The tables have indexes. Everything would be fine, but with 100K clients and 1M orders the query is far too slow. I haven't found anything better than storing the last order date in the customer record, but can the query itself be optimized?

SELECT * FROM clients WHERE client_id IN (SELECT client_id FROM orders WHERE date >= '2014-01-01' GROUP BY client_id);
runs much faster, but does not return the last order date.
For future reference: citforum.ru/database/digest/dig_2412.shtml. Take it from someone who works with the database of one of the largest companies in the Russian Federation.

ArchDemon · Accepted Answer · 2015-04-05T09:13:15

 SELECT o.m_d, c.* FROM (SELECT client_id, MAX(date) AS m_d FROM orders WHERE date >= '2014-01-01' GROUP BY client_id) AS o INNER JOIN clients AS c USING(client_id);
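The query above can be tried on a tiny dataset. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows and the order_id and name columns are made up, and an ORDER BY is added so the output order is deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, client_id INTEGER, date TEXT);
    INSERT INTO clients VALUES (1, 'Anna'), (2, 'Boris'), (3, 'Vera');
    INSERT INTO orders (client_id, date) VALUES
        (1, '2013-12-30'), (1, '2014-03-10'), (1, '2015-02-01'),
        (2, '2013-05-05'),
        (3, '2014-07-07');
""")

# Filter and aggregate orders first, then join the small result to clients.
query = """
    SELECT o.m_d, c.*
    FROM (SELECT client_id, MAX(date) AS m_d
          FROM orders
          WHERE date >= '2014-01-01'
          GROUP BY client_id) AS o
    INNER JOIN clients AS c USING (client_id)
    ORDER BY o.m_d DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The subquery reduces orders to one row per client before the join, so the join works on far fewer rows than a join over the full orders table.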

This runs very quickly and returns exactly what I need. Thank you, @IvanZakirov.

IvanZakirov · Answer 2 · 2015-04-05T07:10:50

A JOIN increases query processing time when there are many records (though it depends on the conditions).

 SELECT c.*, MAX(o.date) -- output the maximum date, which is the latest one (ideally the time should be taken into account too, in case several clients placed orders on the same day)
 FROM clients c, orders o
 WHERE c.id = o.client_id
 GROUP BY o.client_id
 ORDER BY MAX(o.date) DESC

Another option is to take the maximum order ID, which will also be the latest order:

 SELECT c.name, MAX(o.id) -- output the maximum order ID, which is also the latest order
 FROM clients c, orders o
 WHERE c.id = o.client_id
 GROUP BY c.name
 ORDER BY MAX(o.date) DESC

Actually, my query and yours take about the same time: 9 to 11 seconds.
SELECT c.name FROM clients c, (SELECT MAX(id) AS last_order_id, client_id FROM orders GROUP BY client_id) o WHERE c.id = o.client_id
See ONLY_FULL_GROUP_BY. The second option groups by clients.name, but is it stated anywhere that this field uniquely identifies the client?
All your advice is some kind of nonsense, especially the phrase "... the field uniquely identifies the client ...".
A uniquely identifying field is one used by the primary key: a unique, non-repeating value.
What does grouping have to do with a uniquely identifying field?
I wrote only a recommendation, and it was not based on the first post of the question.
Knowing the database, the query could well be different and built differently.
I repeat once again: I did not write a ready-made solution, but a recommendation on how to approach getting the information from the database.
Here there are two tables: one has a primary key, and the other is linked to it through a foreign key.
And the asker got his bearings correctly and wrote that it started working quickly, while still using a JOIN, even though I said there was no need for one.

Answer 3 · 2015-04-07T10:36:08

Add a reference to the last order to the client record and see the difference. It really lightens the load on the server and reduces the response time.
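One way this could look is sketched below. It is an illustration under assumptions, not the answerer's implementation: it stores the last order date (rather than an order ID) in a hypothetical last_order_date column, keeps it current with a trigger, and uses Python's built-in sqlite3 module instead of MySQL, whose trigger syntax differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- last_order_date is a hypothetical denormalized column.
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT, last_order_date TEXT);
    CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, client_id INTEGER, date TEXT);

    -- Keep the denormalized column current whenever an order is inserted.
    CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
    BEGIN
        UPDATE clients
        SET last_order_date = NEW.date
        WHERE client_id = NEW.client_id
          AND (last_order_date IS NULL OR last_order_date < NEW.date);
    END;

    INSERT INTO clients (client_id, name) VALUES (1, 'Anna'), (2, 'Boris');
    INSERT INTO orders (client_id, date) VALUES
        (1, '2014-03-10'), (1, '2015-02-01'), (1, '2014-12-31');
""")

# The report now reads only the clients table; orders is not touched.
rows = conn.execute(
    "SELECT client_id, last_order_date FROM clients "
    "WHERE last_order_date >= '2015-01-01'"
).fetchall()
print(rows)
```

The trade-off is the one argued in the comments: reads become cheap, but every write to orders now carries the cost of keeping the extra column consistent.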

Addition:

Solution (if the existing server cannot serve the queries fast enough at the moment): move the old data that this important query does not need into an archive table, and always work with the operational table (current year). For all other consumers of the data (who need everything), create a view that "hides" both the old and the operational data behind a common interface. The view conceals the physical database structure without affecting the client's functionality: the client talks to the view, while anyone who wants the current data from the small table gets it directly.
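The archive-plus-view layout described above might be sketched as follows. The names orders_current, orders_archive and orders_all are hypothetical, the rows are made up, and Python's built-in sqlite3 module stands in for the real server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Small operational table (current year) and a large archive table.
    CREATE TABLE orders_current (order_id INTEGER PRIMARY KEY, client_id INTEGER, date TEXT);
    CREATE TABLE orders_archive (order_id INTEGER PRIMARY KEY, client_id INTEGER, date TEXT);
    INSERT INTO orders_archive VALUES (1, 1, '2013-12-30'), (2, 2, '2014-06-01');
    INSERT INTO orders_current VALUES (3, 1, '2015-02-01'), (4, 3, '2015-03-15');

    -- Common interface for consumers that need every order.
    CREATE VIEW orders_all AS
        SELECT order_id, client_id, date FROM orders_current
        UNION ALL
        SELECT order_id, client_id, date FROM orders_archive;
""")

# The time-critical report reads only the small operational table.
recent = conn.execute(
    "SELECT client_id, MAX(date) FROM orders_current "
    "GROUP BY client_id ORDER BY client_id"
).fetchall()

# Everyone else goes through the view and still sees all orders.
total = conn.execute("SELECT COUNT(*) FROM orders_all").fetchone()[0]
print(recent, total)
```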

But I don't like adding extra fields when the information can be extracted anyway.
@ArchDemon Well then, there is no point in storing the orders for all years in one table.
A database is made to store information, and nobody splits it into separate tables just because there are many records.
Creating extra fields for information that can be retrieved anyway is unjustified denormalization.
A normal server pulls the desired row out of a table with millions of records in something like 0.005 seconds.
We are talking about large tables here, not about the beautiful presentational database structures you have probably read about in books.
I'm reminded of the joke about two fleas, where one asks: I wonder if there is life on other dogs.
@garik, judging by how confidently you refute my words, yours is not going to be the best database.
I work at one of the largest firms in the Russian Federation, programming and processing information on an Oracle database.
And I know what information is and how to store and process it, and not only from books.
And in two entities and work experience, including the ERP system built entirely on the database.
As for the advice to split a table into several when there are many records: you will wear yourself out putting them back together when taxes or GP have to be processed.

Answer 4 · 2015-04-07T10:22:30

Regardless of speed, you can't write like this:

 SELECT * … GROUP BY client_id

Under the SQL grouping rules, the SELECT clause may contain only the fields listed in GROUP BY plus aggregate functions over other fields. With certain settings MySQL permits liberties here, but this (a) can break when the query is moved to another server and (b) gives an unpredictable result.

It seems to me this task can be solved with a subquery, with good performance:

 SELECT c.* FROM `clients` AS c INNER JOIN ( SELECT DISTINCT `client_id` FROM `orders` WHERE `date` >= '2015-01-01' ) AS co USING(`client_id`)

(Here, instead of SELECT DISTINCT ... in the subquery, you can use ... GROUP BY client_id; it gives the same result.)
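The equivalence of the two forms of the subquery can be checked on a small made-up dataset. This is a sketch using Python's built-in sqlite3 module rather than MySQL; it compares results only and says nothing about which form is faster on a given server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, client_id INTEGER, date TEXT);
    INSERT INTO orders (client_id, date) VALUES
        (1, '2015-01-05'), (1, '2015-02-01'), (2, '2014-05-05'), (3, '2015-03-03');
""")

# Variant 1: remove duplicate client IDs with DISTINCT.
with_distinct = conn.execute(
    "SELECT DISTINCT client_id FROM orders "
    "WHERE date >= '2015-01-01' ORDER BY client_id"
).fetchall()

# Variant 2: collapse rows per client with GROUP BY.
with_group_by = conn.execute(
    "SELECT client_id FROM orders "
    "WHERE date >= '2015-01-01' GROUP BY client_id ORDER BY client_id"
).fetchall()

print(with_distinct == with_group_by)
```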

https://dev.mysql.com/doc/refman/5.0/en/sql-mode.html#sqlmode_only_full_group_by

Edit: I re-read the question carefully and realized that the date needs to be added to the subquery. Once an aggregate function appears, there is no getting around GROUP BY :)

 SELECT co.`max_date`, c.* FROM `clients` AS c INNER JOIN ( SELECT `client_id`, MAX(`date`) AS `max_date` FROM `orders` WHERE `date` >= '2015-01-01' GROUP BY `client_id` ) AS co USING(`client_id`)

Strangely enough, in another query where I changed GROUP BY to DISTINCT, the query became 2 to 3 times slower. I don't know why, but I decided not to risk it.
Now you have several options: try them, choose, and analyze them with EXPLAIN.
With GROUP BY, the server used the indexed fields to select only the needed records out of the millions and returned just those.
If you replace GROUP BY with DISTINCT, the server selects all the millions of records and then goes over them looking for duplicates across all columns. See how many passes over the data that is?
I'll tell you more: as a rule, GROUP BY leads to a full scan.

Legionary · Answer 5 · 2015-04-07T12:36:27

Which table does the client_id field in GROUP BY client_id belong to? If it is clients.client_id, it is better to use the client field from the orders table. Accordingly, the orders table should have an index on orders.client_id and orders.date (most likely a composite index is best).

It is important that all selection operations run against the same table.

 SELECT clients.*, orders.date FROM clients LEFT JOIN orders USING(index_in_table_orders) WHERE orders.date >= '2015-01-01' GROUP BY orders.client_id ORDER BY orders.date DESC;
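The composite index suggested in this answer could be set up as sketched below. The index name idx_orders_client_date and the sample rows are made up, and Python's built-in sqlite3 module stands in for MySQL, so the plan text is SQLite's; on MySQL the analogous check would be done with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, client_id INTEGER, date TEXT);
    INSERT INTO orders (client_id, date) VALUES
        (1, '2014-03-10'), (1, '2015-02-01'), (2, '2013-05-05'), (3, '2015-01-20');

    -- Composite index covering both the grouping column and the date.
    CREATE INDEX idx_orders_client_date ON orders (client_id, date);
""")

sql = """
    SELECT client_id, MAX(date)
    FROM orders
    WHERE date >= '2015-01-01'
    GROUP BY client_id
    ORDER BY client_id
"""
last_orders = conn.execute(sql).fetchall()

# Ask the engine how it intends to run the query; the last column of each
# plan row is a human-readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
print(last_orders)
print(plan)
```

Whether the optimizer actually uses such an index depends on the engine and the data, which is why the plan is printed instead of assumed.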

client_id is in both tables; otherwise I could not join them through USING(). The problem is not that I reference the field from the wrong table (MySQL resolves that itself), but that MySQL first joins the tables and only then applies the filter, which takes a long time. - ArchDemon
If this field were in both tables, there would be an error: "column client_id is ambiguous". Spell it out in the query: GROUP BY orders.client_id. This is needed so that MySQL performs all the selection and grouping on the orders table and only then does the JOIN. - Rjazhenka
There is no warning. Specifying it explicitly does not improve the speed. - ArchDemon
@user1778019, don't talk nonsense either. The server does not care which table the data in the SELECT list comes from, as long as the tables are joined correctly. It prepares the data from both tables either way and only outputs the columns named in SELECT. Going by the words "the indexes are all in order", it is enough to apply the selection criteria to each table up front, in subqueries, and the needed rows will be pulled out of millions in seconds or even less. Provided the hardware allows it, of course ;) - IvanZakirov
@user1778019 oh yes, the part about the composite index in your answer is also complete nonsense. An index over a field that is already indexed, combined with the date field as well... God help everyone who waits for results from such queries and those who have to maintain them later. - IvanZakirov
