I have a simple question: does the order of the IDs inside the IN operator affect query speed? For example,

SELECT * FROM test WHERE id IN (221258, 121257, 977256, 2255, 52223, 50, 222222)

will it take longer than

SELECT * FROM test WHERE id IN (50, 2255, 52223, 121257, 221258, 222222, 977256) ?

Synthetic tests did not show a significant difference, but how does it actually work?

  • It seems to me that with this many values written as literals in the query, most of the time will go to parsing the query. And if you supply the values through a subquery, then with the right indexes the server will work out the order it needs by itself. With a naive algorithm, in theory, the order should matter, given that it compares by brute force. - Chad
  • @Chad, on the one hand, I understand that sorting should help, because walking through values in order is more convenient than jumping back and forth. On the other hand, what if SQL already optimizes this itself, and I would be loading PHP with sorting for nothing? - frgs
  • @Chad, how would MySQL know that your data is sorted? Only if it sorts the data itself, or if you tell it so, and I do not know of any setting that would tell it. I.e. if MySQL does not sort, then the order in which the data is passed does not matter. And if it does sort, the order of the data does not matter either. Either way, the order is not important. - BOPOH
  • @BOPOH, and what did I write? :-) - Chad
  • To get started, just look at the execution plan for the query (EXPLAIN SELECT ...) - avp
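
The advice above about checking the execution plan can be tried with a small script. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in, so it shows SQLite's planner, not MySQL's, and the `payload` column is made up for illustration. In MySQL the equivalent check would be running EXPLAIN on both versions of the query.

```python
import sqlite3

# Assumption: SQLite stands in for the real server; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, payload TEXT)")

unsorted_ids = (221258, 121257, 977256, 2255, 52223, 50, 222222)
sorted_ids = tuple(sorted(unsorted_ids))


def plan(ids):
    """Return the plan detail lines SQLite reports for the IN query."""
    id_list = ", ".join(map(str, ids))
    sql = f"EXPLAIN QUERY PLAN SELECT * FROM test WHERE id IN ({id_list})"
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute(sql)]


plan_unsorted = plan(unsorted_ids)
plan_sorted = plan(sorted_ids)
print(plan_unsorted)
print(plan_sorted)
```

If both printed plans are identical, the planner treats the two orderings the same way, at least at the level of detail the plan exposes.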

2 answers

It is better to list the values in the IN predicate from the most frequently occurring to the least. Then, if the compared value is in the list, the match will be found sooner. If the value is not in the list, the entire list will still be scanned.

A smart optimizer could reorder the values based on statistics (the distribution of values), but I have not heard of one that does. Besides, the overhead of doing so would have to be weighed.
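
The reasoning above can be made concrete by counting comparisons. The sketch below assumes a naive left-to-right scan of the IN list that stops at the first match; that is a simplified model, not a description of how any particular server evaluates IN. The data has the same shape as the test further down in this answer (values 1..99 once each, then 999 for the remaining rows), and `comparisons` is a hypothetical helper.

```python
from collections import Counter

# Row values: 1..99 once each, the remaining rows all hold 999.
value_counts = Counter({v: 1 for v in range(1, 100)})
value_counts[999] = 1_000_000 - 99


def comparisons(in_list, counts):
    """Count equality checks made by a naive left-to-right scan of in_list."""
    total = 0
    for value, rows in counts.items():
        # A hit stops at its 1-based position; a miss walks the whole list.
        steps = in_list.index(value) + 1 if value in in_list else len(in_list)
        total += steps * rows
    return total


rare_first = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 999]
frequent_first = [999, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109]

print(comparisons(rare_first, value_counts))      # 11000000
print(comparisons(frequent_first, value_counts))  # 1000990
```

Under this model the frequent-first list needs roughly eleven times fewer comparisons, which is the effect the answer expects. Whether a real server behaves like the model is a separate question.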

UPD. I ran a simple test. Since I mainly work with SQL Server, I ran it there.

Data generation:

    create table Tin (id int)

    declare @N int = 1000000
    declare @i int = 1
    set nocount on;

    -- values 1..99, one row each
    while @i < 100
    begin
        insert into Tin values (@i)
        set @i = @i + 1
    end

    -- all remaining rows hold the value 999
    while @i <= @N
    begin
        insert into Tin values (999)
        set @i = @i + 1
    end

Queries:

    select * from tin where ID in (100,101,102,103,104,105,106,107,108,109,999)
    go
    select * from tin where ID in (999,100,101,102,103,104,105,106,107,108,109)

Results (the second query shows consistently better results for elapsed time):

    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 0 ms.

    (999901 row(s) affected)

    SQL Server Execution Times:
       CPU time = 1248 ms, elapsed time = 17998 ms.

    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 0 ms.

    (999901 row(s) affected)

    SQL Server Execution Times:
       CPU time = 1264 ms, elapsed time = 15905 ms.

    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 57 ms.

UPD2. I have to withdraw my assumption, at least for SQL Server. The execution plan shows the list of values in IN already sorted. The server probably sorts the values before comparing. So over a large number of runs, the results should be almost the same.
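
Why a pre-sorted list makes the written order irrelevant can be shown with a small model. This is a sketch of the idea only, assuming the list is sorted and de-duplicated once and then probed with binary search; it does not claim to be SQL Server's actual algorithm, and `normalize` and `contains` are hypothetical names.

```python
from bisect import bisect_left


def normalize(in_list):
    """Sort and de-duplicate the literal list once, before any row is checked."""
    return sorted(set(in_list))


def contains(sorted_list, value):
    """Binary-search membership test on an already sorted list."""
    i = bisect_left(sorted_list, value)
    return i < len(sorted_list) and sorted_list[i] == value


rare_first = normalize([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 999])
frequent_first = normalize([999, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109])

# Both spellings collapse to the same internal list, so lookups cost the same.
print(rare_first == frequent_first)
print(contains(rare_first, 999), contains(rare_first, 50))
```

Once both queries reduce to the same internal list, any difference between single runs can only come from measurement noise.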

  • And what is the point of this "faster" if all the values in the list still have to be looked up? Finding one value sooner does not make processing the whole list any faster. A sorted list might be slightly faster, because once the first value has been found, the search for the second can continue from that point in the index tree instead of starting from the root. But the gain is most likely negligible. - BOPOH
  • Here is an example. You search for users by city of residence, and 90% of them are from Moscow. The list is either (New York, London, another fifteen hundred cities, Moscow) or (Moscow, New York, London, another fifteen hundred cities). You can test it on a large table. - msi
  • @msi, do you have any proof for "the entire list will still be scanned"? An indexed field is looked up by key, not by a full scan. Given that the question is about id, the statement is rather strange. - Yura Ivanov
  • @Yura Ivanov, why scan the entire list if a match is found on the first item? - msi
  • I don't know how it is in MySQL, but in Oracle IN (1, 2, 3) is expanded into a chain of OR conditions (= 1 OR = 2 OR = 3). Nothing depends on the order. - Indifferently

I ran a dozen different tests; within the margin of error, the results were almost the same. Sometimes the sorted list gave the best time, sometimes the unsorted one.

I ran the query 1000 times with the same IN values, once sorted and once unsorted. I also tried the same thing with random IDs added to IN. The database contained more than 1 million records, and the list inside IN held about 20 IDs.

I did not notice the advantages of sorting during tests.

For myself, I concluded that there is no need to worry about sorting.
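
A test of this kind can be reproduced with a short harness. The sketch below is not the original test: it assumes an in-memory SQLite database instead of the real server, a made-up `payload` column, and a fixed random seed, and it times 1000 runs of the same query with a shuffled and a sorted list of 20 IDs.

```python
import random
import sqlite3
import time

# Assumption: SQLite in memory stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO test (id, payload) VALUES (?, ?)",
    ((i, f"row {i}") for i in range(1, 1_000_001)),
)

random.seed(42)  # fixed seed so the same 20 IDs are picked on every run
shuffled_ids = random.sample(range(1, 1_000_001), 20)
sorted_ids = sorted(shuffled_ids)


def run(ids, repeats=1000):
    """Execute the IN query `repeats` times; return (seconds, last result)."""
    id_list = ", ".join(map(str, ids))
    sql = f"SELECT * FROM test WHERE id IN ({id_list})"
    start = time.perf_counter()
    for _ in range(repeats):
        rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, rows


time_shuffled, rows_shuffled = run(shuffled_ids)
time_sorted, rows_sorted = run(sorted_ids)
print(f"shuffled: {time_shuffled:.3f} s, sorted: {time_sorted:.3f} s")
```

Timings vary from run to run, so it is worth repeating the whole comparison several times and swapping the order of the two calls before drawing conclusions.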