Please tell me: if a query has several conditions, for example WHERE a=1 AND b=2, is the data that satisfies a=1 selected first, and then the rows satisfying b=2 selected from that subset? Or does the search for the second condition go through the whole table again? The first option is obviously preferable, but I have not found explicit confirmation of it anywhere.
- There are no guarantees about the order in which conditions are checked; everything is done as the optimizer decides. See stackoverflow.com/questions/484135/… - Ivan Ignatiev - MSFT
- @IvanIgnatiev, thanks for the info - Rltx11
- As an off-topic note: I read somewhere that in sqlite3, for example, the order does matter - andreymal
3 answers
It depends on the indexes and on the ingenuity of the optimizer. Keep in mind that database developers are, as a rule, quite experienced people.
You can check all of this by running EXPLAIN on your query.
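As a rough illustration of what "running EXPLAIN" looks like, here is a minimal sketch using Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The table t, its columns, the index name idx_a and the sample data are all made up for the example; other databases spell the command differently and print different output.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the human-readable lines of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of each row holds the readable description.
    return [row[3] for row in rows]

# Hypothetical table and data, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t (a, b) VALUES (?, ?)",
    [(i % 10, i % 7) for i in range(1000)],
)

QUERY = "SELECT * FROM t WHERE a = 1 AND b = 2"

# Without any index the only option is to scan the whole table.
plan_no_index = query_plan(conn, QUERY)

# With an index on `a` the planner can narrow the search first.
conn.execute("CREATE INDEX idx_a ON t (a)")
plan_with_index = query_plan(conn, QUERY)

print(plan_no_index)
print(plan_with_index)
```

The exact wording of the plan lines varies between SQLite versions, so it is safer to look for the word SCAN or for the index name than to compare whole strings.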
If there are no indices for these fields
... then a decision has to be made: do two checks in one pass, or two passes with one check each. If the data fits entirely in the processor cache (an almost unrealistic case), the difference will not be noticeable.
If it does not fit, then successive portions of the data set will be loaded from main memory into the cache (assuming the data set fits in main memory). Each load into the cache takes a tiny amount of time, but the total grows with the size of the table. Making two passes is more expensive: it takes twice as many cache loads as a single pass, for the same number of comparisons.
If the data does not even fit in RAM, everything is quite obvious: loading from disk is very slow. So slow that the rest of the query's work is unlikely to be noticeable, and making two passes will likely roughly double the query execution time.
The result: one pass with two checks per row looks more rational from every angle.
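The comparison above can be sketched in a few lines of Python. This is only a toy model: rows are plain dictionaries, a "load" is counted every time a row is touched, and the function names are invented for the example. It says nothing about real cache or disk behavior beyond the count of row visits.

```python
def one_pass(rows, a, b):
    """Check both conditions while each row is loaded once."""
    loads = 0
    matches = []
    for row in rows:
        loads += 1
        if row["a"] == a and row["b"] == b:
            matches.append(row["id"])
    return matches, loads

def two_passes(rows, a, b):
    """Scan the whole table once per condition, then intersect the results."""
    loads = 0
    ids_a = set()
    for row in rows:
        loads += 1
        if row["a"] == a:
            ids_a.add(row["id"])
    ids_b = set()
    for row in rows:
        loads += 1
        if row["b"] == b:
            ids_b.add(row["id"])
    return sorted(ids_a & ids_b), loads

# Made-up data: 1000 rows with small cyclic values in `a` and `b`.
rows = [{"id": i, "a": i % 10, "b": i % 7} for i in range(1000)]

result_one, loads_one = one_pass(rows, 1, 2)
result_two, loads_two = two_passes(rows, 1, 2)
print(loads_one, loads_two)
```

Both functions return the same ids, but the second one touches every row twice, which is the doubling described above.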
If there is an index on one of the fields
... then everything is obvious: the set of rows to examine can immediately and "cheaply" (in terms of resources) be reduced to the part of the table with the specified value in the indexed field. Then a sequential search over that subset checks the second condition.
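A minimal sketch of that idea, assuming a plain dictionary as a stand-in for the index on a (real databases typically use B-trees or similar structures, and the names here are invented):

```python
from collections import defaultdict

# Made-up data: 1000 rows with small cyclic values in `a` and `b`.
rows = [{"id": i, "a": i % 10, "b": i % 7} for i in range(1000)]

# Hypothetical stand-in for an index on `a`: value -> positions of rows.
index_a = defaultdict(list)
for pos, row in enumerate(rows):
    index_a[row["a"]].append(pos)

def search_with_index(a, b):
    """Use the index to narrow the rows by `a`, then check `b` one by one."""
    candidates = index_a.get(a, [])
    examined = len(candidates)
    matches = [rows[pos]["id"] for pos in candidates if rows[pos]["b"] == b]
    return matches, examined

matches, examined = search_with_index(1, 2)
print(len(matches), "matches after examining", examined, "of", len(rows), "rows")
```

Only the rows that the index returns for a are examined for b; the rest of the table is never touched.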
If there are indices on two fields
... then only EXPLAIN knows! No one will give a more reliable answer, since each database solves this question in its own way.
The optimizer can look at both indexes, estimate which one leaves less to search sequentially, and use that one. Or it may make a mistake and pick the wrong one, guided by considerations of its own.
PostgreSQL can do a bitmap scan, during which it scans both indexes and builds a bitmap for each condition, then combines the two bitmaps into one according to the conditions to get the result. But whether the optimizer decides to do this is another question.
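The bitmap idea can be sketched with Python integers used as bit sets. This is a simplified model, not PostgreSQL's implementation: the sketch fills each bitmap by walking the rows, whereas a real bitmap scan would fill them from the indexes, and all names are invented for the example.

```python
# Made-up data: 1000 rows with small cyclic values in `a` and `b`.
rows = [{"id": i, "a": i % 10, "b": i % 7} for i in range(1000)]

def build_bitmap(rows, column, value):
    """Set bit i when rows[i][column] == value."""
    bitmap = 0
    for pos, row in enumerate(rows):
        if row[column] == value:
            bitmap |= 1 << pos
    return bitmap

def bitmap_to_positions(bitmap):
    """List the positions of the set bits, lowest first."""
    positions = []
    pos = 0
    while bitmap:
        if bitmap & 1:
            positions.append(pos)
        bitmap >>= 1
        pos += 1
    return positions

bitmap_a = build_bitmap(rows, "a", 1)
bitmap_b = build_bitmap(rows, "b", 2)

# AND matches `a = 1 AND b = 2`; an OR condition would use | instead.
combined = bitmap_a & bitmap_b
matching_ids = [rows[pos]["id"] for pos in bitmap_to_positions(combined)]
print(matching_ids)
```

Only the rows whose bit survives the AND need to be fetched from the table afterwards.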
Ideally: if there is an index for pairs of values
Here the index is simply used directly, so the search order matches the indexing order: first descend through the "upper levels of the index" (by the first expression), then descend by the second one and get the answer right away.
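As a rough picture of a composite index, here is a sketch that keeps (a, b, id) tuples in sorted order and uses binary search from the standard bisect module. A sorted list is only an assumed stand-in for the tree a real database would use, and the names are invented.

```python
import bisect

# Made-up data: 1000 rows with small cyclic values in `a` and `b`.
rows = [{"id": i, "a": i % 10, "b": i % 7} for i in range(1000)]

# Hypothetical composite index on (a, b): entries sorted by a, then b, then id.
index_ab = sorted((row["a"], row["b"], row["id"]) for row in rows)

def search_composite(a, b):
    """Jump straight to the first (a, b) entry and read while the key matches."""
    # (a, b) sorts before every (a, b, id), so this finds the first match.
    start = bisect.bisect_left(index_ab, (a, b))
    matches = []
    for i in range(start, len(index_ab)):
        key_a, key_b, row_id = index_ab[i]
        if (key_a, key_b) != (a, b):
            break
        matches.append(row_id)
    return matches

print(search_composite(1, 2))
```

The search lands on the matching entries directly; no rows with a = 1 but a different b are examined, apart from the single entry that ends the loop.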
But all this does not matter if ...
- there is very little data
small amounts of data may be faster to scan without an index
- the optimizer gets it wrong (which happens)
debugging this turns into a hunt for indexes by reading EXPLAIN output
I will add more about "anything can happen". This was with SQL Server, but that hardly matters; I think a similar situation can arise with other DBMSs. A query of this form (greatly simplified; the real one, of course, ran to a couple of pages):
SELECT CONVERT(INT, fielda) FROM tablea WHERE ...
Here fielda was a text column, but the WHERE clause was guaranteed to cut off all invalid values (i.e., those that could not be converted to INT). And the query periodically failed on CONVERT. A manual check revealed nothing: selecting with the same condition showed that only valid values were returned, and CONVERT worked on them. The point is how the database server decides to execute the query. It can take a block of data and run CONVERT first, and only then discard the extra rows.
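The effect of evaluation order can be imitated in plain Python. This is an analogy, not SQL Server's actual behavior: is_valid is a made-up stand-in for the WHERE clause, and try_convert only mimics the spirit of a conversion that yields NULL instead of failing (SQL Server offers TRY_CONVERT for that purpose in recent versions).

```python
# Made-up column values: some convertible to int, some not.
values = ["10", "abc", "25", "", "7"]

def is_valid(text):
    """Stand-in for the WHERE clause that rejects non-numeric values."""
    return text.isdigit()

def filter_then_convert(values):
    """The order the query author had in mind."""
    return [int(v) for v in values if is_valid(v)]

def convert_then_filter(values):
    """The order the server may choose: convert a block, then discard rows."""
    converted = [(v, int(v)) for v in values]  # int("abc") raises ValueError
    return [number for v, number in converted if is_valid(v)]

def try_convert(text):
    """Return None instead of raising when the text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None

def safe_convert_then_filter(values):
    """Same order as above, but the conversion can no longer fail."""
    converted = [(v, try_convert(v)) for v in values]
    return [number for v, number in converted if is_valid(v)]

print(filter_then_convert(values))
print(safe_convert_then_filter(values))
```

Calling convert_then_filter(values) raises ValueError even though every row that would survive the filter is valid, which is the same kind of surprise as the failing CONVERT. Making the conversion itself tolerant of bad input removes the dependence on the order.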
The database does not execute the query verbatim. Based on the query, it builds an execution plan (query plan), which is affected, for example, by the presence of indexes. You can see what actually happens by putting EXPLAIN before the query, but you also need to know how to read its output. The execution plan can be influenced in terms of joins and indexes (FORCE INDEX), but this is not complete control; in general, the database optimizer is designed to settle such questions itself.
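FORCE INDEX is MySQL syntax. To show the same kind of influence in something runnable, the sketch below uses SQLite's INDEXED BY clause through Python's sqlite3 module; this is a related but not identical feature (the statement fails if the named index cannot be used). The table, data and index names are invented for the example.

```python
import sqlite3

# Hypothetical table with a separate index on each column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t (a, b) VALUES (?, ?)",
    [(i % 10, i % 7) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_a ON t (a)")
conn.execute("CREATE INDEX idx_b ON t (b)")

def plan(sql):
    """Join the readable lines of EXPLAIN QUERY PLAN into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# The planner is free to pick whichever index it prefers here.
free_plan = plan("SELECT * FROM t WHERE a = 1 AND b = 2")

# Here the query insists on idx_b, whatever the planner would have chosen.
forced_sql = "SELECT * FROM t INDEXED BY idx_b WHERE a = 1 AND b = 2"
forced_plan = plan(forced_sql)

print(free_plan)
print(forced_plan)
```

The result set is the same either way; only the route to it changes, and the other condition is still checked on the rows the chosen index returns.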