There is a field in the table:

 pp enum('T','S','E','U','AC','AT','M','MA','BA','MC','BC','D') COLLATE utf8_unicode_ci DEFAULT NULL,
 KEY pp (pp)

If you do this:

 select * from tbl_invoices where (pp is null or pp in ('S')); 

EXPLAIN shows that the key is used for the query:

 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: tbl_invoices
          type: ref_or_null
 possible_keys: pp
           key: pp
       key_len: 2
           ref: const
          rows: 83064
         Extra: Using index condition; Using where

And if you do this:

 select * from tbl_invoices where (pp is null or pp in ('S','D')); 

the key is no longer used:

 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: tbl_invoices
          type: ALL
 possible_keys: pp
           key: NULL
       key_len: NULL
           ref: NULL
          rows: 321844
         Extra: Using where

How can I rewrite the query so that the key is also used in the second case?

  • Please also show the plans for each part of the where clause. That is, I am interested in how many records match pp IS NULL , pp = 'S' and pp = 'D' . In general, this is normal behavior: fetching rows through an index leads to random reads from disk, and if almost all records match the condition, scanning the records sequentially is much faster. - BOPOH
  • Yeah, I get it. Does it ever make sense to insist on force index (pp) , or can we assume that the MySQL optimizer always knows better than I do? - Fyodor Ustinov
  • The optimizer does not always optimize correctly, but more often than not it gives fairly good results (and it usually sees the whole picture better than you do, because random reads are not the only factor), so I would not change its behavior. But you have the real database and queries at hand, so you can see how the plan behaves when you give it different hints (such as that same force index ). If MyISAM is used, it may make sense to run ANALYZE TABLE to recalculate the index cardinality (for InnoDB it does not seem to help). - BOPOH
  • @BOPOH Please post your comments as an answer. - Nicolas Chabanovsky
  • @NicolasChabanovsky, this is only an assumption, because I did not see the total number of entries for each key value - BOPOH
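
The per-value figures requested in the comments could be gathered roughly as follows. This is only a sketch that reuses the table and column names from the question; the aliases cnt and total are made up for illustration, and the actual numbers depend entirely on the data.

```sql
-- How many rows match each part of the WHERE clause
-- (rows with pp IS NULL form their own group).
SELECT pp, COUNT(*) AS cnt
FROM tbl_invoices
WHERE pp IS NULL OR pp IN ('S', 'D')
GROUP BY pp;

-- Total row count, for comparison with the counts above.
SELECT COUNT(*) AS total FROM tbl_invoices;

-- Plan for each part of the condition on its own.
EXPLAIN SELECT * FROM tbl_invoices WHERE pp IS NULL;
EXPLAIN SELECT * FROM tbl_invoices WHERE pp = 'S';
EXPLAIN SELECT * FROM tbl_invoices WHERE pp = 'D';
```

If the three counts together approach the total, a full scan is a plausible choice for the optimizer.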

1 answer

You have two different cases. In the first case the optimizer recognizes the situation precisely: pp = 'S' is a comparison with a constant, which is exactly what EXPLAIN reports:

 type: ref_or_null ref: const 

This means that the table itself is not scanned to find the matches: they are located through the index, and the corresponding rows are then returned from the table.

In the second case the situation is more complicated, since the optimizer cannot get rid of the IN . I got different results from EXPLAIN : the DBMS reported that it would use the index for a range scan

 type: range ref: NULL 

That is, the index is used not for a lookup by constant but as sorted data to scan. It is normal that we get different results. The optimizer makes its decision based on the size of the tables and on query statistics. This is why query optimization is best done on a production, warmed-up server: statistics on a cold server with the same database may be different.
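
As a rough illustration of the point about statistics, the index statistics can be refreshed and inspected before comparing plans. This is a sketch using the table name from the question; whether the refresh changes the plan is not guaranteed and depends on the storage engine and the data.

```sql
-- Recalculate the key distribution statistics for the table.
ANALYZE TABLE tbl_invoices;

-- Inspect the Cardinality column reported for the pp key.
SHOW INDEX FROM tbl_invoices;

-- Re-check the plan afterwards.
EXPLAIN SELECT * FROM tbl_invoices WHERE (pp IS NULL OR pp IN ('S', 'D'));
```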

Now to your question. Note that the second query has a rather large estimated number of rows to retrieve: 321844 , which is probably comparable to the number of records in the tbl_invoices table. In this case the optimizer may decide that there is no point in sorting the table and that a full scan of the unsorted table will be faster than a preliminary sort (not necessarily in RAM, possibly on disk) followed by a partial scan of the sorted data.
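
If you still want to see how the query behaves with the key, two variants can be compared against the original plan. This is a sketch, not a recommendation: the first uses the force index hint mentioned in the comments, and the second splits the condition into parts that can each be served by the key. Whether either is actually faster than the full scan has to be measured on the real data.

```sql
-- Variant 1: hint the optimizer to use the pp key.
EXPLAIN
SELECT * FROM tbl_invoices FORCE INDEX (pp)
WHERE (pp IS NULL OR pp IN ('S', 'D'));

-- Variant 2: one branch per condition. The branches are mutually
-- exclusive, so UNION ALL returns no duplicate rows.
EXPLAIN
SELECT * FROM tbl_invoices WHERE pp IS NULL
UNION ALL
SELECT * FROM tbl_invoices WHERE pp = 'S'
UNION ALL
SELECT * FROM tbl_invoices WHERE pp = 'D';
```

If most of the table matches the condition, both variants may well end up slower than the plan the optimizer chose on its own.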