I created a simple table:

CREATE TABLE IF NOT EXISTS `table_1` ( id INT NOT NULL AUTO_INCREMENT, user_id INT( 33 ), user_name VARCHAR( 255 ), PRIMARY KEY ( `id` ) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci 

As you can see, the id column holds unique values (because of AUTO_INCREMENT). The user_id field also holds unique user IDs, which the script generates (using mt_rand) when a user registers in their personal account.

To select data, I run the following query:

 SELECT `user_name` FROM `table_1` WHERE `user_id`=28572 

I thought about adding indexes to my table to speed up these queries. The documentation says:

The presence of an index can significantly increase the speed of execution of some queries and reduce the time to search for the necessary data due to their physical or logical ordering.

Questions:

  1. Do I need to create an index on the user_id field to speed up queries over millions of records, if all the values in this field are already unique?

  2. If I do, which index should I create: clustered or non-clustered (I don't quite understand the difference between them)?

  3. Does the WHERE condition (in my example above) make the DBMS iterate over all the records in the table? Or does the DBMS go straight to the records that satisfy user_id = 28572, without touching the rest?

  • Why do you need an id if you already have another unique key? - Petr Abdulin
  • For ORDER BY id DESC. - StasHappy
  • And do you need this sorting often (over all rows)? - Petr Abdulin
  • Yes, quite often, whenever I output data: SELECT user_name FROM table_1 ORDER BY id DESC LIMIT 0, 10 - StasHappy
  • You don't need two unique fields and, accordingly, two indexes. Merge user_id and id into one. - Yura Ivanov

4 answers

Do I need to create an index on the user_id field to speed up queries over millions of records, if all the values in this field are already unique?

Yes, you do, because you are going to select from the table by this field often. Since your design requires the values in this field to be unique, it makes sense to use a UNIQUE index: with a unique index on the field, the database will refuse to insert a record with a duplicate value. It is also a step toward normalization.
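Here is a minimal sketch of that behaviour. It uses SQLite through Python's built-in sqlite3 module as a self-contained stand-in for MySQL, so CREATE UNIQUE INDEX takes the place of MySQL's ALTER TABLE ... ADD UNIQUE INDEX; the index name idx_user_id and the helper try_insert are hypothetical.

```python
import sqlite3

# SQLite is only a stand-in for MySQL here; the unique-index semantics match.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_id INTEGER,"
    " user_name TEXT)"
)
# Hypothetical index name; enforces one row per user_id.
conn.execute("CREATE UNIQUE INDEX idx_user_id ON table_1 (user_id)")
conn.execute("INSERT INTO table_1 (user_id, user_name) VALUES (?, ?)", (28572, "alice"))


def try_insert(user_id, name):
    """Return True if the row was inserted, False if the unique index rejected it."""
    try:
        conn.execute(
            "INSERT INTO table_1 (user_id, user_name) VALUES (?, ?)", (user_id, name)
        )
        return True
    except sqlite3.IntegrityError:  # raised on a duplicate user_id
        return False


print(try_insert(28572, "bob"))    # False: duplicate user_id
print(try_insert(31337, "carol"))  # True: new user_id
```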

If I do, which index should I create: clustered or non-clustered (I don't quite understand the difference between them)?

You don't need a clustered index until you clearly know you need one. And here's a surprise: in InnoDB the primary key is always clustered, and you already have one. Don't worry about it for now.
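To see why the primary key matters even for lookups by user_id, here is a deliberately simplified model (plain Python, not InnoDB's actual code) of how InnoDB lays out a table: the clustered index holds the full rows ordered by the primary key, while a secondary index on user_id stores only (user_id, primary key) pairs. A lookup by user_id therefore searches two structures. The rows and function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical rows: (id, user_id, user_name)
rows = [(1, 28572, "alice"), (2, 10001, "bob"), (3, 55555, "carol")]

# Clustered index: full rows, ordered by the primary key (id).
clustered = sorted(rows)
clustered_keys = [r[0] for r in clustered]

# Secondary index: only (user_id, id) pairs, ordered by user_id.
secondary = sorted((user_id, pk) for pk, user_id, _ in rows)
secondary_keys = [k for k, _ in secondary]


def find_by_pk(pk):
    """One search in the clustered index returns the whole row."""
    i = bisect_left(clustered_keys, pk)
    if i < len(clustered_keys) and clustered_keys[i] == pk:
        return clustered[i]
    return None


def find_by_user_id(user_id):
    """Two searches: secondary index -> primary key -> clustered index."""
    i = bisect_left(secondary_keys, user_id)
    if i == len(secondary_keys) or secondary_keys[i] != user_id:
        return None
    pk = secondary[i][1]
    return find_by_pk(pk)


print(find_by_user_id(28572))  # (1, 28572, 'alice')
```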

Does the WHERE clause cause the DBMS to iterate over all the records in the table?

Without a suitable index, yes. You can check this (and much more) by running the query with the EXPLAIN keyword in front of it (EXPLAIN SELECT * FROM ... WHERE ... ORDER BY ... LIMIT ...). If a query filters on a condition involving a field that is not covered by a suitable index, MySQL will most likely perform a full-table scan. That is an expensive I/O operation, so placing indexes correctly is half the battle in making a database fast.
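A sketch of that check, assuming SQLite's EXPLAIN QUERY PLAN (via Python's sqlite3) as a runnable stand-in for MySQL's EXPLAIN. The output format differs, but the scan-versus-index-search distinction is the same. The 1,000 generated rows, the index name idx_user_id, and the plan helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_id INTEGER, user_name TEXT)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO table_1 (user_id, user_name) VALUES (?, ?)",
    ((n, f"user{n}") for n in range(1, 1001)),
)

QUERY = "SELECT user_name FROM table_1 WHERE user_id = 28572"


def plan(sql):
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail);
    # join the human-readable 'detail' column.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(QUERY)  # should mention SCAN: a full-table scan
conn.execute("CREATE UNIQUE INDEX idx_user_id ON table_1 (user_id)")
after = plan(QUERY)   # should mention SEARCH ... USING INDEX idx_user_id
print(before)
print(after)
```

In MySQL itself, the corresponding signal is the type column of EXPLAIN: ALL means a full-table scan, while const or ref means an index was used.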

    To add to @Mirdin's answer:

    1. A clustered index physically orders the table by the index key. It is the fastest for lookups. Obviously, a table can have only one. The PK is the clustered index by default (in InnoDB, always).

    2. Without an index, you can cut the execution time roughly in half on average by adding LIMIT 0,1, because the database does not know that your column is unique. Without a limit, it goes through every row of the table even after it has already found a match.
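The "half on average" figure can be checked with a small plain-Python simulation, assuming no usable index (so every query is a linear scan) and a matching row that is equally likely to sit at any position. comparisons is a hypothetical helper that counts examined rows:

```python
def comparisons(rows, target, stop_at_first):
    """Count rows examined by a linear scan looking for `target`."""
    examined = 0
    for value in rows:
        examined += 1
        if value == target and stop_at_first:
            break  # what LIMIT 1 allows the server to do
    return examined


N = 1000
rows = list(range(N))  # unique values, like user_id with no index

# Average over every possible position of the matching row.
full = sum(comparisons(rows, t, stop_at_first=False) for t in rows) / N
early = sum(comparisons(rows, t, stop_at_first=True) for t in rows) / N
print(full)   # 1000.0
print(early)  # 500.5
```

Once a UNIQUE index exists, MySQL already knows that at most one row can match, so LIMIT 1 no longer buys anything.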

    But the right thing to do is to make this column unique:

     ALTER TABLE `table_1` ADD UNIQUE INDEX `user_id` (`user_id`);

    Regarding your comment: this is a rather complicated question, and it depends a lot on which queries you run. If you have a composite index on columns (A, B, C), it is useful when you search (WHERE) on one or more of the leftmost columns, i.e. A, (A and B), or (A and B and C). If you search only on B, it won't be used. On the other hand, if you have separate indexes on, say, A and B, then when you filter on both fields, probably only one index will be used and the second condition will be checked by a plain scan of the matching rows (although I'm not sure about this). Keep in mind that most of the cost lies in the first filtering step, i.e. when a table of, say, a million records is scanned. If one index eliminates 99% of the rows, then scanning the remaining 10,000 rows even without an index is no longer that costly.
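The leftmost-prefix rule can be observed with a small sketch, again assuming SQLite (through Python's sqlite3) as a stand-in for MySQL. Both engines follow this rule for composite B-tree indexes, leaving aside skip-scan optimizations that each can apply only in special cases. The table t, the index idx_abc, and the plan helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, c INT, payload TEXT)")
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")  # composite index (A, B, C)


def plan(where):
    # Selecting payload (not in the index) keeps the index non-covering.
    sql = "EXPLAIN QUERY PLAN SELECT payload FROM t WHERE " + where
    return " | ".join(row[3] for row in conn.execute(sql))


for where in (
    "a = 1",
    "a = 1 AND b = 2",
    "a = 1 AND b = 2 AND c = 3",
    "b = 2",
    "b = 2 AND c = 3",
):
    print(f"{where:30} -> {plan(where)}")
```

The first three conditions should report a SEARCH using idx_abc; the two that skip column a fall back to a full SCAN of t.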

    So, in general, as you probably understand, the answer is no. Work incrementally: add one index and look at the result. One index may well be enough.

    • Thank you for your comprehensive answers. Now I see the difference between clustered and non-clustered indexes. I have one last question, about composite indexes. Depending on the task, I may query the table by one field or by several. Would it be right to create a separate index on each field plus one composite index on all of them? - StasHappy
    • It depends on your queries. EXPLAIN will help you find the answer. - AntonioK
    • @StasHappy I've added to my answer. - Petr Abdulin
    • Thank you, friends. I'll try various options, check what EXPLAIN returns, and compare the results. - StasHappy

    1. Yes, if you filter by this field often.
    2. Not a clustered one; you already have a PK.
    3. The server is not magic: without an index it goes through all the records in the table; with an index it walks the structure the index implements, but that still isn't "going straight to exactly those records"...

    P.S. I answered in general terms; MySQL may differ in some details.
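To put numbers on point 3, here is a plain-Python sketch in which a binary search over sorted keys stands in for walking an index (a real B-tree has a much larger fan-out, so it needs even fewer page reads). The one million keys and the helper binary_search_steps are hypothetical:

```python
import math


def binary_search_steps(sorted_keys, target):
    """Return (found, steps): how many probes a binary search needs."""
    lo, hi, steps = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return True, steps
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, steps


keys = list(range(1_000_000))               # hypothetical user_id values
found, steps = binary_search_steps(keys, 28572)
linear = keys.index(28572) + 1              # rows a linear scan examines
bound = math.ceil(math.log2(len(keys) + 1))  # worst case for the index walk
print(found, steps, bound, linear)
```

The index walk needs at most about log2(N) probes (20 here), whereas the linear scan examines 28,573 rows to reach this key and all 1,000,000 when the key is absent.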

      My two cents, albeit a year late, for future readers: if you edit or delete users through an admin panel, an id field is handy, although you can do without it. To do without it, define the table like this:

        CREATE TABLE IF NOT EXISTS `table_1` ( user_id INT( 33 ) NOT NULL, user_name VARCHAR( 255 ), PRIMARY KEY ( `user_id` ) ) ENGINE=InnoDB 
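A sketch of that design, assuming SQLite through Python's sqlite3 as a stand-in: a WITHOUT ROWID table stores its rows inside the PRIMARY KEY b-tree, which is roughly how InnoDB clusters rows by primary key, so a lookup by user_id needs only one index search. The sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: rows live in the PRIMARY KEY b-tree itself,
# roughly like InnoDB's clustered primary key.
conn.execute(
    "CREATE TABLE table_1 (user_id INTEGER NOT NULL PRIMARY KEY, user_name TEXT)"
    " WITHOUT ROWID"
)
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)", [(28572, "alice"), (10001, "bob")]
)

detail = " | ".join(
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT user_name FROM table_1 WHERE user_id = 28572"
    )
)
print(detail)  # the plan should mention a search USING PRIMARY KEY
print(conn.execute(
    "SELECT user_name FROM table_1 WHERE user_id = 28572"
).fetchone()[0])  # alice
```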

      If the id (PK) field really is necessary, then in your case it is better to use a HASH index:

       CREATE TABLE IF NOT EXISTS `table_1` ( id INT NOT NULL AUTO_INCREMENT, user_id INT( 33 ), user_name VARCHAR( 255 ), PRIMARY KEY ( `id` ), UNIQUE INDEX idx_user (user_id) USING HASH ) ENGINE=InnoDB 

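      Lookups in a HASH index take O(1) time, compared with O(log N) for a BTREE, which can be noticeable on large tables. Note, however, that InnoDB does not support HASH indexes: it accepts USING HASH but silently creates a BTREE index instead. Genuine HASH indexes are available in the MEMORY engine, and they only speed up equality comparisons.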
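The trade-off between the two index types can be sketched in plain Python, with a dict standing in for a hash index and a sorted list searched with bisect standing in for a BTREE. The data and helper names are hypothetical; the point is that the hash structure answers only exact-match lookups, while the ordered one also answers range queries:

```python
from bisect import bisect_left, bisect_right

# Hypothetical (user_id, row id) pairs.
pairs = [(28572, 1), (10001, 2), (55555, 3), (30000, 4)]

hash_index = dict(pairs)      # exact-match lookups, O(1) on average
sorted_index = sorted(pairs)  # ordered like a BTREE, O(log N) lookups
sorted_keys = [k for k, _ in sorted_index]


def btree_lookup(key):
    """Exact-match lookup in the ordered index."""
    i = bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return sorted_index[i][1]
    return None


def btree_range(lo, hi):
    """Row ids with lo <= user_id <= hi; a hash index would need a full scan."""
    start, end = bisect_left(sorted_keys, lo), bisect_right(sorted_keys, hi)
    return [rid for _, rid in sorted_index[start:end]]


print(hash_index.get(28572), btree_lookup(28572))  # 1 1
print(btree_range(20000, 40000))                   # [1, 4]
```

This mirrors MySQL's documented limitation for MEMORY-engine HASH indexes: they are used only for = and <=> comparisons and cannot help with ranges or ORDER BY.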