Hello. I have a table with a large number of records. It has a datetime field of type TIMESTAMP, and I need to group by date. I tried 3 options. The first two:

  1. GROUP BY DATE(datetime)
  2. GROUP BY UNIX\_TIMESTAMP(datetime) - UNIX\_TIMESTAMP(datetime)%(60\*60\*24) - subtract the remainder of division by the number of seconds in a day, that is, round down to the start of the day. The first option takes ~ 2.1 seconds, the second ~ 0.95.

Then I decided it would be easier to work with an integer type, so I added a unix\_datetime column of type INT to the table and filled it with UNIX\_TIMESTAMP(datetime) to get rid of the conversions.
Then I wrote the 3rd grouping: GROUP BY unix\_datetime - unix\_datetime%(60\*60\*24)
To my surprise, it takes ~ 1.25 seconds.
There are indexes on both datetime and unix\_datetime.

In total, the performance results are:

  1. GROUP BY DATE(datetime) - ~ 2.1 sec
  2. GROUP BY UNIX\_TIMESTAMP(datetime) - UNIX\_TIMESTAMP(datetime)%(60\*60\*24) - ~ 0.95 sec.
  3. GROUP BY unix\_datetime - unix\_datetime%(60\*60\*24) - ~ 1.25 sec.

Can anyone explain why the 2nd option is faster than the 3rd? Maybe someone can suggest other options (grouping is needed not only by date, but also by hour, week and month). I only tested grouping by day.
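The "subtract the remainder" rounding from options 2 and 3 can be checked outside the database, and the same arithmetic extends to other fixed-width buckets such as hours and weeks. Below is a minimal Python sketch of that arithmetic only; the helper `floor_to` and the sample timestamp are hypothetical and are not part of the original queries. Note that this rounding works on UTC seconds, whereas DATE(datetime) on a TIMESTAMP column uses the session time zone, so the two groupings only coincide when the session time zone is UTC. Months have no fixed width in seconds, so this trick does not cover them.

```python
from datetime import datetime, timezone

HOUR = 60 * 60
DAY = 60 * 60 * 24
WEEK = 7 * DAY


def floor_to(ts: int, width: int, offset: int = 0) -> int:
    """Round a Unix timestamp down to the start of its bucket.

    Hypothetical helper: `width` is the bucket size in seconds and
    `offset` shifts where buckets start relative to the epoch.
    """
    return ts - (ts - offset) % width


# Sample value (an assumption for illustration): Friday 2024-03-15 13:45:10 UTC
ts = int(datetime(2024, 3, 15, 13, 45, 10, tzinfo=timezone.utc).timestamp())

day_start = floor_to(ts, DAY)    # same as ts - ts % (60*60*24)
hour_start = floor_to(ts, HOUR)
# 1970-01-01 was a Thursday, so the first Monday is 4 days after the epoch;
# shifting by 4 days makes weekly buckets start on Monday 00:00 UTC.
week_start = floor_to(ts, WEEK, offset=4 * DAY)

for label, value in (("day", day_start), ("hour", hour_start), ("week", week_start)):
    print(label, datetime.fromtimestamp(value, timezone.utc).isoformat())
```

With `offset=0` the weekly buckets would start on Thursday, which is why the shift is needed for calendar weeks.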

  • @ ray1992, If you are given a comprehensive answer, mark it as correct (click on the check mark next to the selected answer). - Nicolas Chabanovsky

2 answers

It is hard to say why there is such a difference between INT and TIMESTAMP in favor of the latter. In theory it should be the other way around, because at a low level MySQL stores TIMESTAMP as an integer. Apparently the difference really is somewhere in the algorithms...

But one thing is certain: although you have indexes, this query does not use them, because you are grouping by an expression rather than by the bare column value. To run this query as efficiently as possible, it is worth denormalizing: add a column of type DATE, fill it in and index it, so that the grouping is done on the column itself, without expressions.
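The denormalization described above might look roughly like the following. This is only a sketch under stated assumptions: it uses Python's built-in SQLite as a stand-in for MySQL so that it runs anywhere, and the table `events` and the columns `created_at` and `created_day` are hypothetical names. In MySQL the extra column would have the DATE type and be filled with DATE(datetime). The sketch shows only that grouping by the precomputed column returns the same buckets as grouping by the expression; it says nothing about timings.

```python
import sqlite3

# SQLite in memory, standing in for the MySQL table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"
    " created_at TEXT,"    # plays the role of the TIMESTAMP column
    " created_day TEXT)"   # denormalized date column
)
conn.executemany(
    "INSERT INTO events (created_at) VALUES (?)",
    [
        ("2024-03-15 09:00:00",),
        ("2024-03-15 18:30:00",),
        ("2024-03-16 00:05:00",),
    ],
)

# Fill the denormalized column once, then index it.
conn.execute("UPDATE events SET created_day = date(created_at)")
conn.execute("CREATE INDEX idx_events_created_day ON events (created_day)")

# Grouping by an expression, as in the question.
by_expression = conn.execute(
    "SELECT date(created_at) AS d, COUNT(*) FROM events GROUP BY d ORDER BY d"
).fetchall()

# Grouping by the plain indexed column, with no expression.
by_column = conn.execute(
    "SELECT created_day, COUNT(*) FROM events"
    " GROUP BY created_day ORDER BY created_day"
).fetchall()

print(by_expression)
print(by_column)
```

The cost of this approach is that the extra column has to be kept in sync whenever rows are inserted or the timestamp changes.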

    http://gpshumano.blogs.dri.pt/2009/07/06/mysql-datetime-vs-timestamp-vs-int-performance-and-benchmarking-with-myisam/

    I was slow the first time and did not actually answer the question.

    The link has a benchmark of three data types: INT, TIMESTAMP and DATETIME. There is one nuance: INT is, of course, processed quickly, but not in the case of greater-than / less-than comparisons. Apparently it all depends on the comparison algorithms for DATETIME and TIMESTAMP.