I need a hint on selecting data with repeated values. There is a table (the Type values mean "New message", "Registration" and "Topic creation"):

 ( 1, 'user1', 'Новое сообщение', '2014-02-15' ),
 ( 2, 'user2', 'Новое сообщение', '2015-05-07' ),
 ( 3, 'user3', 'Новое сообщение', '2015-09-12' ),
 ( 4, 'user2', 'Регистрация', '2016-05-16' ),
 ( 5, 'user1', 'Новое сообщение', '2016-05-16' ),
 ( 6, 'user4', 'Создание темы', '2016-05-12' )

How do I write a query that returns, for each year, how many users wrote a message for the first time? That is, the result should be:

 2014 1
 2015 2
 2016 0

If you write this code

 SELECT Extract(YEAR from Date) AS Year, Count(DISTINCT user) AS Count
 FROM table
 WHERE Type LIKE 'Новое%'
 GROUP BY Extract(YEAR from Date)

then the output will be

 Year Count
 2014 1
 2015 2
 2016 1

In principle, I understand why this happens. How can I solve this problem?

    1 answer

    For example:

    First, for each user, select the earliest year in which that user wrote a new message; then group that result by year and count the users:

     select Year, count(user) as Count
     from (
         select min(extract(YEAR from Date)) as Year, user as User
         from table
         where Type like 'Новое%'
         group by user
     ) as FirstMessageYears
     group by Year

    Uniqueness is guaranteed because each user has exactly one minimum year, so distinct is not needed.
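
    The query above can be tried out on the sample data. This is only a minimal sketch, assuming SQLite through Python's sqlite3 module: the table name events is hypothetical (the original name table is a reserved word), and strftime('%Y', ...) stands in for extract(YEAR from ...), which SQLite does not provide.

```python
import sqlite3

# Sample rows from the question; table and column names are assumptions.
ROWS = [
    (1, "user1", "Новое сообщение", "2014-02-15"),
    (2, "user2", "Новое сообщение", "2015-05-07"),
    (3, "user3", "Новое сообщение", "2015-09-12"),
    (4, "user2", "Регистрация", "2016-05-16"),
    (5, "user1", "Новое сообщение", "2016-05-16"),
    (6, "user4", "Создание темы", "2016-05-12"),
]

# Inner query: earliest message year per user. Outer query: users per year.
FIRST_YEARS_SQL = """
SELECT year, COUNT(user) AS users
FROM (
    SELECT MIN(CAST(strftime('%Y', date) AS INTEGER)) AS year, user
    FROM events
    WHERE type LIKE 'Новое%'
    GROUP BY user
) AS first_message_years
GROUP BY year
ORDER BY year
"""


def first_message_counts():
    """Return (year, number of users whose first message falls in that year)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (id INTEGER, user TEXT, type TEXT, date TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(FIRST_YEARS_SQL).fetchall()
    finally:
        conn.close()


print(first_message_counts())  # [(2014, 1), (2015, 2)]
```

    Note that 2016 is missing from this result: no user has a first message in that year, so the inner query produces no row for it.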

    To add rows for years in which there are messages but no first messages, you can use the fact that count(null) returns 0.

    The inner query becomes more complicated, because it now also returns the years in which no user wrote a first message, with null in the user column:

     select AllYears.Year, count(FirstMessageYears.User) as Count
     from (
         select min(extract(YEAR from Date)) as Year, user as User
         from table
         where Type like 'Новое%'
         group by user
     ) as FirstMessageYears
     right join (
         select distinct extract(YEAR from Date) as Year
         from table
     ) as AllYears
     on FirstMessageYears.Year = AllYears.Year
     group by AllYears.Year

    Let's try to figure it out. We "execute" an SQL query from the inside out, i.e. starting with the parts at the deepest level of nesting. In our case, the nested part is a compound subquery:

      (
          select min(extract(YEAR from Date)) as Year, user as User
          from table
          where Type like 'Новое%'
          group by user
      ) as FirstMessageYears
      right join (
          select distinct extract(YEAR from Date) as Year
          from table
      ) as AllYears
      on FirstMessageYears.Year = AllYears.Year

    To understand what this subquery returns, let's break it down into parts:

      (
          select min(extract(YEAR from Date)) as Year, user as User
          from table
          where Type like 'Новое%'
          group by user
      ) as FirstMessageYears

    right join

      (
          select distinct extract(YEAR from Date) as Year
          from table
      ) as AllYears

    That is already a little easier. We have two queries, each returning a table, and these tables are then combined by an operation with the obscure name right join.

    First, let's look at what each part returns before the join.

    The result of the first subquery, with the telling name FirstMessageYears, contains two columns: the user and the year of that user's first message.

    The result of the second subquery, named AllYears, contains a single column listing every year that occurs in the source table.

    What happens if you apply the right join operation to these two sets? You get a table with, in effect, two columns of interest: the year (from AllYears) and the user. Logically, this set is filled in two stages:

    First, those rows of the AllYears set that have no match in the FirstMessageYears set go into the join result. The user field for these rows stays null.

    Then, those rows of the FirstMessageYears set that have a match in the AllYears set go into the join result. The user field for these rows is taken from the FirstMessageYears set.
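
    The two stages described above can be observed by printing the joined rows before any grouping. This is a rough sketch under assumptions: SQLite via Python's sqlite3, a hypothetical table name events, strftime('%Y', ...) in place of extract(YEAR from ...), and a left join with the operands swapped, which is equivalent to the right join in the answer and also works on SQLite builds without right join support.

```python
import sqlite3

# Sample rows from the question; table and column names are assumptions.
ROWS = [
    (1, "user1", "Новое сообщение", "2014-02-15"),
    (2, "user2", "Новое сообщение", "2015-05-07"),
    (3, "user3", "Новое сообщение", "2015-09-12"),
    (4, "user2", "Регистрация", "2016-05-16"),
    (5, "user1", "Новое сообщение", "2016-05-16"),
    (6, "user4", "Создание темы", "2016-05-12"),
]

# all_years LEFT JOIN first_message_years keeps every year,
# filling user with NULL where no first message exists.
JOIN_SQL = """
SELECT all_years.year, first_message_years.user
FROM (
    SELECT DISTINCT CAST(strftime('%Y', date) AS INTEGER) AS year
    FROM events
) AS all_years
LEFT JOIN (
    SELECT MIN(CAST(strftime('%Y', date) AS INTEGER)) AS year, user
    FROM events
    WHERE type LIKE 'Новое%'
    GROUP BY user
) AS first_message_years
ON first_message_years.year = all_years.year
ORDER BY all_years.year, first_message_years.user
"""


def joined_rows():
    """Return the ungrouped (year, user) rows produced by the outer join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (id INTEGER, user TEXT, type TEXT, date TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(JOIN_SQL).fetchall()
    finally:
        conn.close()


for row in joined_rows():
    print(row)
# (2014, 'user1')
# (2015, 'user2')
# (2015, 'user3')
# (2016, None)
```

    The row (2016, None) is the unmatched year: it comes from AllYears alone, so its user field is null.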


    All that remains is to group the result of this witchcraft:

     select AllYears.Year, count(FirstMessageYears.User) as Count
     from [result of the subquery]
     group by AllYears.Year

    Recall that the FirstMessageYears.User column turns out to be null in the rows that correspond to years in which no user wrote a first message. count(null) returns 0. Voilà.
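
    To see why counting the user column matters, the grouped result can be compared with count(*). Again a sketch under assumptions: SQLite via Python's sqlite3, a hypothetical table name events, strftime('%Y', ...) in place of extract(YEAR from ...), and a left join with swapped operands instead of the right join. The joined_rows column is added here purely for illustration and is not part of the answer's query.

```python
import sqlite3

# Sample rows from the question; table and column names are assumptions.
ROWS = [
    (1, "user1", "Новое сообщение", "2014-02-15"),
    (2, "user2", "Новое сообщение", "2015-05-07"),
    (3, "user3", "Новое сообщение", "2015-09-12"),
    (4, "user2", "Регистрация", "2016-05-16"),
    (5, "user1", "Новое сообщение", "2016-05-16"),
    (6, "user4", "Создание темы", "2016-05-12"),
]

# COUNT(column) skips NULLs, COUNT(*) counts every joined row.
GROUPED_SQL = """
SELECT all_years.year,
       COUNT(first_message_years.user) AS users,
       COUNT(*) AS joined_rows
FROM (
    SELECT DISTINCT CAST(strftime('%Y', date) AS INTEGER) AS year
    FROM events
) AS all_years
LEFT JOIN (
    SELECT MIN(CAST(strftime('%Y', date) AS INTEGER)) AS year, user
    FROM events
    WHERE type LIKE 'Новое%'
    GROUP BY user
) AS first_message_years
ON first_message_years.year = all_years.year
GROUP BY all_years.year
ORDER BY all_years.year
"""


def yearly_counts():
    """Return (year, first-time users, joined rows) for every year in the table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (id INTEGER, user TEXT, type TEXT, date TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(GROUPED_SQL).fetchall()
    finally:
        conn.close()


print(yearly_counts())  # [(2014, 1, 1), (2015, 2, 2), (2016, 0, 1)]
```

    For 2016 the join still yields one row, so count(*) gives 1, while count over the null user column gives the desired 0.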

    • I liked your approach, but the result comes out without 2016. How can I make it also show the years with 0 users? - ka6an4eg
    • @ka6an4eg Well, if it works, accept the answer - Zverev Evgeniy
    • I accidentally pressed Enter before finishing) I re-read the comment and edited it - ka6an4eg
    • @ka6an4eg Try it now - Zverev Evgeniy
    • Super! Now it remains to figure out how it works :) Thank you! Can you recommend something to read to figure it out? - ka6an4eg