I’ll say right away that I’m familiar with BigQuery superficially, but there is a task that needs to be solved by means of this system without resorting to using other tools. The bottom line is that BigQuery does not allow the use of cycles, and I cannot imagine how to solve this problem without them. Maybe someone has thoughts on this, and you can tell in which direction to dig?

The task itself sounds like this. There is a table with data, where for each date there is a set of user IDs - every day new users appear (for the first time), old ones disappear, and some remain (previously). It is necessary to calculate for each date, the number of users from the current date is in the next N, while they are not in the previous one. For example, the number of users from a cell for January 2, is for January 3-6, but they were not on January 1st.

The number of dates is finite, but the range is floating (maybe 5 dates, maybe 50, etc.) - that’s why the thoughts about the cycle arise, and I don’t imagine how to solve without it.

In other words, for clarity,

Given:

date id 01.01.2019 1 01.01.2019 2 01.01.2019 3 01.01.2019 4 01.01.2019 5 02.01.2019 2 02.01.2019 3 02.01.2019 4 03.01.2019 1 03.01.2019 2 03.01.2019 5 03.01.2019 6 04.01.2019 1 04.01.2019 2 04.01.2019 7 

At the output you need to get the following table:

 ---- 1день 2день 3день 4день 1янв 5 3 3 2 2янв 3 1 1 - 3янв 4 2 - - 4янв 3 - - - 

I am grateful for any help, I have already reread a lot of things, but I really couldn’t apply it.

  • What cycle? what for? 2 copies of the table, a synthetic table of numbers from 1 to the specified maximum, grouping and crosstab (). - Akina
  • I am grateful to you, and I am glad that this problem can be solved without resorting to cycles. I will read all the proposed methods, but unfortunately I still don’t understand how to group them and apply them. - LonelYNighTWolF
  • Do fiddle (or at least lay out the script for creating and filling the table). And specify the version of the SQL server. - Akina
  • As already wrote above, I unload data from BigQuery. The table is populated automatically when the user performs an action in the application. In this case, I try to count users who have entered the first and subsequent visits to the application. - LonelYNighTWolF

1 answer 1

 SELECT * FROM CROSSTAB('WITH RECURSIVE nums (num) AS ( SELECT 0 UNION ALL SELECT num+1 FROM nums WHERE num < (SELECT MAX(ddate) - MIN(ddate) FROM test) ) SELECT t1.ddate::DATE, nums.num::INT, COUNT(DISTINCT t2.id)::INT cnt_id FROM test t1, test t2, nums WHERE t1.ddate + nums.num = t2.ddate AND t1.id = t2.id GROUP BY t1.ddate, nums.num ORDER BY 1,2') AS ct("date" DATE, "1 day" INT, "2 days" INT, "3 days" INT, "4 days" INT); 

BigQuery does not support WITH RECURSIVE

Then the static table of numbers:

 SELECT * FROM CROSSTAB(' SELECT t1.ddate::DATE, nums.num::INT, COUNT(DISTINCT t2.id)::INT cnt_id FROM test t1, test t2, nums WHERE t1.ddate + nums.num = t2.ddate AND t1.id = t2.id AND nums.num <= (SELECT MAX(ddate) - MIN(ddate) FROM test) GROUP BY t1.ddate, nums.num ORDER BY 1,2') AS ct("date" date, "1 day" INT, "2 days" INT, "3 days" INT, "4 days" INT); 

Or dynamic:

 SELECT * FROM CROSSTAB(' SELECT t1.ddate::DATE, nums.num::INT, COUNT(DISTINCT t2.id)::INT cnt_id FROM test t1, test t2, (SELECT 0::INT num UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) nums WHERE t1.ddate + nums.num = t2.ddate AND t1.id = t2.id AND nums.num <= (SELECT MAX(ddate) - MIN(ddate) FROM test) GROUP BY t1.ddate, nums.num ORDER BY 1,2') AS ct("date" date, "1 day" INT, "2 days" INT, "3 days" INT, "4 days" INT); 

fiddle

  • In any case, I thank you for the example, I will try to adapt it for other tasks, but BigQuery does not support WITH RECURSIVE. Unfortunately ( - LonelYNighTWolF
  • @LonelYNighTWolF This is fixable. See fiddle. - Akina
  • Oh, it seems to me that it’s not possible for me to bring this not so difficult task to my mind, and it’s not convenient to spend someone else’s time, but BigQuery does not support CROSSTAB. I'm trying to google bypass. - LonelYNighTWolF
  • Damn, I forget everything that BigQuery - there was originally a post-regression tag, and he sat down in the head ... in general. how to get linear data, I showed (actually, the text of the subquery in CROSSTAB - this is it). And then emulate the pivot - see for example. stackoverflow.com/questions/26272514/… or there cloud.google.com/dataprep/docs/html/Pivot-Transform_57344645 - Akina