How to Group Rows by Any Time Interval in Postgres?
Grouping by a time interval is a commonly needed SQL query. The time interval could be a day, week, or month. An example of this could be: finding the count of all the newly registered users for the past three months.
Usually, the suggested solution is to use a function to convert the timestamp column to the required time interval and do a
GROUP BY on it. For eg, In Postgres both of these queries will return the count of registered users grouped by the day with the newer days first:
SELECT DATE_TRUNC('day', created) as day, COUNT(*) FROM requests GROUP BY 1 ORDER BY 1 DESC;
SELECT TO_CHAR(created, 'YYYY-MM-DD') as day, COUNT(*) FROM requests GROUP BY 1 ORDER BY 1 DESC;
But if you need something more fine-grained? Let’s say you want to group rows in 15min intervals or six hours?
There is also another problem with converting existing timestamps. Suppose there was an hour when the site was down and no user could register on the site. Because no one could sign up, there won’t be any user whose
created value would be from that hour. In the rows of text, you’re likely to miss the gap between time intervals.
How can we solve this?
Generate the Time Series then Join
Postgres provides a handy function
generate_series to help us here. Instead of converting timestamps, we generate a list of timestamps and then JOIN rows on those timestamps.
For eg, we can generate a list of timestamps starting from the past week in the interval of 15min using this:
SELECT GENERATE_SERIES(NOW() - INTERVAL '7d', NOW(), INTERVAL '15m' ) as ts;
Now, to make our join easier, we should generate two columns that can represent the start and end of the interval like:
|2022-07-09 00:00:00+00||2022-07-09 00:15:00+00|
|2022-07-09 00:15:00+00||2022-07-09 00:30:00+00|
|2022-07-09 00:30:00+00||2022-07-09 00:45:00+00|
To do this, we can use WINDOW functions.
WITH timestamps as ( SELECT GENERATE_SERIES(NOW() - INTERVAL '7d', NOW(), INTERVAL '15m' ) as ts ) SELECT ts as start, LEAD(ts) as end OVER () FROM timestamps;
Now for the final JOIN, we need to check if the
created lies between
end, which we can do by
WITH timestamps AS ( SELECT GENERATE_SERIES(NOW() - INTERVAL '7d', NOW(), INTERVAL '15m') AS ts ), range_ts AS ( SELECT ts AS start_ts, LEAD(ts) OVER () AS end_ts FROM timestamps ) SELECT COUNT(requests.created), start_ts, end_ts FROM range_ts LEFT JOIN requests ON requests.created BETWEEN start_ts AND end_ts GROUP BY 2, 3;
But there’s a problem here. This query would be too slow. For the DB engine, it’s very hard to optimize non-equality-based JOIN clause like the BETWEEN clause we have. It would need to run “rows in timestamps X rows in requests” operations to perform the join.
Fortunately, we can optimize this by introducing an equality-based clause too. The only rows we are interested in are the rows that are from the same hour, so we can add a condition:
DATE_TRUNC('hour', start_ts) = DATE_TRUNC('hour', created)
Our final query becomes,
WITH timestamps AS ( SELECT GENERATE_SERIES(NOW() - INTERVAL '7d', NOW(), INTERVAL '15m') AS ts ), range_ts AS ( SELECT ts AS start_ts, LEAD(ts) OVER () AS end_ts FROM timestamps ) SELECT COUNT(requests.created), start_ts, end_ts FROM range_ts LEFT JOIN requests ON requests.created BETWEEN start_ts AND end_ts AND DATE_TRUNC('hour', start_ts) = DATE_TRUNC('hour', created) GROUP BY 2, 3;
|0||2022-07-09 16:50:54.39318+00||2022-07-09 17:05:54.39318+00|
|2||2022-07-09 17:05:54.39318+00||2022-07-09 17:20:54.39318+00|
|1||2022-07-09 17:20:54.39318+00||2022-07-09 17:35:54.39318+00|
|1||2022-07-09 17:35:54.39318+00||2022-07-09 17:50:54.39318+00|
|0||2022-07-09 17:50:54.39318+00||2022-07-09 18:05:54.39318+00|
|0||2022-07-09 18:05:54.39318+00||2022-07-09 18:20:54.39318+00|
|4||2022-07-09 18:20:54.39318+00||2022-07-09 18:35:54.39318+00|
|7||2022-07-09 18:35:54.39318+00||2022-07-09 18:50:54.39318+00|
|2||2022-07-09 18:50:54.39318+00||2022-07-09 19:05:54.39318+00|
|8||2022-07-09 19:05:54.39318+00||2022-07-09 19:20:54.39318+00|
|18||2022-07-09 19:20:54.39318+00||2022-07-09 19:35:54.39318+00|
|0||2022-07-09 19:35:54.39318+00||2022-07-09 19:50:54.39318+00|
|13||2022-07-09 19:50:54.39318+00||2022-07-09 20:05:54.39318+00|
|0||2022-07-09 20:05:54.39318+00||2022-07-09 20:20:54.39318+00|