Blog
Jul 16, 2022

How to Group Rows by Any Time Interval in Postgres?

Grouping by a time interval is a commonly needed SQL query. The time interval could be a day, week, or month. An example of this could be: finding the count of all the newly registered users for the past three months.

Usually, the suggested solution is to use a function to convert the timestamp column to the required time interval and do a GROUP BY on it. For eg, In Postgres both of these queries will return the count of registered users grouped by the day with the newer days first:

SELECT
  DATE_TRUNC('day',  created) as day,
  COUNT(*)
FROM requests
GROUP BY 1
ORDER BY 1 DESC;
SELECT
  TO_CHAR(created, 'YYYY-MM-DD') as day,
  COUNT(*)
FROM requests
GROUP BY 1
ORDER BY 1 DESC;

But if you need something more fine-grained? Let’s say you want to group rows in 15min intervals or six hours?

There is also another problem with converting existing timestamps. Suppose there was an hour when the site was down and no user could register on the site. Because no one could sign up, there won’t be any user whose created value would be from that hour. In the rows of text, you’re likely to miss the gap between time intervals.

How can we solve this?

Generate the Time Series then Join

Postgres provides a handy function generate_series to help us here. Instead of converting timestamps, we generate a list of timestamps and then JOIN rows on those timestamps.

For eg, we can generate a list of timestamps starting from the past week in the interval of 15min using this:

SELECT GENERATE_SERIES(NOW() - INTERVAL '7d', NOW(), INTERVAL '15m' ) as ts;

Now, to make our join easier, we should generate two columns that can represent the start and end of the interval like:

start end
2022-07-09 00:00:00+00 2022-07-09 00:15:00+00
2022-07-09 00:15:00+00 2022-07-09 00:30:00+00
2022-07-09 00:30:00+00 2022-07-09 00:45:00+00

To do this, we can use WINDOW functions.

WITH timestamps as (
  SELECT GENERATE_SERIES(NOW() - INTERVAL '7d', NOW(), INTERVAL '15m' ) as ts
)
SELECT
  ts as start,
  LEAD(ts) as end
OVER ()
FROM timestamps;

Now for the final JOIN, we need to check if the created lies between start and end, which we can do by

WITH timestamps AS (
  SELECT
    GENERATE_SERIES(NOW() - INTERVAL '7d', NOW(), INTERVAL '15m') AS ts
), range_ts AS (
  SELECT
    ts AS start_ts,
    LEAD(ts) OVER () AS end_ts
  FROM
    timestamps
)
SELECT
  COUNT(requests.created),
  start_ts,
  end_ts
FROM
  range_ts
  LEFT JOIN requests ON requests.created BETWEEN start_ts AND end_ts
GROUP BY 2, 3;

But there’s a problem here. This query would be too slow. For the DB engine, it’s very hard to optimize non-equality-based JOIN clause like the BETWEEN clause we have. It would need to run “rows in timestamps X rows in requests” operations to perform the join.

Fortunately, we can optimize this by introducing an equality-based clause too. The only rows we are interested in are the rows that are from the same hour, so we can add a condition:

DATE_TRUNC('hour', start_ts) = DATE_TRUNC('hour', created)

Our final query becomes,

WITH timestamps AS (
  SELECT
    GENERATE_SERIES(NOW() - INTERVAL '7d', NOW(), INTERVAL '15m') AS ts
), range_ts AS (
  SELECT
    ts AS start_ts,
    LEAD(ts) OVER () AS end_ts
  FROM
    timestamps
)
SELECT
  COUNT(requests.created),
  start_ts,
  end_ts
FROM
  range_ts
  LEFT JOIN requests ON requests.created BETWEEN start_ts AND end_ts
  AND DATE_TRUNC('hour', start_ts) = DATE_TRUNC('hour', created)
GROUP BY 2, 3;

Output

count start_ts end_ts
0 2022-07-09 16:50:54.39318+00 2022-07-09 17:05:54.39318+00
2 2022-07-09 17:05:54.39318+00 2022-07-09 17:20:54.39318+00
1 2022-07-09 17:20:54.39318+00 2022-07-09 17:35:54.39318+00
1 2022-07-09 17:35:54.39318+00 2022-07-09 17:50:54.39318+00
0 2022-07-09 17:50:54.39318+00 2022-07-09 18:05:54.39318+00
0 2022-07-09 18:05:54.39318+00 2022-07-09 18:20:54.39318+00
4 2022-07-09 18:20:54.39318+00 2022-07-09 18:35:54.39318+00
7 2022-07-09 18:35:54.39318+00 2022-07-09 18:50:54.39318+00
2 2022-07-09 18:50:54.39318+00 2022-07-09 19:05:54.39318+00
8 2022-07-09 19:05:54.39318+00 2022-07-09 19:20:54.39318+00
18 2022-07-09 19:20:54.39318+00 2022-07-09 19:35:54.39318+00
0 2022-07-09 19:35:54.39318+00 2022-07-09 19:50:54.39318+00
13 2022-07-09 19:50:54.39318+00 2022-07-09 20:05:54.39318+00
0 2022-07-09 20:05:54.39318+00 2022-07-09 20:20:54.39318+00