What Are Windows?

Windows are a very different concept from everything we've done so far with SQL. Windowing gives each row visibility to the values of different rows in the same table.

Suppose you are a software engineer working at the International Olympic Committee. You are tasked with analyzing data from the shot put event in the Olympic Games. Each shot putter's throw is recorded in the shot_puts table.


shot_puts
column_name type
id
INTEGER
athlete_id
INTEGER
athlete_sex
VARCHAR
distance_m
NUMERIC
technique
VARCHAR

Your first task is simple: rank the throws in the shot_puts table. Something like this…

rank
BIGINT
id
INTEGER
athlete_id
INTEGER
athlete_sex
VARCHAR
distance_m
NUMERIC
technique
VARCHAR
1
93514
7186
male
21.268
glide
2
20546
5519
male
20.208
spin
3
73198
1791
male
19.876
glide
4
92196
5933
female
19.273
glide
5
58542
4958
male
18.491
spin

…adding a rank column to show the ranking of each throw among all the other competitors. How would we do that? When computing the rank for each row, we need some way of comparing it to other rows in the same table. However, we don't have a way to do that yet. Whenever we compute columns in the SELECT clause, we're basically just "mapping" each row in the source table(s) into each row of the output table, (unless aggregations are involved, and those won't help us here.)

That is where window functions come in.

Window Functions

Window functions, a.k.a "analytic functions", are functions you can call in the SELECT clause, where you specify a window. The window is another input to the function, in addition to the normal arguments between the parentheses. Unlike the normal arguments, windows can "see" or access the other rows in the same table. Calling window functions looks like function_name() OVER (…), a function call followed by an OVER clause which defines the window. The function_name must be 1 of a specific set of functions that can work as window functions.

List of all window functions.

You can Postgres' full list of window functions here.

The first one we'll learn is rank(). It does what you'd intuit, generates a ranking number based on the comparison with the other rows. The window definition in the OVER clause tells it which rows to rank, and how to rank them.

SELECT
  rank() OVER (ORDER BY distance_m DESC),
  *
FROM shot_puts
ORDER BY 1;
rank
BIGINT
id
INTEGER
athlete_id
INTEGER
athlete_sex
VARCHAR
distance_m
NUMERIC
technique
VARCHAR
1
93514
7186
male
21.268
glide
2
20546
5519
male
20.208
spin
3
73198
1791
male
19.876
glide
4
92196
5933
female
19.273
glide
5
58542
4958
male
18.491
spin
6
38980
8852
female
18.485
glide
7
92127
3749
female
18.316
glide
8
84190
1121
female
17.721
glide
9
50726
8639
female
17.553
glide
10
85215
7900
male
16.229
glide

The most basic window is simply OVER (), which means the window function "sees" the whole table in an unspecified order. However, we are using rank() here, and calling rank() OVER () will produce a ranking in a random order, which is meaningless. Therefore, we specify the window ordering by OVER (ORDER BY distance_m DESC), so that an order is defined. This gives our ranking the meaning we want it to have: a ranking of these throws from most to least distance.

As we can see, window functions run once per row and generate 1 value per row. This is how every shot put in the table has its own rank. They can take inputs from the current row and the window for that row. Note that the value of the window can be different for each row, though it is the same for all rows in this example: a view of the entire shot_puts table ordered by distance.

We'll look at some window functions which do not depend on ordering later. For now, let's look at another challenge with ordered windows.

Fetching Values From Other Rows

Our next engineering task is to produce a table showing the difference in distance between each athlete and the 1 a rank behind them. It's simple to find a difference in distances by subtraction, but what are we subtracting? Each athlete's throw distance is in the distance_m column. We also need to get the throw distance of the athlete behind them and subtract it. This means each row needs to refer to the previous row in the ordering, e.g. the 2nd row can access the 1st. This is where window functions come in again.

We saw that the rank() function lets us generate a single number based on the window order, but doesn't let us refer to any specific row in the window. For this, we can use lag() instead.

SELECT
  *,
  distance_m - lag(distance_m) OVER (ORDER BY distance_m) AS distance_diff_m
FROM shot_puts
ORDER BY distance_m;
id
INTEGER
athlete_id
INTEGER
athlete_sex
VARCHAR
distance_m
NUMERIC
technique
VARCHAR
distance_diff_m
NUMERIC
85215
7900
male
16.229
glide
NULL
50726
8639
female
17.553
glide
1.324
84190
1121
female
17.721
glide
0.168
92127
3749
female
18.316
glide
0.595
38980
8852
female
18.485
glide
0.169
58542
4958
male
18.491
spin
0.006
92196
5933
female
19.273
glide
0.782
73198
1791
male
19.876
glide
0.603
20546
5519
male
20.208
spin
0.332
93514
7186
male
21.268
glide
1.060

When computing the output for each row, the value inside the lag() function is computed for the previous row. Like rank(), lag() is sensitive to its window ordering, so ORDER BY is used.

The ORDER BY inside an OVER clause applies only to that specific window, not to the result table as a whole. For ordering the output table, a separate ORDER BY clause is still needed.

The complement to lag() is lead(), which looks at the next row. We could've used that here to get an equivalent result if we reverse the window order.

SELECT
  *,
  distance_m - lead(distance_m) OVER (ORDER BY distance_m DESC) AS distance_diff_m
FROM shot_puts
ORDER BY distance_m DESC;
id
INTEGER
athlete_id
INTEGER
athlete_sex
VARCHAR
distance_m
NUMERIC
technique
VARCHAR
distance_diff_m
NUMERIC
93514
7186
male
21.268
glide
1.060
20546
5519
male
20.208
spin
0.332
73198
1791
male
19.876
glide
0.603
92196
5933
female
19.273
glide
0.782
58542
4958
male
18.491
spin
0.006
38980
8852
female
18.485
glide
0.169
92127
3749
female
18.316
glide
0.595
84190
1121
female
17.721
glide
0.168
50726
8639
female
17.553
glide
1.324
85215
7900
male
16.229
glide
NULL

If you want to lag or lead by 2, 3, 4, n rows, etc. you can pass a second argument to do that, e.g. lag(distance_m, 2).

There are other window functions for referencing specific rows within the window, like first_value(), last_value(), and nth_value().

© 2022 Andrew Carlson. All rights reserved.