Description: A very lucid description of window functions may be found in [2]:
After the database server has completed all of the steps necessary to evaluate a query, including joining, filtering, grouping, and sorting, the result set is complete and ready to be returned to the caller. Imagine if you could pause the query execution at this point and take a walk through the result set while it is still held in memory; what types of analysis might you want to do? If your result set contains sales data, perhaps you might want to generate rankings for salespeople or regions, or calculate percentage differences between one time period and another. If you are generating results for a financial report, perhaps you would like to calculate subtotals for each report section, and a grand total for the final section. Using analytic functions, you can do all of these things and more.
We may find a somewhat more dry description in [3]:
Once you understand the concept of grouping and using aggregates in SQL, understanding window functions is easy. Window functions, like aggregate functions, perform an aggregation on a defined set (a group) of rows, but rather than returning one value per group, window functions can return multiple values for each group. The group of rows to perform the aggregation on is the window.
Window functions certainly sound cool and promising. But given the fuss at the beginning of this study guide about query execution order, it makes sense we might ask the same question of window functions: When are window functions executed or their actions performed? We may find our answer in [4]:
It is important to note that window functions are performed as the last step in SQL processing prior to the
ORDER BYclause.
This will now make sense if we look back at the mental model previously discussed for how to think about SELECT queries (relevant portion now in bold):
FROM/JOIN (and all associated ON conditions)WHEREGROUP BYHAVINGSELECT (including window functions)DISTINCTORDER BYLIMIT/OFFSETExample: Consider what is currently the top voted answer to problem 178. Rank Scores by StefanPochmann (he actually provides several different answers):
-- Answer 1 (uses correlated subquery)
SELECT
Score,
(SELECT COUNT(DISTINCT Score) FROM Scores WHERE Score >= S.Score) 'Rank'
FROM Scores S
ORDER BY Score DESC
-- Answer 2 (nested subquery)
SELECT
Score,
(SELECT COUNT(*) FROM (SELECT DISTINCT Score S FROM Scores) tmp WHERE S >= Score) 'Rank'
FROM Scores
ORDER BY Score DESC
-- Answer 3 (non-equi join)
SELECT S.Score, COUNT(DISTINCT t.score) 'Rank'
FROM Scores S JOIN Scores T ON S.Score <= T.score
GROUP BY S.Id
ORDER BY S.Score DESCThese are all great, perfectly legitimate answers. Each one has its own degree of sophistication. But I think nearly everyone would agree on thing: each answer is rather complex. Is there an easier way to craft an answer for this problem? Window functions to the rescue!
The DENSE_RANK() window function makes quick work of this problem:
SELECT
S.Score,
DENSE_RANK() OVER (ORDER BY S.Score DESC) AS 'Rank'
FROM
Scores S;Of course, not every problem will be so neatly solved by simply using a window function, but this example should give you an idea of just how powerful window functions can be.
Window Function Table Reference: The table below describes nonaggregate window functions that, for each row from a query, perform a calculation using rows related to that row. Most aggregate functions also can be used as window functions, namely AVG(), COUNT(), MAX(), MIN(), and SUM(). The "Docs" link for each function points to that function's reference in the official MySQL docs while the "Tutorial" link for each function points to that function's reference on the MySQL tutorial site. See the official MySQL docs for more detailed information.
| Function | Docs | Tutorial | Description |
|---|---|---|---|
CUME_DIST() | Docs | Tutorial | Cumulative distribution value |
DENSE_RANK() | Docs | Tutorial | Rank of current row within its partition, without gaps |
FIRST_VALUE() | Docs | Tutorial | Value of argument from first row of window frame |
LAG() | Docs | Tutorial | Value of argument from row lagging current row within partition |
LAST_VALUE() | Docs | Tutorial | Value of argument from last row of window frame |
LEAD() | Docs | Tutorial | Value of argument from row leading current row within partition |
NTH_VALUE() | Docs | Tutorial | Value of argument from N-th row of window frame |
NTILE() | Docs | Tutorial | Bucket number of current row within its partition. |
PERCENT_RANK() | Docs | Tutorial | Percentage rank value |
RANK() | Docs | Tutorial | Rank of current row within its partition, with gaps |
ROW_NUMBER() | Docs | Tutorial | Number of current row within its partition |
Window Function Problem Reference: A variety of problems appear below for which a window function was used as part of the answer. The window function being used/emphasized is highlighted as a bullet point, the problem may be accessed by clicking on the problem title, and a solution to the problem (to highlight how the window function is being used) may be seen by clicking to expand the widget.
The listing below is not comprehensive. Many of the solutions revealed on click also do not comply with the style guidelines highlighted earlier in this study guide (remember, these were all written over the span of 2 weeks). When time permits, I may revisit these solutions and try to clean them up.
AVG()
SELECT
DISTINCT DATE_FORMAT(X.pay_date, "%Y-%m") AS pay_month,
X.department_id,
(
CASE
WHEN X.sal_comp = 0 THEN "same"
WHEN X.sal_comp < 0 THEN "lower"
WHEN X.sal_comp > 0 THEN "higher"
END
) AS comparison
FROM
(
SELECT
S.pay_date,
E.department_id,
AVG(S.amount) OVER (PARTITION BY MONTH(S.pay_date), E.department_id)
- AVG(S.amount) OVER (PARTITION BY MONTH(S.pay_date)) AS sal_comp
FROM
salary S
INNER JOIN employee E ON S.employee_id = E.employee_id
) X;COUNT()
SELECT
E.employee_id,
COUNT(E.team_id) OVER(PARTITION BY E.team_id) AS team_size
FROM
Employee E
ORDER BY
E.employee_id;DENSE_RANK()
SELECT
X.Department,
X.Employee,
X.Salary
FROM
(
SELECT
D.Name AS Department,
E.Name AS Employee,
E.Salary,
DENSE_RANK() OVER (
PARTITION BY D.Name
ORDER BY
E.Salary DESC
) AS dep_sal_rank
FROM
Employee E
INNER JOIN Department D ON E.DepartmentId = D.Id
) X
WHERE
X.dep_sal_rank = 1;SELECT
X.Department,
X.Employee,
X.Salary
FROM
(
SELECT
D.Name AS Department,
E.Name AS Employee,
E.Salary,
DENSE_RANK() OVER (
PARTITION BY D.Name
ORDER BY
E.Salary DESC
) AS dep_sal_rank
FROM
Employee E
INNER JOIN Department D ON E.DepartmentId = D.Id
) X
WHERE
X.dep_sal_rank <= 3;WITH non_distinct_salaries AS (
SELECT salary FROM Employees GROUP BY salary HAVING COUNT(*) > 1
)
SELECT
E.employee_id,
E.name,
E.salary,
DENSE_RANK() OVER(ORDER BY E.salary) AS team_id
FROM
Employees E
WHERE
E.salary IN (SELECT * FROM non_distinct_salaries)
ORDER BY
team_id, E.employee_id;LAG()
WITH visit_gaps AS (
SELECT
UV.user_id,
DATEDIFF(
LAG(UV.visit_date, 1, '2021-01-01') OVER(
PARTITION BY UV.user_id ORDER BY UV.visit_date DESC
),
UV.visit_date
)
AS visit_gap
FROM
UserVisits UV
)
SELECT
V.user_id,
MAX(V.visit_gap) AS biggest_window
FROM
visit_gaps V
GROUP BY
V.user_id
ORDER BY
V.user_id;LEAD()
WITH medal_winners AS (
SELECT
U.name,
U.mail,
(CASE WHEN
LEAD(C.contest_id, 2) OVER(PARTITION BY U.user_id ORDER BY C.contest_id)
- C.contest_id = 2 THEN 1 ELSE 0
END) AS three_consec,
(CASE WHEN U.user_id = C.gold_medal THEN 1 ELSE 0 END) AS gold_winner
FROM
Users U
INNER JOIN Contests C ON U.user_id IN (C.gold_medal, C.silver_medal, C.bronze_medal)
)
SELECT
MW.name,
MW.mail
FROM
medal_winners MW
GROUP BY
MW.name, MW.mail
HAVING
SUM(MW.gold_winner) >= 3
OR SUM(MW.three_consec) >= 1;MAX()
SELECT DISTINCT
X.product_id,
X.product_name
FROM
(
SELECT
P.product_id,
P.product_name,
MIN(S.sale_date) OVER(W) AS min_sale_date,
MAX(S.sale_date) OVER(W) AS max_sale_date
FROM
Sales S
INNER JOIN Product P on S.product_id = P.product_id
WINDOW W AS (PARTITION BY P.product_id)
) X
WHERE
X.min_sale_date >= '2019-01-01'
AND X.max_sale_date <= '2019-03-31';WITH order_aggregates AS (
SELECT
O.order_id,
AVG(O.quantity) AS avg_quant,
MAX(O.quantity) AS max_quant,
MAX(AVG(O.quantity)) OVER() AS max_avg_quant
FROM
OrdersDetails O
GROUP BY
O.order_id
)
SELECT
OA.order_id
FROM
order_aggregates OA
WHERE
OA.max_quant > OA.max_avg_quant;MIN()
SELECT DISTINCT
X.product_id,
X.product_name
FROM
(
SELECT
P.product_id,
P.product_name,
MIN(S.sale_date) OVER(W) AS min_sale_date,
MAX(S.sale_date) OVER(W) AS max_sale_date
FROM
Sales S
INNER JOIN Product P on S.product_id = P.product_id
WINDOW W AS (PARTITION BY P.product_id)
) X
WHERE
X.min_sale_date >= '2019-01-01'
AND X.max_sale_date <= '2019-03-31';RANK()
SELECT
X.player_id,
X.device_id
FROM
(
SELECT
A.*,
RANK() OVER (
PARTITION BY A.player_id
ORDER BY
A.event_date
) device_login_order
FROM
Activity A
) X
WHERE
X.device_login_order = 1;WITH customer_order_totals AS (
SELECT
O.customer_number,
COUNT(*) AS total_orders
FROM
orders O
GROUP BY
O.customer_number
), ranked_customer_orders AS (
SELECT
C.customer_number,
RANK() OVER(ORDER BY C.total_orders DESC) AS ranking
FROM
customer_order_totals C
)
SELECT
R.customer_number
FROM
ranked_customer_orders R
WHERE
R.ranking = 1;WITH ranked_sales_by_year AS (
SELECT
S.*,
RANK() OVER(
PARTITION BY S.product_id ORDER BY S.year
) year_ranking
FROM
Sales S
)
SELECT
R.product_id,
R.year AS first_year,
R.quantity,
R.price
FROM
ranked_sales_by_year R
WHERE
R.year_ranking = 1;SELECT
X.project_id
FROM
(
SELECT
P.project_id,
RANK() OVER(ORDER BY COUNT(*) DESC) AS project_ranking
FROM
Project P
INNER JOIN Employee E ON P.employee_id = E.employee_id
GROUP BY
P.project_id
) X
WHERE
X.project_ranking = 1;SELECT
X.project_id,
X.employee_id
FROM
(
SELECT
P.project_id,
E.employee_id,
RANK() OVER(
PARTITION BY P.project_id
ORDER BY
E.experience_years DESC
) AS years_rank
FROM
Project P
INNER JOIN Employee E on P.employee_id = E.employee_id
) X
WHERE
X.years_rank = 1;SELECT
X.seller_id
FROM
(
SELECT
S.seller_id,
RANK() OVER(ORDER BY SUM(S.price) DESC) seller_ranking
FROM
Sales S
GROUP BY
S.seller_id
) X
WHERE
X.seller_ranking = 1;WITH student_rankings AS (
SELECT
E.student_id,
E.course_id,
E.grade,
RANK() OVER(
PARTITION BY E.student_id
ORDER BY E.grade DESC, E.course_id
) AS ranking
FROM
Enrollments E
)
SELECT
S.student_id,
S.course_id,
S.grade
FROM
student_rankings S
WHERE
S.ranking = 1
ORDER BY
S.student_id;ROW_NUMBER()
WITH grps AS (
SELECT
Num,
ROW_NUMBER() OVER (
ORDER BY
Id
) - ROW_NUMBER() OVER (
PARTITION BY Num
ORDER BY
Id
) AS grp
FROM
Logs
)
SELECT
DISTINCT Num AS ConsecutiveNums
FROM
grps
GROUP BY
grp,
Num
HAVING
COUNT(grp) >= 3
ORDER BY
Num;WITH grps AS (
SELECT
S.*,
S.id - ROW_NUMBER() OVER (ORDER BY S.id) AS grp
FROM
stadium S
WHERE
S.people >= 100
), grp_freqs AS (
SELECT
G.*,
COUNT(*) OVER(PARTITION BY G.grp) AS grp_freq
FROM
grps G
)
SELECT
GF.id,
GF.visit_date,
GF.people
FROM
grp_freqs GF
WHERE
GF.grp_freq >= 3
ORDER BY
GF.visit_date;SELECT
MAX((CASE WHEN X.continent = 'America' THEN X.name ELSE NULL END)) AS "America",
MAX((CASE WHEN X.continent = 'Asia' THEN X.name ELSE NULL END)) AS "Asia",
MAX((CASE WHEN X.continent = 'Europe' THEN X.name ELSE NULL END)) AS "Europe"
FROM
(
SELECT
S.continent,
S.name,
ROW_NUMBER() OVER(PARTITION BY S.continent ORDER BY S.name) AS rn
FROM
student S
) X
GROUP BY
X.rn;SUM()
SELECT
A.player_id,
A.event_date,
SUM(A.games_played) OVER (
PARTITION BY
A.player_id
ORDER BY
A.event_date
) AS games_played_so_far
FROM
Activity A;WITH RECURSIVE months_range AS (
SELECT 1 AS month_num
UNION ALL
SELECT month_num + 1 FROM months_range WHERE month_num < 12
), emps_and_months AS (
SELECT
*
FROM
(SELECT DISTINCT Id FROM Employee) X, months_range
), cum_sal_info AS (
SELECT
EM.Id,
EM.month_num AS Month,
E.Salary,
SUM(E.Salary) OVER (
PARTITION BY EM.Id
ORDER BY
EM.month_num ROWS BETWEEN 2 PRECEDING
AND 0 FOLLOWING
) AS cum_sal
FROM
emps_and_months EM
LEFT JOIN Employee E
ON EM.Id = E.Id AND EM.month_num = E.Month
)
SELECT
C.Id,
C.Month,
C.cum_sal AS Salary
FROM
cum_sal_info C
INNER JOIN Employee E1
ON E1.Id = C.Id
AND E1.Month = C.Month
AND C.Month < (
SELECT
MAX(E2.Month)
FROM
Employee E2
WHERE
E2.Id = E1.Id
)
ORDER BY
C.Id,
C.Month DESC;SELECT
S.gender,
S.day,
SUM(S.score_points) OVER(
PARTITION BY S.gender ORDER BY S.day
) AS total
FROM
Scores S
ORDER BY
S.gender, S.day;