Introduction
Window functions in SQL are powerful tools that allow you to perform calculations across a set of rows related to the current row, without collapsing the result set like aggregate functions do. They were introduced in SQL:2003 and are supported in most modern relational databases like PostgreSQL, MySQL (from version 8.0), SQL Server, and SQLite. In this blog post, we will dive into what window functions are, why they are important, explore various types with examples, and compare them to traditional aggregate functions.
To illustrate the concepts, we will use a sample `employees` table with the following data:
| emp_id | name | department | salary |
|——–|———|————|——–|
| 1 | Alice | HR | 50000 |
| 2 | Bob | HR | 60000 |
| 3 | Charlie | IT | 70000 |
| 4 | David | IT | 80000 |
| 5 | Eve | Sales | 90000 |
| 6 | Frank | HR | 60000 |
All examples are based on SQL queries run against this table.
What Are Window Functions?
A window function computes a value for each row in a query result based on a “window” of rows defined relative to the current row. The syntax typically involves the `OVER`clause, which specifies the partitioning and ordering of the rows.
The general form is
“`
function_name(arguments) OVER (
[PARTITION BY partition_expression][ORDER BY order_expression][frame_clause])
“`
– PARTITION BY: Divides the rows into groups (partitions) where the function is applied independently.
– ORDER BY: Defines the order within each partition.
– Frame clause (optional): Specifies a subset of the partition, like `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` for running totals.
Unlike subqueries or joins, window functions operate on the result set after the `WHERE` clause but before the final `ORDER BY`.
Why Are Window Functions Important?
Window functions are crucial for advanced data analysis because they enable complex calculations without needing multiple queries or temporary tables. They improve query performance and readability in scenarios like
– Calculating running totals or moving averages (e.g., stock prices over time).
– Ranking items (e.g., top salespeople per region).
– Comparing rows (e.g., year-over-year growth).
– Handling analytics in reporting tools or data warehouses.
They bridge the gap between simple aggregations and procedural programming, making SQL more expressive for analytical tasks.
Types of Window Functions with Examples
Window functions fall into three main categories: ranking, aggregate (used over windows), and value functions. We will cover key ones with queries and results.
1. Ranking Functions
These assign ranks or groups based on ordering.
ROW_NUMBER()
Assigns a unique sequential number to each row in the ordered partition.
Query:
“`
SELECT name, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num
FROM employees;
“`
Result:
| name | salary | row_num |
|———|——–|———|
| Eve | 90000 | 1 |
| David | 80000 | 2 |
| Charlie | 70000 | 3 |
| Bob | 60000 | 4 |
| Frank | 60000 | 5 |
| Alice | 50000 | 6 |
This numbers rows from highest to lowest salary, with no ties considered—each gets a unique number.
RANK()
Assigns the same rank to tied rows and skips subsequent ranks.
Query:
“`
SELECT name, salary, RANK() OVER (ORDER BY salary DESC) as rank
FROM employees;
“`
Result:
| name | salary | rank |
|———|——–|——|
| Eve | 90000 | 1 |
| David | 80000 | 2 |
| Charlie | 70000 | 3 |
| Bob | 60000 | 4 |
| Frank | 60000 | 4 |
| Alice | 50000 | 6 |
Bob and Frank tie at rank 4, so Alice gets 6 (skipping 5).
DENSE_RANK()
Similar to RANK(), but doesn’t skip ranks after ties.
Query:
“`
SELECT name, salary, DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank
FROM employees;
“`
Result:
| name | salary | dense_rank |
|———|——–|————|
| Eve | 90000 | 1 |
| David | 80000 | 2 |
| Charlie | 70000 | 3 |
| Bob | 60000 | 4 |
| Frank | 60000 | 4 |
| Alice | 50000 | 5 |
After the tie at 4, Alice gets 5—no skip.
NTILE(n)
Divides rows into `n`approximately equal groups (tiles).
Query (for 2 tiles):
“`
SELECT name, salary, NTILE(2) OVER (ORDER BY salary DESC) as ntile
FROM employees;
“`
Result:
| name | salary | ntile |
|———|——–|——-|
| Eve | 90000 | 1 |
| David | 80000 | 1 |
| Charlie | 70000 | 1 |
| Bob | 60000 | 2 |
| Frank | 60000 | 2 |
| Alice | 50000 | 2 |
Top half in tile 1, bottom in tile 2.
2. Aggregate Functions as Window Functions
Standard aggregates like SUM, AVG, MIN, MAX, and COUNT can be used with OVER() to compute over windows.
SUM() OVER()
Calculates the sum over the partition.
Query:
“`
SELECT name, department, salary, SUM(salary) OVER (PARTITION BY department) as dept_total
FROM employees;
“`
Result:
| name | department | salary | dept_total |
|———|————|——–|————|
| Alice | HR | 50000 | 170000 |
| Bob | HR | 60000 | 170000 |
| Frank | HR | 60000 | 170000 |
| Charlie | IT | 70000 | 150000 |
| David | IT | 80000 | 150000 |
| Eve | Sales | 90000 | 90000 |
Each row shows the total salary for its department.
AVG() OVER()
Computes the average over the partition.
Query:
“`
SELECT name, department, salary, AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees;
“`
Result:
| name | department | salary | dept_avg |
|———|————|——–|————|
| Alice | HR | 50000 | 56666.67 |
| Bob | HR | 60000 | 56666.67 |
| Frank | HR | 60000 | 56666.67 |
| Charlie | IT | 70000 | 75000.00 |
| David | IT | 80000 | 75000.00 |
| Eve | Sales | 90000 | 90000.00 |
3. Value Functions
These access values from other rows in the window.
LAG()
Returns the value from a previous row (offset by 1 by default).
Query:
“`
SELECT name, salary, LAG(salary, 1) OVER (ORDER BY salary) as prev_salary
FROM employees;
“`
Result:
| name | salary | prev_salary |
|———|——–|————-|
| Alice | 50000 | NULL |
| Bob | 60000 | 50000 |
| Frank | 60000 | 60000 |
| Charlie | 70000 | 60000 |
| David | 80000 | 70000 |
| Eve | 90000 | 80000 |
Shows the salary of the “previous” row by salary order.
LEAD()
Returns the value from the next row.
Query:
“`
SELECT name, salary, LEAD(salary, 1) OVER (ORDER BY salary) as next_salary
FROM employees;
“`
Result:
| name | salary | next_salary |
|———|——–|————-|
| Alice | 50000 | 60000 |
| Bob | 60000 | 60000 |
| Frank | 60000 | 70000 |
| Charlie | 70000 | 80000 |
| David | 80000 | 90000 |
| Eve | 90000 | NULL |
FIRST_VALUE()
Returns the first value in the ordered partition.
Query:
“`
SELECT name, department, salary, FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) as min_dept_salary
FROM employees;
“`
Result:
| name | department | salary | min_dept_salary |
|———|————|——–|—————–|
| Alice | HR | 50000 | 50000 |
| Bob | HR | 60000 | 50000 |
| Frank | HR | 60000 | 50000 |
| Charlie | IT | 70000 | 70000 |
| David | IT | 80000 | 70000 |
| Eve | Sales | 90000 | 90000 |
This effectively gives the minimum salary per department.
LAST_VALUE()
Returns the last value in the ordered partition (requires a frame clause for a full partition).
Query:
“`
SELECT name, department, salary, LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as max_dept_salary
FROM employees;
“`
Result:
| name | department | salary | max_dept_salary |
|———|————|——–|—————–|
| Alice | HR | 50000 | 60000 |
| Bob | HR | 60000 | 60000 |
| Frank | HR | 60000 | 60000 |
| Charlie | IT | 70000 | 80000 |
| David | IT | 80000 | 80000 |
| Eve | Sales | 90000 | 90000 |
This gives the maximum salary per department.
How Window Functions Differ from Aggregate Functions
Aggregate functions (e.g., SUM, AVG) summarize data by collapsing multiple rows into fewer rows, often requiring `GROUP BY`. They produce one result per group.
Window functions, however, return a result for *every* row in the original query, adding computed columns without reducing the row count. They use `OVER()` instead of `GROUP BY`.
Example Comparison: SUM
Aggregate Function (with GROUP BY):
“`
SELECT department, SUM(salary) as total_salary
FROM employees
GROUP BY department;
“`
Result (3 rows):
| department | total_salary |
|————|————–|
| HR | 170000 |
| IT | 150000 |
| Sales | 90000 |
This group summarizes, losing individual row details.
Window Function:
“`
SELECT emp_id, department, salary, SUM(salary) OVER (PARTITION BY department) as dept_total
FROM employees;
“`
Result (6 rows):
| emp_id | department | salary | dept_total |
|——–|————|——–|————|
| 1 | HR | 50000 | 170000 |
| 2 | HR | 60000 | 170000 |
| 6 | HR | 60000 | 170000 |
| 3 | IT | 70000 | 150000 |
| 4 | IT | 80000 | 150000 |
| 5 | Sales | 90000 | 90000 |
Here, every original row is preserved, with the department total added as a new column.
In summary, aggregates are for summarization, while window functions are for per-row analytics within contexts.
SQL Windows Functions Interview Questions and Answers
Basic Aggregate Questions
1. What are the main aggregate functions in SQL, and what do they do?
– answer: `COUNT()` counts rows/values; `SUM()` adds numerics; `AVG()` averages numerics (ignores NULLs); `MIN()`/`MAX()` find extremes.
2. How does `HAVING` differ from `WHERE`?
– `WHERE` filters rows before aggregation; `HAVING` filters groups after.
Example:
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 3;
3. Write a query to find the average sales per region.
– Example:
SELECT region, AVG(sales) AS avg_sales
FROM sales_data
GROUP BY region
4. How would you find the total revenue by product, but only for products sold more than 10 times?
– Use `GROUP BY` with `HAVING COUNT(*) > 10
5. Calculate the maximum order amount per customer and date, filtering for amounts > $1000.
– Example:
SELECT customer_id, order_date, MAX(amount)
FROM orders
GROUP BY customer_id, order_date
HAVING MAX(amount) IN (2000, 3000, 5760, 6000);
6. What’s the difference between `COUNT(*)` and `COUNT(column)`?
– `COUNT(*)` counts all rows; `COUNT(column)` skips NULLs.
7. Find the most frequent customer (highest order count).
– Example:
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
ORDER BY order_count DESC
LIMIT 1;
8. How do you handle text aggregation, like concatenating names?
– Use `GROUP_CONCAT()` (MySQL) or `STRING_AGG()` (PostgreSQL).
9. Query the date with the highest number of orders.
– Example:
SELECT order_date, COUNT(*) AS order_count
FROM orders
GROUP BY order_date
ORDER BY order_count DESC
LIMIT 1;
10. Find departments with average salary above company average.
SELECT department
FROM employees
GROUP BY department
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);
Window Functions: Key Concepts and Questions
Window functions use `OVER()` to compute across “windows” of rows, preserving the result set. Categories: ranking (`ROW_NUMBER()`, `RANK()`), aggregates over windows (`SUM() OVER()`), and value (`LAG()`, `LEAD()`). Questions often test `PARTITION BY`, `ORDER BY`, and frames.
Basic Window Questions
1. What is a window function, and how does it differ from aggregates?
– Window: Calculates per row over a related set (e.g., running total); preserves rows. Aggregate: Collapses to one/group value via `GROUP BY`.
2. Explain the `OVER()` clause components.
– `PARTITION BY`: Groups like GROUP BY; `ORDER BY`: Sorts within partition; frame (e.g., `ROWS BETWEEN`): Defines subset.
*Tip*: Default frame is `RANGE UNBOUNDED PRECEDING TO CURRENT ROW`
3. What does `ROW_NUMBER()` do?
– Assigns unique sequential numbers per partition, ordered.
Example:
SELECT name, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees;
4. Difference between `RANK()`, `DENSE_RANK()`, and `ROW_NUMBER()`?
– `RANK()`: If two or more rows have the same rank number, then skips next (for example:- -1,1,3).
– `DENSE_RANK()`: If two or more rows have the same rank, then it does not skip rank number for the next rows. , no skip (1,1,2).
– `ROW_NUMBER()`: It returns unique number for each rows.
For more details read above blog post
5. Find the second-highest salary per department.
– Example:
WITH ranked AS (
SELECT dept, salary, DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT * FROM ranked WHERE rnk = 2;
6. What is `NTILE(n)`? Give an example.
– Divides rows into n buckets (e.g., quartiles).
Example: `NTILE(4) OVER (ORDER BY sales DESC)`, this query divides rows into four quarters.
7. Use `LAG()` or `LEAD()` for year-over-year growth.
– Example (YoY %):
SELECT year, revenue,
(revenue – LAG(revenue) OVER (ORDER BY year)) / LAG(revenue) OVER (ORDER BY year) * 100 AS growth_pct
FROM yearly_sales;
8. Calculate the running total sales.
– Example:
SELECT date, sales, SUM(sales) OVER (ORDER BY date) AS running_total
FROM daily_sales;
9. Find Top 2 products per category in 2022.
– Example:
WITH ranked AS (
SELECT category, product, revenue,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
FROM sales WHERE year = 2022
)
SELECT * FROM ranked WHERE rn <= 2;
10. Find the most recent balance per user for ‘Refinance’ submissions.
– Example:
SELECT DISTINCT user_id, FIRST_VALUE(balance) OVER (PARTITION BY user_id ORDER BY date DESC) AS latest_balance
FROM submissions WHERE type = ‘Refinance’;
Conclusion
Window functions unlock advanced SQL capabilities, making it easier to handle analytical queries efficiently. By mastering them, you can write more concise and powerful code. Experiment with your own datasets to see their full potential.
happy querying!