Mastering Common Table Expressions (CTEs) in SQL: A Beginner’s Guide with Examples
Hello, SQL enthusiasts! If you have been working with SQL and found yourself wrestling with nested subqueries or repetitive code, Common Table Expressions (CTEs) might just be your new best friend.
In this post, will dive deep into what CTEs are, provide practical examples, discuss when to use them, and compare them to subqueries. Whether you are a beginner or brushing up on intermediate skills, this guide will help you level up your query game. Let’s jump in!
What Are Common Table Expressions (CTEs) in SQL?
A Common Table Expression (CTE) is a temporary named result set that you can define within a SQL statement and reference later in that same statement. Think of it as a way to create a “virtual table” that’s only available during the execution of your query; it’s not stored permanently in the database.
CTEs are introduced using the `WITH` clause, followed by the CTE name, the `AS` keyword, and the subquery in parentheses. The main query then follows, where you can treat the CTE like a regular table.
Key Features of CTEs:
– Temporary and Scoped: Exists only for the duration of the query.
– Reusable: Can be referenced multiple times in the same query, avoiding code duplication.
– Readable: Breaks down complex queries into logical, named parts.
– Recursive: Supports self-referencing for handling hierarchical data (more on this later).
– Supported Databases: Widely available in PostgreSQL, SQL Server, MySQL (from version 8.0), Oracle, and others.
CTEs are particularly useful in SELECT, INSERT, UPDATE, DELETE, or MERGE statements. Unlike views or temporary tables, CTEs don’t require creation privileges and are purely query-scoped.
Basic Examples of CTEs in SQL
Let’s use a sample database with two tables: `employees` (id, name, salary, dept_id, manager_id) and `departments` (id, name).
Example 1: Simple CTE for Filtering
Suppose you want to find employees earning above the average salary in their department. Without a CTE, you’d use a subquery. With a CTE, it’s more organized:
WITH DeptAvg AS (
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
)
SELECT e.name, e.salary, d.avg_salary
FROM employees e
JOIN DeptAvg d ON e.dept_id = d.dept_id
WHERE e.salary > d.avg_salary;
– The CTE `DeptAvg` computes averages per department.
– The main query joins it back to `employees` for filtering.
– This is cleaner than nesting the average calculation directly in the WHERE clause.
Example 2: Multiple CTEs in One Query
You can chain multiple CTEs, separating them with commas. Here’s one to get high earners and department counts:
WITH HighEarners AS (
SELECT id, name, salary, dept_id
FROM employees
WHERE salary > 80000
),
DeptCounts AS (
SELECT dept_id, COUNT(*) AS employee_count
FROM employees
GROUP BY dept_id
)
SELECT h.name, h.salary, dc.employee_count
FROM HighEarners h
JOIN DeptCounts dc ON h.dept_id = dc.dept_id
JOIN departments d ON h.dept_id = d.id;
– `HighEarners` filters top salaries.
– `DeptCounts` aggregates counts.
– The main query combines them, making the logic easy to follow.
Example 3: Recursive CTE for Hierarchical Data
Recursive CTEs are a superpower for trees or graphs, like employee hierarchies. They have a base case and a recursive part, joined with UNION ALL.
WITH EmployeeHierarchy AS (
— Base case: Top-level managers (no manager)
SELECT id, name, manager_id, 0 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
— Recursive case: Employees reporting to previous level
SELECT e.id, e.name, e.manager_id, eh.level + 1
FROM employees e
JOIN EmployeeHierarchy eh ON e.manager_id = eh.id
)
SELECT name, level
FROM EmployeeHierarchy
ORDER BY level;
– Starts with CEOs (manager_id NULL).
– Recursively adds direct reports, incrementing the level.
– Output: A list of employees with their hierarchy depth.
– Note: Limit recursion depth if needed (e.g., with MAXRECURSION in SQL Server) to avoid infinite loops.
When to Use Common Table Expressions in SQL?
CTEs shine in scenarios where readability, modularity, and reuse are key. Here’s when to reach for them:
1. Complex Queries with Multiple Steps: Break down intricate logic into named blocks. For example, when you have aggregations, filters, and joins that would otherwise create a messy nested query.
2. Reusing Subquery Results: If the same subquery is needed in multiple places (e.g., in SELECT and WHERE), a CTE avoids repetition, reducing errors and improving maintenance.
3. Hierarchical or Recursive Data: Essential for trees like organizational charts, bill of materials, or graph traversals. Subqueries can’t handle recursion easily.
4. Improving Readability and Debugging: CTEs make queries self-documenting. You can test each CTE independently by selecting from it.
5. Alternatives to Views or Temp Tables: When you don’t want persistent objects or lack permissions, CTEs provide a query-local solution.
6. Window Functions or Advanced Analytics:
Often paired with ROW_NUMBER(), RANK(), or other functions for paging or deduplication.
When NOT to Use CTEs?
– For simple queries: A basic subquery or join might suffice and be faster.
– Performance-Critical Scenarios: Some databases materialize CTEs (store results temporarily), which can use more memory. Test and profile!
– If your database doesn’t support them (e.g., older MySQL versions).
In essence, use CTEs when your query feels like a novel; divide it into chapters for better storytelling.
Are Common Table Expressions Better Than Subqueries?
The short answer: It depends! CTEs and subqueries solve similar problems but differ in structure, readability, and performance. Neither is universally “better”—choose based on context.
Pros of CTEs Over Subqueries:
– Readability: CTEs separate concerns, making code easier to understand and modify. Subqueries can lead to “nesting hell” with deep parentheses.
– Reusability: Reference a CTE multiple times without rewriting. Subqueries must be duplicated, increasing code bloat.
– Recursion Support: CTEs handle recursive queries natively; subqueries don’t.
– Modularity: Great for team collaboration; each CTE can be developed/tested separately.
– Self-Joins and Complex Logic: Cleaner for scenarios like comparing rows within the same table.
Pros of Subqueries Over CTEs:
– Simplicity for Small Tasks: Inline subqueries are quicker for one-off filters (e.g., a single average comparison).
– Performance: In some optimizers (e.g., PostgreSQL), subqueries might inline better, avoiding materialization. CTEs can sometimes create temporary tables, using more resources.
– No Overhead: Subqueries don’t introduce a new scope or name, keeping things lightweight.
Performance Comparison:
– Materialization: CTEs may be materialized (results stored temporarily) in databases like SQL Server, which helps for multiple references but hurts for large datasets. Subqueries are often correlated or inlined.
– Execution: Test with EXPLAIN or query plans. For example, a CTE referenced once might perform like a subquery, but multiple references favor CTEs.
– Edge Cases: Recursive CTEs have no subquery equivalent. For non-recursive, many queries can be rewritten either way.
Example Rewrite: CTE vs Subquery
Task: Employees above company average.
Subquery Version:
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
CTE Version:
WITH CompanyAvg AS (
SELECT AVG(salary) AS avg_salary
FROM employees
)
SELECT name, salary
FROM employees e
JOIN CompanyAvg c ON e.salary > c.avg_salary;
– The subquery is simpler here.
– But if you needed the average in multiple clauses, the CTE wins.
Verdict: CTEs are often “better” for complex, reusable, or recursive queries due to readability and structure. Subqueries excel in simplicity for straightforward tasks. In modern SQL, use CTEs liberally for maintainable code—optimize later if needed.
Final Thoughts
CTEs are a fantastic tool for writing elegant, efficient SQL. Start practicing with simple ones, then tackle recursion. Tools like DB Fiddle or your local database setup are great for experimentation. If you’re dealing with massive data, always check performance.
