Recursive Queries in SQL
SQL's powerful feature of recursive queries allows you to work with
hierarchical data.
Examples of hierarchical data are threaded comments, organizational
structures, or other nested relationships.
Setting Up the Database Schema
First, we need to create a table to hold our comments. Each comment can
either be a standalone comment or a reply to another comment. To represent
this, we use a `ParentId` column that refers back to the `Id` of the
comment it replies to.
CREATE TABLE Comment (
Id INT PRIMARY KEY,
Content TEXT,
ParentId INT
);
The `Comment` table has three columns:
- Id: A unique identifier for each comment.
- Content: The actual text of the comment.
-
ParentId: A reference to the `Id` of the comment that
this comment is replying to. If the `ParentId` is `NULL`, it means this
is a top-level comment.
Inserting Sample Data
Next, we insert some sample data into the `Comment` table. This data
represents a threaded comment structure, where some comments are replies
to others:
INSERT INTO Comment (Id, Content, ParentId) VALUES (0, 'Python', NULL);
INSERT INTO Comment (Id, Content, ParentId) VALUES (1, 'Hello', NULL);
INSERT INTO Comment (Id, Content, ParentId) VALUES (2, 'World', 1);
INSERT INTO Comment (Id, Content, ParentId) VALUES (3, 'Hi', 2);
INSERT INTO Comment (Id, Content, ParentId) VALUES (4, 'Sql', NULL);
INSERT INTO Comment (Id, Content, ParentId) VALUES (5, 'Recursive', 4);
This data creates a comment structure like the following:
Using Recursive CTEs to Query Hierarchical Data
Now, let’s write a query to retrieve and display this hierarchical comment
structure. We’ll use a Common Table Expression (CTE) with recursion:
WITH RECURSIVE RecursiveComment AS (
SELECT Id, Content, ParentId, 1 AS Level
FROM Comment
WHERE ParentId IS NULL
UNION ALL
SELECT c.Id, c.Content, c.ParentId, rc.Level + 1
FROM Comment c
INNER JOIN RecursiveComment rc ON c.ParentId = rc.Id
)
SELECT * FROM RecursiveComment ORDER BY Level, ParentId;
Let’s break this down:
-
WITH RECURSIVE RecursiveComment AS (
This line begins the definition of a recursive CTE named
`RecursiveComment`. CTEs allow you to create temporary result sets that
you can reference within a `SELECT`, `INSERT`, `UPDATE`, or `DELETE`
statement.
-
SELECT Id, Content, ParentId, 1 AS Level FROM Comment WHERE ParentId
IS NULL
The first `SELECT` statement inside the CTE is called the anchor member.
It selects all comments that do not have a parent (i.e., top-level
comments) and assigns them a `Level` of 1.
-
UNION ALL
The `UNION ALL` operator combines the result of the anchor member with
the recursive member, which we define next. This operator is crucial in
recursive CTEs because it ensures that all rows are included, even
duplicates if they occur.
-
SELECT c.Id, c.Content, c.ParentId, rc.Level + 1 FROM Comment c INNER
JOIN RecursiveComment rc ON c.ParentId = rc.Id
The recursive member of the CTE. Here, for each row in
`RecursiveComment` (which initially contains only the top-level
comments), we join the `Comment` table again to find any replies
(children) to these comments. For each child comment found, we increase
its `Level` by 1.
-
SELECT * FROM RecursiveComment ORDER BY Level, ParentId;
Finally, the CTE is terminated, and we perform a `SELECT` on the
`RecursiveComment` CTE to get the hierarchical comment structure. The
results are ordered by `Level` and `ParentId` to maintain the
hierarchical display.
Output
| Id |
Content |
ParentId |
Level |
| 0 |
Python |
null |
1 |
| 1 |
Hello |
null |
1 |
| 4 |
Sql |
null |
1 |
| 2 |
World |
1 |
2 |
| 5 |
Recursive |
4 |
2 |
| 3 |
Hi |
2 |
3 |