Joins, Windows & CTEs
Go beyond basic queries. Master multi-table joins, window functions for ranking and running totals, and common table expressions for readable SQL.
Once you are comfortable with single-table queries, the next step is combining data from multiple sources, computing over ordered partitions, and structuring complex logic into readable building blocks. This guide covers the three pillars of intermediate-to-advanced SQL: joins, window functions, and common table expressions (CTEs).
Setting Up the Tables
We will work with a small company dataset: employees, departments, and project assignments. Creating these tables in TeideDB is straightforward with INSERT INTO statements. This gives us relational data with natural foreign-key relationships to explore join behavior.
import { Context } from 'teide-js';
const ctx = new Context();
ctx.executeSync(`
CREATE TABLE departments (
dept_id INT,
dept_name TEXT,
budget DOUBLE
)
`);
ctx.executeSync(`
INSERT INTO departments VALUES
(1, 'Engineering', 500000),
(2, 'Marketing', 200000),
(3, 'Sales', 300000),
(4, 'Research', 450000)
`);
ctx.executeSync(`
CREATE TABLE employees (
emp_id INT,
name TEXT,
dept_id INT,
salary DOUBLE,
hire_date TEXT
)
`);
ctx.executeSync(`
INSERT INTO employees VALUES
(101, 'Alice', 1, 120000, '2020-03-15'),
(102, 'Bob', 1, 95000, '2021-07-01'),
(103, 'Carol', 2, 85000, '2019-11-20'),
(104, 'Dave', 3, 110000, '2018-06-10'),
(105, 'Eve', 1, 105000, '2022-01-05'),
(106, 'Frank', 3, 90000, '2021-09-12'),
(107, 'Grace', 2, 78000, '2023-02-28'),
(108, 'Hank', NULL, 72000, '2023-05-15')
`);
ctx.executeSync(`
CREATE TABLE projects (
project_id INT,
project_name TEXT,
emp_id INT,
hours INT
)
`);
ctx.executeSync(`
INSERT INTO projects VALUES
(1, 'Alpha', 101, 120),
(2, 'Alpha', 102, 80),
(3, 'Beta', 101, 40),
(4, 'Beta', 105, 90),
(5, 'Gamma', 103, 60),
(6, 'Gamma', 104, 100),
(7, 'Delta', 106, 75)
`);
Multi-Table Joins
Joins combine rows from two or more tables based on a related column. An INNER JOIN returns only rows that have matching values in both tables, while a LEFT JOIN preserves all rows from the left table and fills in NULL for unmatched columns on the right. Understanding which join type to use is key to getting correct results — an inner join silently drops non-matching rows, which may or may not be what you want.
INNER JOIN: employees with their departments
SELECT e.name, d.dept_name, e.salary
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.salary DESC;
const joined = ctx.executeSync(`
SELECT e.name, d.dept_name, e.salary
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.salary DESC
`);
const names = joined.getColumn('name').toArray();
const depts = joined.getColumn('dept_name').toArray();
const salaries = joined.getColumn('salary').toArray();
names.forEach((n, i) => {
console.log(` ${n} (${depts[i]}): $${salaries[i].toLocaleString()}`);
});
Note: Hank (emp_id 108) has a NULL dept_id and does not appear in the inner join result. Use a LEFT JOIN if you need to include unmatched employees.
LEFT JOIN: include unassigned employees
SELECT e.name, COALESCE(d.dept_name, 'Unassigned') AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.name;
Triple join: employees, departments, and projects
You can chain multiple joins to bring together three or more tables. This query shows each employee alongside their department and project assignments. Employees without projects still appear thanks to the LEFT JOIN on the projects table.
SELECT
e.name,
d.dept_name,
COALESCE(p.project_name, 'No project') AS project,
COALESCE(p.hours, 0) AS hours
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
LEFT JOIN projects p ON e.emp_id = p.emp_id
ORDER BY e.name, p.project_name;
Window Functions
Window functions perform calculations across a set of rows related to the current row, without collapsing them into a single output row like GROUP BY does. They are invaluable for ranking, running totals, and comparisons within groups. The OVER clause defines the window — which rows to include and in what order.
ROW_NUMBER: ranking within partitions
Rank employees by salary within each department. ROW_NUMBER() assigns a sequential integer to each row within its partition, ordered by the specified column. This is perfect for "top-N per group" queries.
SELECT
name,
dept_name,
salary,
ROW_NUMBER() OVER (
PARTITION BY d.dept_id
ORDER BY salary DESC
) AS dept_rank
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY d.dept_name, dept_rank;
Running SUM: cumulative totals
A running sum computes a cumulative total as you move through ordered rows. By specifying ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, each row's value includes all previous rows plus itself. This is useful for tracking cumulative budget spend, progressive revenue, or running balances.
SELECT
name,
salary,
SUM(salary) OVER (
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_payroll
FROM employees e
WHERE e.dept_id IS NOT NULL
ORDER BY hire_date;
DENSE_RANK: handling ties
Unlike ROW_NUMBER, DENSE_RANK assigns the same rank to tied values and does not skip subsequent ranks. This is useful when multiple items may share the same score or value and you want ranks like 1, 1, 2 instead of 1, 2, 3.
SELECT
project_name,
SUM(hours) AS total_hours,
DENSE_RANK() OVER (ORDER BY SUM(hours) DESC) AS effort_rank
FROM projects
GROUP BY project_name
ORDER BY effort_rank;
Common Table Expressions (CTEs)
CTEs let you define named temporary result sets at the top of a query using the WITH clause. They make complex queries more readable by breaking them into logical steps, each with a descriptive name. Think of them as inline views that exist only for the duration of the query. CTEs can reference each other, enabling a pipeline-style composition.
WITH dept_costs AS (
SELECT
d.dept_name,
d.budget,
SUM(e.salary) AS total_salary
FROM departments d
INNER JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_name, d.budget
),
utilization AS (
SELECT
dept_name,
budget,
total_salary,
ROUND(total_salary / budget * 100, 1) AS pct_used
FROM dept_costs
)
SELECT * FROM utilization ORDER BY pct_used DESC;
const budget = ctx.executeSync(`
WITH dept_costs AS (
SELECT d.dept_name, d.budget, SUM(e.salary) AS total_salary
FROM departments d
INNER JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_name, d.budget
),
utilization AS (
SELECT dept_name, budget, total_salary,
ROUND(total_salary / budget * 100, 1) AS pct_used
FROM dept_costs
)
SELECT * FROM utilization ORDER BY pct_used DESC
`);
const deptNames = budget.getColumn('dept_name').toArray();
const pctUsed = budget.getColumn('pct_used').toArray();
deptNames.forEach((d, i) => {
console.log(` ${d}: ${pctUsed[i]}% of budget used`);
});
Subqueries
Subqueries embed one query inside another, either in the WHERE clause (for filtering) or the FROM clause (as derived tables). They are useful when you need to compute an intermediate result and immediately use it. The IN operator with a subquery is a common pattern for filtering rows against a dynamically computed set of values.
IN subquery: employees on a specific project
SELECT name, salary
FROM employees
WHERE emp_id IN (
SELECT emp_id FROM projects WHERE project_name = 'Alpha'
)
ORDER BY salary DESC;
Derived table: average hours per department
SELECT d.dept_name, ph.avg_hours
FROM (
SELECT e.dept_id, AVG(p.hours) AS avg_hours
FROM projects p
INNER JOIN employees e ON p.emp_id = e.emp_id
GROUP BY e.dept_id
) ph
INNER JOIN departments d ON ph.dept_id = d.dept_id
ORDER BY ph.avg_hours DESC;
Set Operations
UNION ALL combines the results of two or more SELECT statements into a single result set. Unlike UNION (which removes duplicates), UNION ALL preserves all rows, making it faster for cases where duplicates are acceptable or impossible. This is useful for combining data from different sources or creating summary reports with multiple levels of detail.
SELECT 'High earner' AS label, name, salary
FROM employees WHERE salary >= 100000
UNION ALL
SELECT 'Standard' AS label, name, salary
FROM employees WHERE salary < 100000 AND dept_id IS NOT NULL
ORDER BY salary DESC;
Putting It All Together
Here is a query that combines everything: a CTE computes project effort per employee, a join brings in department info, and a window function ranks each employee's total project hours within their department. This kind of layered query is the bread and butter of real-world analytics.
WITH effort AS (
SELECT
emp_id,
SUM(hours) AS total_hours,
COUNT(*) AS num_projects
FROM projects
GROUP BY emp_id
)
SELECT
e.name,
d.dept_name,
ef.num_projects,
ef.total_hours,
DENSE_RANK() OVER (
PARTITION BY d.dept_id
ORDER BY ef.total_hours DESC
) AS dept_effort_rank
FROM effort ef
INNER JOIN employees e ON ef.emp_id = e.emp_id
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY d.dept_name, dept_effort_rank;
const combined = ctx.executeSync(`
WITH effort AS (
SELECT emp_id, SUM(hours) AS total_hours, COUNT(*) AS num_projects
FROM projects
GROUP BY emp_id
)
SELECT
e.name, d.dept_name, ef.num_projects, ef.total_hours,
DENSE_RANK() OVER (
PARTITION BY d.dept_id ORDER BY ef.total_hours DESC
) AS dept_effort_rank
FROM effort ef
INNER JOIN employees e ON ef.emp_id = e.emp_id
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY d.dept_name, dept_effort_rank
`);
const cNames = combined.getColumn('name').toArray();
const cDepts = combined.getColumn('dept_name').toArray();
const cHours = combined.getColumn('total_hours').toArray();
const cRanks = combined.getColumn('dept_effort_rank').toArray();
cNames.forEach((n, i) => {
console.log(` ${cDepts[i]} #${cRanks[i]}: ${n} (${cHours[i]}h)`);
});