GUIDES

Joins, Windows & CTEs

Go beyond basic queries. Master multi-table joins, window functions for ranking and running totals, and common table expressions for readable SQL.

Once you are comfortable with single-table queries, the next step is combining data from multiple sources, computing over ordered partitions, and structuring complex logic into readable building blocks. This guide covers the three pillars of intermediate-to-advanced SQL: joins, window functions, and common table expressions (CTEs).

Setting Up the Tables

We will work with a small company dataset: employees, departments, and project assignments. Creating these tables in TeideDB is straightforward with INSERT INTO statements. This gives us relational data with natural foreign-key relationships to explore join behavior.

import { Context } from 'teide-js';

const ctx = new Context();

ctx.executeSync(`
  CREATE TABLE departments (
    dept_id   INT,
    dept_name TEXT,
    budget    DOUBLE
  )
`);

ctx.executeSync(`
  INSERT INTO departments VALUES
    (1, 'Engineering', 500000),
    (2, 'Marketing',   200000),
    (3, 'Sales',       300000),
    (4, 'Research',    450000)
`);

ctx.executeSync(`
  CREATE TABLE employees (
    emp_id    INT,
    name      TEXT,
    dept_id   INT,
    salary    DOUBLE,
    hire_date TEXT
  )
`);

ctx.executeSync(`
  INSERT INTO employees VALUES
    (101, 'Alice',   1, 120000, '2020-03-15'),
    (102, 'Bob',     1,  95000, '2021-07-01'),
    (103, 'Carol',   2,  85000, '2019-11-20'),
    (104, 'Dave',    3, 110000, '2018-06-10'),
    (105, 'Eve',     1, 105000, '2022-01-05'),
    (106, 'Frank',   3,  90000, '2021-09-12'),
    (107, 'Grace',   2,  78000, '2023-02-28'),
    (108, 'Hank',    NULL, 72000, '2023-05-15')
`);

ctx.executeSync(`
  CREATE TABLE projects (
    project_id   INT,
    project_name TEXT,
    emp_id       INT,
    hours        INT
  )
`);

ctx.executeSync(`
  INSERT INTO projects VALUES
    (1, 'Alpha',   101, 120),
    (2, 'Alpha',   102,  80),
    (3, 'Beta',    101,  40),
    (4, 'Beta',    105,  90),
    (5, 'Gamma',   103,  60),
    (6, 'Gamma',   104, 100),
    (7, 'Delta',   106,  75)
`);

Multi-Table Joins

Joins combine rows from two or more tables based on a related column. An INNER JOIN returns only rows that have matching values in both tables, while a LEFT JOIN preserves all rows from the left table and fills in NULL for unmatched columns on the right. Understanding which join type to use is key to getting correct results — an inner join silently drops non-matching rows, which may or may not be what you want.

INNER JOIN: employees with their departments

SELECT e.name, d.dept_name, e.salary
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.salary DESC;

const joined = ctx.executeSync(`
  SELECT e.name, d.dept_name, e.salary
  FROM employees e
  INNER JOIN departments d ON e.dept_id = d.dept_id
  ORDER BY e.salary DESC
`);

const names = joined.getColumn('name').toArray();
const depts = joined.getColumn('dept_name').toArray();
const salaries = joined.getColumn('salary').toArray();

names.forEach((n, i) => {
  console.log(`  ${n} (${depts[i]}): $${salaries[i].toLocaleString()}`);
});

Alice (Engineering): $120,000 Dave (Sales): $110,000 Eve (Engineering): $105,000 Bob (Engineering): $95,000 Frank (Sales): $90,000 Carol (Marketing): $85,000 Grace (Marketing): $78,000

Note: Hank (emp_id 108) has a NULL dept_id and does not appear in the inner join result. Use a LEFT JOIN if you need to include unmatched employees.

LEFT JOIN: include unassigned employees

SELECT e.name, COALESCE(d.dept_name, 'Unassigned') AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.name;

Alice Engineering Bob Engineering Carol Marketing Dave Sales Eve Engineering Frank Sales Grace Marketing Hank Unassigned

Triple join: employees, departments, and projects

You can chain multiple joins to bring together three or more tables. This query shows each employee alongside their department and project assignments. Employees without projects still appear thanks to the LEFT JOIN on the projects table.

SELECT
  e.name,
  d.dept_name,
  COALESCE(p.project_name, 'No project') AS project,
  COALESCE(p.hours, 0) AS hours
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
LEFT JOIN projects p ON e.emp_id = p.emp_id
ORDER BY e.name, p.project_name;

Alice Engineering Alpha 120 Alice Engineering Beta 40 Bob Engineering Alpha 80 Carol Marketing Gamma 60 Dave Sales Gamma 100 Eve Engineering Beta 90 Frank Sales Delta 75 Grace Marketing No project 0

Window Functions

Window functions perform calculations across a set of rows related to the current row, without collapsing them into a single output row like GROUP BY does. They are invaluable for ranking, running totals, and comparisons within groups. The OVER clause defines the window — which rows to include and in what order.

ROW_NUMBER: ranking within partitions

Rank employees by salary within each department. ROW_NUMBER() assigns a sequential integer to each row within its partition, ordered by the specified column. This is perfect for "top-N per group" queries.

SELECT
  name,
  dept_name,
  salary,
  ROW_NUMBER() OVER (
    PARTITION BY d.dept_id
    ORDER BY salary DESC
  ) AS dept_rank
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY d.dept_name, dept_rank;

Alice Engineering 120000 1 Eve Engineering 105000 2 Bob Engineering 95000 3 Carol Marketing 85000 1 Grace Marketing 78000 2 Dave Sales 110000 1 Frank Sales 90000 2

Running SUM: cumulative totals

A running sum computes a cumulative total as you move through ordered rows. By specifying ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, each row's value includes all previous rows plus itself. This is useful for tracking cumulative budget spend, progressive revenue, or running balances.

SELECT
  name,
  salary,
  SUM(salary) OVER (
    ORDER BY hire_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_payroll
FROM employees e
WHERE e.dept_id IS NOT NULL
ORDER BY hire_date;

Dave 110000 110000 Carol 85000 195000 Alice 120000 315000 Bob 95000 410000 Frank 90000 500000 Eve 105000 605000 Grace 78000 683000

DENSE_RANK: handling ties

Unlike ROW_NUMBER, DENSE_RANK assigns the same rank to tied values and does not skip subsequent ranks. This is useful when multiple items may share the same score or value and you want ranks like 1, 1, 2 instead of 1, 2, 3.

SELECT
  project_name,
  SUM(hours) AS total_hours,
  DENSE_RANK() OVER (ORDER BY SUM(hours) DESC) AS effort_rank
FROM projects
GROUP BY project_name
ORDER BY effort_rank;

Alpha 200 1 Gamma 160 2 Beta 130 3 Delta 75 4

Common Table Expressions (CTEs)

CTEs let you define named temporary result sets at the top of a query using the WITH clause. They make complex queries more readable by breaking them into logical steps, each with a descriptive name. Think of them as inline views that exist only for the duration of the query. CTEs can reference each other, enabling a pipeline-style composition.

WITH dept_costs AS (
  SELECT
    d.dept_name,
    d.budget,
    SUM(e.salary) AS total_salary
  FROM departments d
  INNER JOIN employees e ON d.dept_id = e.dept_id
  GROUP BY d.dept_name, d.budget
),
utilization AS (
  SELECT
    dept_name,
    budget,
    total_salary,
    ROUND(total_salary / budget * 100, 1) AS pct_used
  FROM dept_costs
)
SELECT * FROM utilization ORDER BY pct_used DESC;

const budget = ctx.executeSync(`
  WITH dept_costs AS (
    SELECT d.dept_name, d.budget, SUM(e.salary) AS total_salary
    FROM departments d
    INNER JOIN employees e ON d.dept_id = e.dept_id
    GROUP BY d.dept_name, d.budget
  ),
  utilization AS (
    SELECT dept_name, budget, total_salary,
           ROUND(total_salary / budget * 100, 1) AS pct_used
    FROM dept_costs
  )
  SELECT * FROM utilization ORDER BY pct_used DESC
`);

const deptNames = budget.getColumn('dept_name').toArray();
const pctUsed = budget.getColumn('pct_used').toArray();

deptNames.forEach((d, i) => {
  console.log(`  ${d}: ${pctUsed[i]}% of budget used`);
});

Marketing: 81.5% of budget used Sales: 66.7% of budget used Engineering: 64.0% of budget used

Subqueries

Subqueries embed one query inside another, either in the WHERE clause (for filtering) or the FROM clause (as derived tables). They are useful when you need to compute an intermediate result and immediately use it. The IN operator with a subquery is a common pattern for filtering rows against a dynamically computed set of values.

IN subquery: employees on a specific project

SELECT name, salary
FROM employees
WHERE emp_id IN (
  SELECT emp_id FROM projects WHERE project_name = 'Alpha'
)
ORDER BY salary DESC;

Alice 120000 Bob 95000

Derived table: average hours per department

SELECT d.dept_name, ph.avg_hours
FROM (
  SELECT e.dept_id, AVG(p.hours) AS avg_hours
  FROM projects p
  INNER JOIN employees e ON p.emp_id = e.emp_id
  GROUP BY e.dept_id
) ph
INNER JOIN departments d ON ph.dept_id = d.dept_id
ORDER BY ph.avg_hours DESC;

Sales 87.5 Marketing 60.0 Engineering 82.5

Set Operations

UNION ALL combines the results of two or more SELECT statements into a single result set. Unlike UNION (which removes duplicates), UNION ALL preserves all rows, making it faster for cases where duplicates are acceptable or impossible. This is useful for combining data from different sources or creating summary reports with multiple levels of detail.

SELECT 'High earner' AS label, name, salary
FROM employees WHERE salary >= 100000

UNION ALL

SELECT 'Standard' AS label, name, salary
FROM employees WHERE salary < 100000 AND dept_id IS NOT NULL

ORDER BY salary DESC;

High earner Alice 120000 High earner Dave 110000 High earner Eve 105000 Standard Bob 95000 Standard Frank 90000 Standard Carol 85000 Standard Grace 78000

Putting It All Together

Here is a query that combines everything: a CTE computes project effort per employee, a join brings in department info, and a window function ranks each employee's total project hours within their department. This kind of layered query is the bread and butter of real-world analytics.

WITH effort AS (
  SELECT
    emp_id,
    SUM(hours) AS total_hours,
    COUNT(*)   AS num_projects
  FROM projects
  GROUP BY emp_id
)
SELECT
  e.name,
  d.dept_name,
  ef.num_projects,
  ef.total_hours,
  DENSE_RANK() OVER (
    PARTITION BY d.dept_id
    ORDER BY ef.total_hours DESC
  ) AS dept_effort_rank
FROM effort ef
INNER JOIN employees e ON ef.emp_id = e.emp_id
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY d.dept_name, dept_effort_rank;

const combined = ctx.executeSync(`
  WITH effort AS (
    SELECT emp_id, SUM(hours) AS total_hours, COUNT(*) AS num_projects
    FROM projects
    GROUP BY emp_id
  )
  SELECT
    e.name, d.dept_name, ef.num_projects, ef.total_hours,
    DENSE_RANK() OVER (
      PARTITION BY d.dept_id ORDER BY ef.total_hours DESC
    ) AS dept_effort_rank
  FROM effort ef
  INNER JOIN employees e ON ef.emp_id = e.emp_id
  INNER JOIN departments d ON e.dept_id = d.dept_id
  ORDER BY d.dept_name, dept_effort_rank
`);

const cNames = combined.getColumn('name').toArray();
const cDepts = combined.getColumn('dept_name').toArray();
const cHours = combined.getColumn('total_hours').toArray();
const cRanks = combined.getColumn('dept_effort_rank').toArray();

cNames.forEach((n, i) => {
  console.log(`  ${cDepts[i]} #${cRanks[i]}: ${n} (${cHours[i]}h)`);
});

Engineering #1: Alice (160h) Engineering #2: Eve (90h) Engineering #3: Bob (80h) Marketing #1: Carol (60h) Sales #1: Dave (100h) Sales #2: Frank (75h)

← From CSV to Insights Graph Queries (SQL/PGQ) →