GUIDES

Node.js Integration

Embed TeideDB in your Node.js application. Use the fluent query API, access zero-copy TypedArrays, handle async operations, and manage resources.

TeideDB is designed to be embedded directly in Node.js applications as a native addon. This guide covers the practical details of integrating TeideDB into real applications: the fluent query builder API, zero-copy data access, async vs sync patterns, resource management, Express.js integration, error handling, and performance optimization.

The Fluent Query API

While TeideDB supports raw SQL via executeSync and execute, the fluent API lets you build queries programmatically using chainable method calls. This approach offers better composability (build queries conditionally), type safety (TypeScript catches method name typos), and readability (no string concatenation for dynamic queries). Under the hood, the fluent API constructs the same expression trees and operation stacks that SQL compilation produces.

import { Context, col, lit } from 'teide-js';

const ctx = new Context();

// Load data
ctx.executeSync(`
  CREATE TABLE users AS SELECT * FROM read_csv('users.csv')
`);

const table = ctx.table('users');

// Fluent query: filter, sort, limit
const result = table
  .filter(col('age').gt(25))
  .filter(col('city').eq('Seattle'))
  .sort('name')
  .head(10)
  .collectSync();

console.log('Names:', result.getColumn('name').toArray());
console.log('Ages:', result.getColumn('age').toArray());

The fluent API methods map directly to SQL operations: filter() builds a WHERE clause, sort() an ORDER BY, head() a LIMIT, and groupBy() with agg() a GROUP BY over aggregate expressions.

Building expressions

The col() and lit() functions create expression nodes that you compose with operators. Expressions are lazy — they describe a computation without executing it. Only when you call collectSync() or collect() does the expression tree get serialized and sent to the Teide thread for execution.

// Comparison operators
col('price').gt(100)           // price > 100
col('price').gte(100)          // price >= 100
col('price').lt(50)            // price < 50
col('category').eq('Electronics') // category = 'Electronics'
col('price').neq(0)            // price != 0

// Arithmetic
col('price').mul(col('quantity'))  // price * quantity
col('salary').add(lit(5000))      // salary + 5000

// Aggregations (used with groupBy)
col('price').sum()      // SUM(price)
col('price').avg()      // AVG(price)
col('price').min()      // MIN(price)
col('price').max()      // MAX(price)
col('id').count()       // COUNT(id)

// Aliasing
col('price').mul(col('quantity')).alias('total')  // AS total

GroupBy with aggregations

const summary = table
  .groupBy('category')
  .agg(
    col('price').sum().alias('total_revenue'),
    col('price').avg().alias('avg_price'),
    col('id').count().alias('num_items')
  )
  .sort('total_revenue', { descending: true })
  .collectSync();

const cats = summary.getColumn('category').toArray();
const revenue = summary.getColumn('total_revenue').toArray();

cats.forEach((cat, i) => {
  console.log(`  ${cat}: $${revenue[i].toFixed(2)}`);
});

Zero-Copy Series Access

When TeideDB returns query results, column data is exposed directly as JavaScript TypedArrays that point into the native C heap — no data is copied. This means accessing a 10-million-row numeric column is instantaneous: the TypedArray is a view over the existing memory. This is one of TeideDB's core performance advantages over approaches that serialize data to JSON or copy it into JS arrays.

Numeric columns

const result = ctx.executeSync('SELECT id, price, quantity FROM sales LIMIT 5');

// Each Series exposes a TypedArray matching the column's dtype
const ids = result.getColumn('id');
const prices = result.getColumn('price');

console.log('dtype:', prices.dtype);        // 'float64'
console.log('length:', prices.length);      // 5

// toArray() returns the zero-copy TypedArray
const priceArr = prices.toArray();          // Float64Array
console.log(priceArr instanceof Float64Array); // true
console.log(priceArr[0]);                   // 999.99

// Type mapping:
//   INT     -> Int32Array
//   BIGINT  -> BigInt64Array
//   FLOAT   -> Float32Array
//   DOUBLE  -> Float64Array
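To make the "view over existing memory" idea concrete, here is a stand-in demonstration using a plain ArrayBuffer in the role of the native C heap (the names buffer, view, and another are illustrative, not part of the teide-js API):

```javascript
// Stand-in for the zero-copy idea: two TypedArray views over one buffer.
// TeideDB's Series views point into the native heap; a plain ArrayBuffer
// plays that role here.
const buffer = new ArrayBuffer(5 * 8);    // "native" storage: 5 doubles
const view = new Float64Array(buffer);    // zero-copy view, no data moved
view.set([999.99, 1.5, 2.5, 3.5, 4.5]);

const another = new Float64Array(buffer); // same memory, still no copy
another[0] = 42;
console.log(view[0]); // 42 — both views see the same bytes
```

Creating either view costs the same regardless of whether the buffer holds five values or ten million, which is why large result columns are cheap to hand back to JavaScript.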

Dictionary (string) columns

String columns use dictionary encoding for memory efficiency. Instead of storing each string value inline, TeideDB stores a dictionary of unique strings and an array of integer indices. The Series object provides access to both the indices and the dictionary, as well as a convenience toArray() that resolves indices to string values.

const strResult = ctx.executeSync('SELECT category FROM sales LIMIT 5');
const catSeries = strResult.getColumn('category');

console.log('dtype:', catSeries.dtype);  // 'dictionary'

// Access the raw components
const indices = catSeries.indices;        // Int32Array (e.g., [0, 1, 0, 2, 1])
const dictionary = catSeries.dictionary;  // ['Electronics', 'Books', 'Clothing']

// Or use toArray() for resolved values
const values = catSeries.toArray();
// ['Electronics', 'Books', 'Electronics', 'Clothing', 'Books']
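Because the indices are plain integers, some operations never need the strings at all. As a sketch (using plain arrays in place of the real Series properties), here is a frequency count computed directly on the integer codes:

```javascript
// Sketch: count category frequencies using only the dictionary codes.
// The shapes mirror catSeries.indices / catSeries.dictionary above.
const dictionary = ['Electronics', 'Books', 'Clothing'];
const indices = Int32Array.from([0, 1, 0, 2, 1]);

// One counter slot per unique string; bump the slot for each row's code
const counts = new Uint32Array(dictionary.length);
for (const idx of indices) counts[idx]++;

// Resolve to names only once, at the end
const freq = Object.fromEntries(dictionary.map((s, i) => [s, counts[i]]));
console.log(freq); // { Electronics: 2, Books: 2, Clothing: 1 }
```

Grouping, deduplication, and equality filters can all work this way, touching each string at most once instead of once per row.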

Null handling

Nullable columns include a null bitmap — a compact bit array where each bit indicates whether the corresponding row is null (0) or valid (1). The nullBitmap property exposes this as a Uint8Array, and isNull(index) provides a convenience check for individual rows.

const nullable = ctx.executeSync(`
  SELECT name, dept_id FROM employees
`);

const deptSeries = nullable.getColumn('dept_id');

// Hoist the TypedArray once instead of calling toArray() on every row
const deptValues = deptSeries.toArray();

// Check individual values
for (let i = 0; i < deptSeries.length; i++) {
  if (deptSeries.isNull(i)) {
    console.log(`  Row ${i}: NULL`);
  } else {
    console.log(`  Row ${i}: ${deptValues[i]}`);
  }
}

// Access the raw bitmap
const bitmap = deptSeries.nullBitmap;  // Uint8Array
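The bitmap packs eight rows per byte. Assuming LSB-first bit order within each byte (the Arrow-style convention; verify against your build before relying on it), a row's validity bit can be decoded like this:

```javascript
// Sketch: decode validity bits from a packed bitmap (assumed LSB-first).
// bit = 1 means valid, bit = 0 means null, matching the description above.
function isNullAt(bitmap, i) {
  const bit = (bitmap[i >> 3] >> (i & 7)) & 1; // byte i/8, bit i%8
  return bit === 0;
}

// Example bitmap for 5 rows: rows 0, 1, 3 valid; rows 2, 4 null.
// Bits LSB-first: 1,1,0,1,0 -> binary 01011
const bitmap = Uint8Array.of(0b01011);
console.log([0, 1, 2, 3, 4].map((i) => isNullAt(bitmap, i)));
// [false, false, true, false, true]
```

For bulk scans this is cheaper than calling isNull(i) across the boundary for every row, since the whole bitmap is read with plain JavaScript arithmetic.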

Async Operations

TeideDB provides both synchronous and asynchronous versions of key operations. Sync methods (executeSync, collectSync, readCsvSync) block the V8 thread until the Teide thread finishes. Async methods (execute, collect, readCsv) return Promises and keep the event loop running, making them essential for server applications where blocking would stall all request handling.

// Synchronous — blocks until complete
const syncResult = ctx.executeSync('SELECT * FROM sales LIMIT 10');

// Asynchronous — returns a Promise
const asyncResult = await ctx.execute('SELECT * FROM sales LIMIT 10');

// Async CSV loading
await ctx.readCsv('large_dataset.csv', 'big_table');

// Async fluent query
const asyncFluent = await ctx.table('sales')
  .filter(col('price').gt(100))
  .sort('price', { descending: true })
  .head(20)
  .collect();  // Note: collect(), not collectSync()

Note: Use sync methods for scripts, CLI tools, and tests where simplicity matters. Use async methods for HTTP servers, real-time applications, and any context where you must not block the event loop. Both methods dispatch work to the same Teide background thread — the only difference is whether the V8 thread waits or continues.

When to use async vs sync

Scenario                        | Recommended                   | Reason
--------------------------------|-------------------------------|------------------------------------------
Express/Fastify request handler | execute() / collect()         | Must not block other requests
CLI data analysis script        | executeSync() / collectSync() | Simpler code, no concurrency needed
Loading large CSV at startup    | readCsv() (async)             | May take seconds; don't block startup
Unit tests                      | executeSync()                 | Easier to assert on direct return values
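Because async methods return ordinary Promises, independent queries can be dispatched together with Promise.all. The sketch below uses a stand-in async function in place of ctx.execute (so it runs without a Context); the dispatch pattern is identical with the real method:

```javascript
// Stand-in for ctx.execute: any Promise-returning function behaves the same.
const execute = async (sql) => ({ sql });

async function loadDashboard() {
  // Dispatch both queries at once. On a single Teide thread they still
  // run one after another, but the V8 event loop stays free to serve
  // other requests while they do.
  const [top, summary] = await Promise.all([
    execute('SELECT * FROM sales ORDER BY price DESC LIMIT 10'),
    execute('SELECT category, COUNT(*) AS n FROM sales GROUP BY category'),
  ]);
  return { top, summary };
}
```

The awaits resolve in whatever order the underlying work finishes; Promise.all preserves the array order of the results regardless.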

Resource Management

Each Context owns a native C heap and a background thread. You must destroy it when finished to free memory and stop the thread. TeideDB supports three patterns for resource cleanup, from manual to fully automatic.

Manual cleanup

const ctx = new Context();
try {
  // ... use ctx ...
} finally {
  ctx.destroy();
}

Symbol.dispose with using declarations (TypeScript 5.2+, or a Node.js version with explicit resource management support)

// Automatic cleanup when `ctx` goes out of scope
{
  using ctx = new Context();
  ctx.executeSync('SELECT 1');
  // ctx.destroy() called automatically at end of block
}

Process exit handling

const ctx = new Context();

// Register the same cleanup for both termination signals
for (const sig of ['SIGINT', 'SIGTERM']) {
  process.on(sig, () => {
    ctx.destroy();
    process.exit(0);
  });
}

Note: Calling destroy() posts a shutdown sentinel to the Teide thread, which then tears down td_pool, td_sym, and td_heap. After destroy(), any attempt to use the context will throw. The heap_alive_ atomic flag prevents use-after-free in Series destructors that may run later during GC.

Express.js Integration

Here is a complete example of embedding TeideDB in an Express.js API server. The context is created once at startup and shared across requests. Each request handler runs an async query and returns JSON results. This pattern works equally well with Fastify, Koa, or any Node.js HTTP framework.

import express from 'express';
import { Context, col } from 'teide-js';

const app = express();
const ctx = new Context();

// Load data at startup
await ctx.readCsv('products.csv', 'products');

// Query endpoint
app.get('/api/products', async (req, res) => {
  try {
    const { category, minPrice, limit = 20 } = req.query;

    let sql = 'SELECT id, name, category, price FROM products WHERE 1=1';
    if (category) sql += ` AND category = '${category}'`;
    if (minPrice) sql += ` AND price >= ${Number(minPrice)}`;
    sql += ` ORDER BY price DESC LIMIT ${Number(limit)}`;

    const result = await ctx.execute(sql);

    // Convert to JSON array (hoist each column's TypedArray once,
    // instead of calling toArray() per cell)
    const columns = result.columnNames();
    const columnArrays = columns.map((c) => result.getColumn(c).toArray());
    const numRows = columnArrays.length > 0 ? columnArrays[0].length : 0;
    const rows = [];

    for (let i = 0; i < numRows; i++) {
      const row = {};
      columns.forEach((c, j) => {
        row[c] = columnArrays[j][i];
      });
      rows.push(row);
    }

    res.json({ data: rows, count: numRows });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

// Fluent API endpoint
app.get('/api/products/summary', async (req, res) => {
  try {
    const summary = await ctx.table('products')
      .groupBy('category')
      .agg(
        col('price').avg().alias('avg_price'),
        col('id').count().alias('count')
      )
      .sort('avg_price', { descending: true })
      .collect();

    const cats = summary.getColumn('category').toArray();
    const avgPrices = summary.getColumn('avg_price').toArray();
    const counts = summary.getColumn('count').toArray();

    const data = cats.map((cat, i) => ({
      category: cat,
      avgPrice: avgPrices[i],
      count: counts[i],
    }));

    res.json({ data });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

// Graceful shutdown
process.on('SIGTERM', () => {
  ctx.destroy();
  process.exit(0);
});

app.listen(3000, () => console.log('Listening on :3000'));

Note: In production, parameterize SQL queries to prevent injection. The example above uses string interpolation for clarity, but you should validate and sanitize all user inputs before including them in SQL strings.
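One way to harden the handler above is whitelist-style validation before any interpolation. These helpers are a sketch, not part of teide-js; the names safeStringValue and safeNumber are illustrative, and the allowed character set should match your own data:

```javascript
// Sketch: validate user input before building SQL strings by hand.
function safeStringValue(s) {
  // Allow only letters, digits, spaces, and hyphens — quotes and
  // semicolons are rejected outright rather than escaped.
  if (!/^[A-Za-z0-9 \-]+$/.test(s)) throw new Error(`invalid value: ${s}`);
  return s;
}

function safeNumber(n, fallback) {
  const v = Number(n ?? fallback);
  if (!Number.isFinite(v)) throw new Error(`invalid number: ${n}`);
  return v;
}

// Usage inside the handler:
//   if (category) sql += ` AND category = '${safeStringValue(category)}'`;
//   sql += ` LIMIT ${safeNumber(limit, 20)}`;
```

Rejecting bad input with a 400-level error is usually preferable to silently escaping it, since malformed filter values almost always indicate a client bug or an attack.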

Error Handling

TeideDB throws JavaScript errors for SQL syntax errors, missing tables, type mismatches, and other problems. Both sync and async methods can throw, so always wrap calls in try/catch. The error message includes the underlying Teide engine error text, which is usually descriptive enough to diagnose the problem.

// SQL syntax error
try {
  ctx.executeSync('SELCT * FROM sales');  // typo
} catch (err) {
  console.error('SQL error:', err.message);
  // "SQL error: Parse error: expected SELECT, INSERT, CREATE, ..."
}

// Missing table
try {
  ctx.executeSync('SELECT * FROM nonexistent');
} catch (err) {
  console.error('Table error:', err.message);
  // "Table error: table 'nonexistent' does not exist"
}

// Type mismatch
try {
  ctx.executeSync(`SELECT * FROM sales WHERE price = 'not_a_number'`);
} catch (err) {
  console.error('Type error:', err.message);
}

// Async error handling
try {
  await ctx.execute('SELECT * FROM missing_table');
} catch (err) {
  console.error('Async error:', err.message);
}

// Fluent API errors
try {
  await ctx.table('sales')
    .filter(col('nonexistent_column').gt(0))
    .collect();
} catch (err) {
  console.error('Column error:', err.message);
  // "Column error: column 'nonexistent_column' not found in table 'sales'"
}

Performance Tips

TeideDB is designed for analytical performance, but how you use it from Node.js matters. Here are practical tips for getting the best throughput and lowest latency from the bindings.

1. Reuse the Context

Creating a Context allocates a native heap and spawns a background thread. Do this once at application startup, not per-request. Context creation is on the order of milliseconds, but it adds up under load.

// Good: singleton context
const ctx = new Context();
app.get('/query', (req, res) => { /* use ctx */ });

// Bad: context per request
app.get('/query', (req, res) => {
  const ctx = new Context();   // Expensive!
  // ... query ...
  ctx.destroy();
});

2. Batch related queries

Each call to executeSync or execute crosses the thread boundary (V8 → Teide thread → V8). If you need to run multiple setup statements, combine them into fewer calls where possible. Multi-statement SQL is supported.

// Good: batch DDL
ctx.executeSync(`
  CREATE TABLE a (id INT, val DOUBLE);
  CREATE TABLE b (id INT, label TEXT);
`);

// Acceptable but slower: separate calls
ctx.executeSync('CREATE TABLE a (id INT, val DOUBLE)');
ctx.executeSync('CREATE TABLE b (id INT, label TEXT)');

3. Prefer zero-copy access

Avoid converting TypedArrays to regular JavaScript arrays unless you need to. TypedArrays are already iterable and indexable. Converting a 10-million-element Float64Array to a regular array copies 80 MB of data needlessly.

const series = result.getColumn('price');

// Good: use the TypedArray directly
const typedArr = series.toArray();  // Float64Array, zero-copy
let sum = 0;
for (let i = 0; i < typedArr.length; i++) sum += typedArr[i];

// Bad: convert to JS array (copies all data)
const jsArray = Array.from(series.toArray());  // Copies everything

4. Use async for I/O-bound workloads

If your server handles concurrent requests, use async methods so the event loop can process other requests while TeideDB works. The Teide thread runs independently, so async dispatch has near-zero overhead compared to sync.

5. Profile with column statistics

For query optimization, check how much data you are actually scanning. Use LIMIT generously in exploratory queries, and push filters as early as possible to reduce the working set.

// Good: filter then aggregate
ctx.executeSync(`
  SELECT category, AVG(price)
  FROM sales
  WHERE date >= '2024-01-01'
  GROUP BY category
`);

// Slower: aggregate everything, then filter
ctx.executeSync(`
  SELECT category, avg_price
  FROM (SELECT category, AVG(price) AS avg_price FROM sales GROUP BY category)
  WHERE category IN ('Electronics', 'Books')
`);