Architecture

Teide JS is a three-layer system: a TypeScript API that developers interact with, a C++17 NAPI addon that bridges JavaScript and native code, and a vendored C17 columnar engine that performs all computation.

Three-Layer Overview

┌─────────────────────────────────────────────┐
│  TypeScript API  (lib/)                     │
│  Context, Table, Query, Expr, Series        │
├─────────────────────────────────────────────┤
│  C++17 NAPI Addon  (src/)                   │
│  NativeContext, NativeTable, NativeSeries    │
│  TeideThread, SPSC queue, ExprNode compiler  │
├─────────────────────────────────────────────┤
│  C17 Core Engine  (vendor/teide/)           │
│  Columnar storage, DAG executor, optimizer   │
│  Buddy allocator, symbol table, graph ops    │
└─────────────────────────────────────────────┘

Project Structure

teide-js/
├── lib/                    # TypeScript API layer
│   ├── context.ts          # Entry point; loads .node addon
│   ├── query.ts            # Lazy query builder with operation stack
│   ├── expr.ts             # Expression tree (col refs, literals, ops, aggs)
│   ├── table.ts            # Table + GroupBy wrappers
│   └── series.ts           # Column accessor with dtype-aware TypedArrays
├── src/                    # C++17 NAPI addon layer
│   ├── addon.cpp           # Module init, exports collectSync/collect
│   ├── context.cpp         # NativeContext: CSV I/O, SQL dispatch
│   ├── query.cpp           # Expression serialization, plan compilation
│   ├── table.cpp           # NativeTable: column access, retain/release
│   ├── series.cpp          # NativeSeries: zero-copy TypedArray creation
│   ├── teide_thread.h      # Background thread + SPSC work queue
│   └── compat.h            # C-atomic shim for C++/C17 interop
├── vendor/teide/           # Vendored C17 core engine
│   └── include/teide/td.h  # Public API + type/opcode definitions
├── test/                   # Vitest test suite
│   ├── smoke.test.ts
│   ├── table.test.ts
│   ├── expr.test.ts
│   └── fixtures/           # CSV test data
├── CMakeLists.txt          # Native build configuration
├── binding.gyp             # node-gyp binding
└── tsconfig.json           # TypeScript configuration

SQL Pipeline

SQL statements go through a multi-stage pipeline before results reach JavaScript:

SQL string
  │
  ▼
┌──────────────────┐
│ PGQ Pre-parse    │  Extracts GRAPH_TABLE / MATCH clauses
└────────┬─────────┘  and rewrites them into internal form
         │
         ▼
┌──────────────────┐
│ Parse            │  node-sql-parser produces an AST
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Plan             │  AST → PlanStep vector (filter, project,
└────────┬─────────┘  group, sort, limit, join, etc.)
         │
         ▼
┌──────────────────┐
│ Execute          │  PlanSteps run against the Teide engine
└────────┬─────────┘  on the dedicated Teide thread
         │
         ▼
  Table / null

For the fluent query API (table.filter().sort().collectSync()), the pipeline skips the SQL parsing stage. Instead, the TypeScript Expr objects are serialized directly to C++ ExprNode trees, and the operation stack becomes the PlanStep vector.
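
To make the "operation stack becomes the PlanStep vector" idea concrete, here is a minimal sketch of a lazy builder. The Expr and PlanStep shapes, the LazyQuery class, and toPlan() are hypothetical stand-ins for illustration; they are not the actual teide-js types or serialization format.

```typescript
// Hypothetical shapes for illustration; not the real teide-js types.
type Expr =
  | { kind: "col"; name: string }
  | { kind: "lit"; value: number }
  | { kind: "op"; op: "gt" | "mul"; left: Expr; right: Expr };

type PlanStep =
  | { step: "filter"; predicate: Expr }
  | { step: "sort"; by: string; descending: boolean }
  | { step: "limit"; count: number };

const col = (name: string): Expr => ({ kind: "col", name });
const lit = (value: number): Expr => ({ kind: "lit", value });
const gt = (left: Expr, right: Expr): Expr => ({ kind: "op", op: "gt", left, right });

class LazyQuery {
  private readonly steps: readonly PlanStep[];

  constructor(steps: readonly PlanStep[] = []) {
    this.steps = steps;
  }

  // Each builder call only records a step; no data is touched.
  filter(predicate: Expr): LazyQuery {
    return new LazyQuery([...this.steps, { step: "filter", predicate }]);
  }
  sort(by: string, descending = false): LazyQuery {
    return new LazyQuery([...this.steps, { step: "sort", by, descending }]);
  }
  limit(count: number): LazyQuery {
    return new LazyQuery([...this.steps, { step: "limit", count }]);
  }

  // Stand-in for the hand-off to native code: the recorded stack is the plan.
  toPlan(): PlanStep[] {
    return [...this.steps];
  }
}

const planSteps = new LazyQuery()
  .filter(gt(col("price"), lit(100)))
  .sort("price", true)
  .limit(10)
  .toPlan();
```

In this model every builder call returns a new immutable query, so a partially built query can be reused as the base for several variants.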

Threading Model

A key design constraint: the V8 thread must never call Teide C APIs directly. All native operations run on a dedicated Teide thread that owns the C heap. Communication between the two threads uses a lock-free SPSC (single-producer, single-consumer) queue.

┌──────────────┐     SPSC Queue     ┌──────────────┐
│  V8 Thread   │ ──── work item ──▶ │ Teide Thread │
│  (main)      │                    │ (background) │
│              │ ◀── result ─────── │              │
└──────────────┘   cond_var / tsfn  └──────────────┘
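
The index arithmetic behind a bounded SPSC queue can be sketched in a few lines. This is a single-threaded model written for illustration only: SpscQueue and WorkItem are hypothetical names, and a real cross-thread queue (such as the one in teide_thread.h) would publish the head and tail indices with atomic operations rather than plain fields.

```typescript
// Illustrative single-threaded model of a bounded SPSC ring buffer.
class SpscQueue<T> {
  private readonly slots: (T | undefined)[];
  private head = 0; // next slot to read; advanced only by the consumer
  private tail = 0; // next slot to write; advanced only by the producer

  constructor(capacity: number) {
    // One slot stays empty so that "full" and "empty" are distinguishable.
    this.slots = new Array<T | undefined>(capacity + 1).fill(undefined);
  }

  push(item: T): boolean {
    const next = (this.tail + 1) % this.slots.length;
    if (next === this.head) return false; // queue is full
    this.slots[this.tail] = item;
    this.tail = next;
    return true;
  }

  pop(): T | undefined {
    if (this.head === this.tail) return undefined; // queue is empty
    const item = this.slots[this.head];
    this.slots[this.head] = undefined;
    this.head = (this.head + 1) % this.slots.length;
    return item;
  }
}

// Hypothetical work item shape.
type WorkItem = { id: number; op: string };

const workQueue = new SpscQueue<WorkItem>(2);
workQueue.push({ id: 1, op: "readCsv" });
workQueue.push({ id: 2, op: "collect" });
const thirdAccepted = workQueue.push({ id: 3, op: "collect" }); // rejected: full
const firstItem = workQueue.pop(); // items come out in FIFO order
```

Because only the producer writes tail and only the consumer writes head, neither side needs a lock; that property is what makes the single-producer, single-consumer restriction worthwhile.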

Synchronous Path: dispatch_sync()

The V8 thread posts a work item to the SPSC queue, then blocks on a condition variable. The Teide thread picks up the item, executes the operation, writes the result, and signals the condition variable. The V8 thread wakes up and returns the result to JavaScript.

// This call blocks until the Teide thread completes
const table = ctx.readCsvSync('data.csv');

Asynchronous Path: dispatch_async()

The V8 thread posts a work item and immediately returns a Promise. The Teide thread picks up the item, executes the operation, and uses a napi_threadsafe_function to schedule the Promise resolution back on the V8 thread's event loop.

// This returns immediately; work happens on the Teide thread
const table = await ctx.readCsv('data.csv');

Shutdown

When ctx.destroy() is called (or the context is disposed through Symbol.dispose), the V8 thread posts a sentinel work item. The Teide thread recognizes the sentinel, releases all native resources (td_pool_destroy, td_sym_destroy, td_heap_destroy), and exits. Cleanup is therefore deterministic and does not depend on JavaScript garbage collection having run.

Zero-Copy Data Access

When you access a column via table.col('name').data, no data is copied. The NativeSeries C++ class uses napi_create_external_typed_array to expose the C heap memory directly as a JavaScript TypedArray.

C Heap Memory          JavaScript
┌──────────────┐      ┌──────────────────┐
│ float64[1000]│ ───▶ │ Float64Array     │
│ (td_col_t)   │      │ .buffer points   │
│              │      │  to C heap       │
└──────────────┘      └──────────────────┘
      ▲
      │ No copy — same memory
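
The view-versus-copy distinction can be demonstrated with standard TypedArrays alone. In this sketch a plain ArrayBuffer stands in for the engine-owned column memory (an assumption made so the example runs without the addon); the point is only that a TypedArray view aliases its buffer, while slice() copies.

```typescript
// A plain ArrayBuffer stands in for the engine-owned column memory.
const columnMemory = new ArrayBuffer(4 * Float64Array.BYTES_PER_ELEMENT);

// Two views over the same bytes; constructing a view copies nothing.
const engineView = new Float64Array(columnMemory);
const jsView = new Float64Array(columnMemory);

engineView[0] = 42.5;         // a write through one view...
const seenByJs = jsView[0];   // ...is visible through the other
const sharesBuffer = engineView.buffer === jsView.buffer;

// By contrast, slice() allocates a new buffer and copies the elements.
const snapshot = jsView.slice();
engineView[0] = 7;
const snapshotUnchanged = snapshot[0] === 42.5;
```

The practical consequence is that a zero-copy column reflects whatever the underlying memory holds at read time; call slice() when an independent snapshot is needed.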

Use-After-Free Safety

A subtle hazard: JavaScript garbage collection may run Series destructor code after the Teide heap has been torn down during shutdown. The heap_alive_ atomic flag prevents this:

When the context is alive, heap_alive_ is true. Series destructors call td_release() normally.
During shutdown, heap_alive_ is set to false before td_heap_destroy() runs.
If GC triggers a Series destructor after shutdown, it checks heap_alive_ and skips the td_release() call, avoiding a use-after-free crash.
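
The ordering described above can be modeled in plain TypeScript. NativeContextModel and SeriesModel are hypothetical classes invented for this sketch; the real guard lives in the C++ addon and uses an atomic flag, which this single-threaded model does not reproduce.

```typescript
// Hypothetical model of the heap_alive_ guard; not the addon's actual code.
class NativeContextModel {
  heapAlive = true;      // mirrors the heap_alive_ flag
  heapDestroyed = false; // true once the simulated heap is gone
  releaseCalls = 0;

  release(): void {
    // Touching a destroyed heap is the bug the flag is meant to prevent.
    if (this.heapDestroyed) throw new Error("use-after-free");
    this.releaseCalls += 1;
  }

  shutdown(): void {
    this.heapAlive = false;    // 1. the flag is cleared first
    this.heapDestroyed = true; // 2. only then is the heap torn down
  }
}

class SeriesModel {
  private readonly owner: NativeContextModel;

  constructor(owner: NativeContextModel) {
    this.owner = owner;
  }

  // Stands in for the destructor that GC eventually runs.
  finalize(): boolean {
    if (!this.owner.heapAlive) return false; // skip the release after shutdown
    this.owner.release();
    return true;
  }
}

const contextModel = new NativeContextModel();
const earlySeries = new SeriesModel(contextModel);
const lateSeries = new SeriesModel(contextModel);

const releasedBeforeShutdown = earlySeries.finalize(); // normal release
contextModel.shutdown();
const releasedAfterShutdown = lateSeries.finalize();   // skipped, no throw
```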

Operation Graph

The Teide engine represents computations as a directed acyclic graph (DAG) of operations. Each node in the DAG produces a column of data. The graph structure enables:

Common subexpression elimination: If two expressions reference the same column with the same transform, the computation runs once.
Operator fusion: Element-wise operations (add, multiply, compare) on adjacent nodes are fused into a single pass over the data, reducing memory bandwidth.
Lazy evaluation: The graph is only executed when results are materialized (on collectSync() or collect()).

Example: col('price').mul(lit(1.1)).gt(lit(100))

           gt
          / \
        mul  lit(100)
       / \
  col('price')  lit(1.1)

Fused into a single vectorized pass:
  for each row: result[i] = (price[i] * 1.1) > 100
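
A rough way to picture fusion is to compile the expression tree into one per-row function and run it in a single loop, so the intermediate mul column is never materialized. The sketch below does this with closures; the Expr shape and the compile and evaluate helpers are hypothetical, and the actual engine performs fusion in C over its own graph nodes.

```typescript
// Hypothetical expression shape for illustration.
type Expr =
  | { kind: "col"; name: string }
  | { kind: "lit"; value: number }
  | { kind: "mul"; left: Expr; right: Expr }
  | { kind: "gt"; left: Expr; right: Expr };

type Columns = Record<string, Float64Array>;
type RowFn = (i: number) => number;

// Turn the tree into one per-row function; no intermediate column is built.
function compile(expr: Expr, cols: Columns): RowFn {
  switch (expr.kind) {
    case "col": {
      const data = cols[expr.name];
      if (data === undefined) throw new Error(`unknown column: ${expr.name}`);
      return (i) => data[i];
    }
    case "lit": {
      const value = expr.value;
      return () => value;
    }
    case "mul": {
      const l = compile(expr.left, cols);
      const r = compile(expr.right, cols);
      return (i) => l(i) * r(i);
    }
    case "gt": {
      const l = compile(expr.left, cols);
      const r = compile(expr.right, cols);
      return (i) => (l(i) > r(i) ? 1 : 0);
    }
  }
}

function evaluate(expr: Expr, cols: Columns, rows: number): Uint8Array {
  const rowFn = compile(expr, cols);
  const out = new Uint8Array(rows);
  for (let i = 0; i < rows; i++) out[i] = rowFn(i); // the single fused pass
  return out;
}

const price = Float64Array.from([50, 90, 100, 200]);
const priceCheck: Expr = {
  kind: "gt",
  left: { kind: "mul", left: { kind: "col", name: "price" }, right: { kind: "lit", value: 1.1 } },
  right: { kind: "lit", value: 100 },
};
const mask = evaluate(priceCheck, { price }, price.length); // 0/1 per row
```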

Memory Model

The C17 core engine uses a custom memory allocator designed for columnar workloads:

Component	Purpose
Buddy Allocator	Manages large power-of-2 block allocations for column data. Minimizes external fragmentation.
Slab Cache	Fast fixed-size allocations for internal metadata structures (graph nodes, hash entries).
Thread-Local Heaps	Each thread gets its own heap for allocation-heavy paths, avoiding contention on a global lock.
Reference Counting	td_retain() / td_release() manage object lifetimes. When the refcount hits zero, memory is returned to the allocator.

The TypeScript layer never allocates or frees native memory directly. The NAPI addon calls td_retain() when wrapping C objects and td_release() when JavaScript garbage-collects the wrapper.
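
The retain-on-wrap, release-on-finalize pairing can be sketched as follows. RefCounted and Wrapper are hypothetical classes for illustration; in the real addon the counting is done by td_retain() and td_release() on C objects, and finalization is driven by the JavaScript garbage collector rather than an explicit call.

```typescript
// Hypothetical model of a reference-counted native object.
class RefCounted {
  private count = 1; // the creator holds the first reference
  freed = false;

  retain(): void {
    if (this.freed) throw new Error("retain after free");
    this.count += 1;
  }

  release(): void {
    if (this.freed) throw new Error("release after free");
    this.count -= 1;
    // At zero, the memory would be handed back to the allocator.
    if (this.count === 0) this.freed = true;
  }

  get refs(): number {
    return this.count;
  }
}

// Mirrors the addon's role: retain when wrapping, release when finalized.
class Wrapper {
  private readonly target: RefCounted;

  constructor(target: RefCounted) {
    this.target = target;
    target.retain();
  }

  finalize(): void {
    this.target.release();
  }
}

const columnObject = new RefCounted();        // refs = 1 (engine-side owner)
const wrapperA = new Wrapper(columnObject);   // refs = 2
const wrapperB = new Wrapper(columnObject);   // refs = 3
const refsWhileWrapped = columnObject.refs;

wrapperA.finalize();                          // refs = 2
wrapperB.finalize();                          // refs = 1
const aliveAfterWrappers = !columnObject.freed;

columnObject.release();                       // refs = 0, object is freed
const freedAtZero = columnObject.freed;
```

Each wrapper contributes exactly one reference, so the object outlives every JavaScript handle to it and is freed only when the last holder lets go.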

Build Process

Building teide-js involves three stages: two native compilation steps followed by TypeScript compilation:

┌─────────────────────────────────────────────────┐
│ 1. CMake compiles vendor/teide/ (C17)           │
│    → static library libteide.a                  │
├─────────────────────────────────────────────────┤
│ 2. node-gyp links src/*.cpp (C++17) + libteide  │
│    → build/Release/teidedb_addon.node           │
├─────────────────────────────────────────────────┤
│ 3. tsc compiles lib/*.ts                        │
│    → dist/*.js + dist/*.d.ts                    │
└─────────────────────────────────────────────────┘

C++ Header Inclusion Order

The src/compat.h header provides a C-atomic shim so that C17 Teide headers (which use _Atomic(T)) compile in C++ mode. This shim redefines _Atomic(T) as volatile T with GCC builtins. To avoid conflicts with the C++ <atomic> header:

NAPI headers (<napi.h>) must be included before compat.h.
C++ standard headers (<string>, <vector>, etc.) must be included before compat.h.
compat.h is included last, immediately before any #include <teide/td.h>.

Build Commands

# Full build: native addon (debug) + TypeScript
npm run build

# Native addon only (debug)
npm run build:native

# Native addon with -O3 optimizations
npm run build:native:release

# TypeScript compilation only
npm run build:ts

# Clean build artifacts
npm run clean

Requirements: CMake 3.15 or later, a C17/C++17 capable compiler, and Node.js 18 or later.