
Architecture

Teide JS is a three-layer system: a TypeScript API that developers interact with, a C++17 NAPI addon that bridges JavaScript and native code, and a vendored C17 columnar engine that performs all computation.

Three-Layer Overview

┌─────────────────────────────────────────────┐
│  TypeScript API  (lib/)                     │
│  Context, Table, Query, Expr, Series        │
├─────────────────────────────────────────────┤
│  C++17 NAPI Addon  (src/)                   │
│  NativeContext, NativeTable, NativeSeries   │
│  TeideThread, SPSC queue, ExprNode compiler │
├─────────────────────────────────────────────┤
│  C17 Core Engine  (vendor/teide/)           │
│  Columnar storage, DAG executor, optimizer  │
│  Buddy allocator, symbol table, graph ops   │
└─────────────────────────────────────────────┘

Project Structure

teide-js/
├── lib/                    # TypeScript API layer
│   ├── context.ts          # Entry point; loads .node addon
│   ├── query.ts            # Lazy query builder with operation stack
│   ├── expr.ts             # Expression tree (col refs, literals, ops, aggs)
│   ├── table.ts            # Table + GroupBy wrappers
│   └── series.ts           # Column accessor with dtype-aware TypedArrays
├── src/                    # C++17 NAPI addon layer
│   ├── addon.cpp           # Module init, exports collectSync/collect
│   ├── context.cpp         # NativeContext: CSV I/O, SQL dispatch
│   ├── query.cpp           # Expression serialization, plan compilation
│   ├── table.cpp           # NativeTable: column access, retain/release
│   ├── series.cpp          # NativeSeries: zero-copy TypedArray creation
│   ├── teide_thread.h      # Background thread + SPSC work queue
│   └── compat.h            # C-atomic shim for C++/C17 interop
├── vendor/teide/           # Vendored C17 core engine
│   └── include/teide/td.h  # Public API + type/opcode definitions
├── test/                   # Vitest test suite
│   ├── smoke.test.ts
│   ├── table.test.ts
│   ├── expr.test.ts
│   └── fixtures/           # CSV test data
├── CMakeLists.txt          # Native build configuration
├── binding.gyp             # node-gyp binding
└── tsconfig.json           # TypeScript configuration

SQL Pipeline

SQL statements go through a multi-stage pipeline before results reach JavaScript:

SQL string
  │
  ▼
┌──────────────────┐
│ PGQ Pre-parse    │  Extracts GRAPH_TABLE / MATCH clauses
└────────┬─────────┘  and rewrites them into internal form
         │
         ▼
┌──────────────────┐
│ Parse            │  node-sql-parser produces an AST
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Plan             │  AST → PlanStep vector (filter, project,
└────────┬─────────┘  group, sort, limit, join, etc.)
         │
         ▼
┌──────────────────┐
│ Execute          │  PlanSteps run against the Teide engine
└────────┬─────────┘  on the dedicated Teide thread
         │
         ▼
  Table / null

For the fluent query API (table.filter().sort().collectSync()), the pipeline skips the SQL parsing stage. Instead, the TypeScript Expr objects are serialized directly to C++ ExprNode trees, and the operation stack becomes the PlanStep vector.
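The fluent path can be pictured with a small sketch. Everything in it is hypothetical: the Expr, PlanStep, and LazyQuery shapes are illustrative and are not the actual lib/ types or the C++ ExprNode layout. It only shows how chained calls can accumulate an operation stack that is handed over as a list of plan steps when the query is collected.

```typescript
// Hypothetical shapes, for illustration only; not the real teide-js types.
type Expr =
  | { kind: "col"; name: string }
  | { kind: "lit"; value: number }
  | { kind: "binop"; op: "mul" | "gt"; left: Expr; right: Expr };

type PlanStep =
  | { step: "filter"; predicate: Expr }
  | { step: "sort"; by: string; descending: boolean }
  | { step: "limit"; count: number };

const col = (name: string): Expr => ({ kind: "col", name });
const lit = (value: number): Expr => ({ kind: "lit", value });
const gt = (left: Expr, right: Expr): Expr => ({ kind: "binop", op: "gt", left, right });

// Each call returns a new query with one more step; nothing executes yet.
class LazyQuery {
  private readonly steps: readonly PlanStep[];

  constructor(steps: readonly PlanStep[] = []) {
    this.steps = steps;
  }

  filter(predicate: Expr): LazyQuery {
    return new LazyQuery([...this.steps, { step: "filter", predicate }]);
  }

  sort(by: string, descending = false): LazyQuery {
    return new LazyQuery([...this.steps, { step: "sort", by, descending }]);
  }

  limit(count: number): LazyQuery {
    return new LazyQuery([...this.steps, { step: "limit", count }]);
  }

  // Stand-in for the hand-off to native code: serialize the operation stack.
  toPlan(): string {
    return JSON.stringify(this.steps);
  }
}

const plan = new LazyQuery()
  .filter(gt(col("price"), lit(100)))
  .sort("price")
  .limit(10)
  .toPlan();
```

Because every method returns a fresh LazyQuery, a partially built query can be reused as the base for several variants without one affecting another.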

Threading Model

A key design constraint: the V8 thread must never call Teide C APIs directly. All native operations run on a dedicated Teide thread that owns the C heap. Communication between the two threads uses a lock-free SPSC (single-producer, single-consumer) queue.

┌──────────────┐     SPSC Queue     ┌──────────────┐
│  V8 Thread   │ ──── work item ──▶ │ Teide Thread │
│  (main)      │                    │ (background) │
│              │ ◀── result ─────── │              │
└──────────────┘   cond_var / tsfn  └──────────────┘
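
To make the queue idea concrete, here is a single-threaded TypeScript model of a bounded SPSC ring buffer. It is only a sketch of the general technique: the real queue lives in src/teide_thread.h and is not reproduced here, and the class name and capacity are assumptions. In a native lock-free version, head and tail would be atomic variables with appropriate memory ordering.

```typescript
// Single-threaded model of a bounded SPSC ring buffer (illustrative only).
// The producer only writes `tail`; the consumer only writes `head`. That
// ownership split is what lets a native version avoid locks.
class SpscQueue<T> {
  private readonly slots: (T | undefined)[];
  private head = 0; // next slot to read (consumer-owned)
  private tail = 0; // next slot to write (producer-owned)

  constructor(capacity: number) {
    // One slot is left empty to tell "full" apart from "empty".
    this.slots = new Array<T | undefined>(capacity + 1).fill(undefined);
  }

  // Producer side: returns false when the queue is full.
  push(item: T): boolean {
    const next = (this.tail + 1) % this.slots.length;
    if (next === this.head) return false;
    this.slots[this.tail] = item;
    this.tail = next;
    return true;
  }

  // Consumer side: returns undefined when the queue is empty.
  pop(): T | undefined {
    if (this.head === this.tail) return undefined;
    const item = this.slots[this.head];
    this.slots[this.head] = undefined;
    this.head = (this.head + 1) % this.slots.length;
    return item;
  }
}

const queue = new SpscQueue<string>(2);
const accepted = [queue.push("readCsv"), queue.push("collect"), queue.push("overflow")];
const drained = [queue.pop(), queue.pop(), queue.pop()];
```

With a capacity of 2, the third push is rejected and the third pop finds the queue empty, so items come out in the order they went in.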

Synchronous Path: dispatch_sync()

The V8 thread posts a work item to the SPSC queue, then blocks on a condition variable. The Teide thread picks up the item, executes the operation, writes the result, and signals the condition variable. The V8 thread wakes up and returns the result to JavaScript.

// This call blocks until the Teide thread completes
const table = ctx.readCsvSync('data.csv');

Asynchronous Path: dispatch_async()

The V8 thread posts a work item and immediately returns a Promise. The Teide thread picks up the item, executes the operation, and uses a napi_threadsafe_function to schedule the Promise resolution back on the V8 thread's event loop.

// This returns immediately; work happens on the Teide thread
const table = await ctx.readCsv('data.csv');
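
The promise hand-off can be modeled in a few lines. This is a sketch under stated assumptions, not the addon's code: a plain array stands in for the SPSC queue, calling resolve directly stands in for the napi_threadsafe_function callback, and the names WorkItem, dispatchAsync, and runTeideThreadOnce are hypothetical.

```typescript
// Illustrative model only: a plain array stands in for the SPSC queue and
// calling `resolve` stands in for the threadsafe-function callback.
interface WorkItem {
  run: () => unknown;
  resolve: (value: unknown) => void;
  reject: (reason: unknown) => void;
}

const pending: WorkItem[] = [];

// "V8 thread" side: enqueue the work and hand back a Promise right away.
function dispatchAsync<T>(run: () => T): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    pending.push({ run, resolve: (value) => resolve(value as T), reject });
  });
}

// "Teide thread" side: take one item, execute it, and settle its Promise.
function runTeideThreadOnce(): boolean {
  const item = pending.shift();
  if (item === undefined) return false;
  try {
    item.resolve(item.run());
  } catch (error) {
    item.reject(error);
  }
  return true;
}

let resolvedRows: number | undefined;
const rowCountPromise = dispatchAsync(() => 3); // returns before any work runs
rowCountPromise.then((rows) => {
  resolvedRows = rows; // runs later, back on the event loop
});

const queuedBeforeRun = pending.length; // the item is still waiting
const processed = runTeideThreadOnce(); // the work happens here
```

The caller gets its Promise before the work item has been touched; the value only arrives once the worker side has run the item and settled it.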

Shutdown

When ctx.destroy() is called (or the context is disposed through Symbol.dispose), the V8 thread posts a sentinel work item. The Teide thread recognizes the sentinel, cleans up all native resources (td_pool_destroy, td_sym_destroy, td_heap_destroy), and exits. Cleanup is therefore deterministic: it does not depend on when, or whether, JavaScript garbage collection runs.
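
A minimal model of the sentinel pattern follows. The SHUTDOWN symbol and runWorkerLoop are hypothetical names, and the cleanup functions are only recorded as strings in the order the text lists them; whether the addon calls them in exactly that order is an assumption.

```typescript
// Illustrative model only: SHUTDOWN is a hypothetical sentinel value and the
// cleanup names are simply recorded as strings rather than called.
const SHUTDOWN = Symbol("shutdown");
type Job = (() => void) | typeof SHUTDOWN;

function runWorkerLoop(queue: Job[], log: string[]): void {
  for (;;) {
    const job = queue.shift();
    if (job === undefined) return; // nothing queued (a real thread would wait)
    if (job === SHUTDOWN) {
      // Tear down in the order given in the text, then leave the loop.
      log.push("td_pool_destroy", "td_sym_destroy", "td_heap_destroy");
      return;
    }
    job();
  }
}

const log: string[] = [];
const jobs: Job[] = [
  () => log.push("job 1"),
  SHUTDOWN,
  () => log.push("never runs"),
];
runWorkerLoop(jobs, log);
```

Anything queued behind the sentinel is never executed, which is why the sentinel must be the last item a context posts.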

Zero-Copy Data Access

When you access a column via table.col('name').data, no data is copied. The NativeSeries C++ class uses napi_create_external_typed_array to expose the C heap memory directly as a JavaScript TypedArray.

C Heap Memory          JavaScript
┌──────────────┐      ┌──────────────────┐
│ float64[1000]│ ───▶ │ Float64Array     │
│ (td_col_t)   │      │ .buffer points   │
│              │      │  to C heap       │
└──────────────┘      └──────────────────┘
      ▲
      │ No copy — same memory
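
The sharing behavior can be observed with ordinary typed arrays. In this sketch an ArrayBuffer stands in for the C heap block (an assumption for illustration; in the addon the memory is owned by the engine), and two views over it play the roles of the engine and the JavaScript column.

```typescript
// An ArrayBuffer stands in for the C heap block; in the addon the buffer
// would be external memory owned by the engine (assumption for this sketch).
const heapBlock = new ArrayBuffer(4 * Float64Array.BYTES_PER_ELEMENT);

// Two views over the same bytes: neither construction copies anything.
const engineView = new Float64Array(heapBlock);
const jsColumn = new Float64Array(heapBlock);

engineView[0] = 42.5;           // a write through one view...
const seenFromJs = jsColumn[0]; // ...is visible through the other

// By contrast, slice() allocates a new buffer and copies the elements.
const copied = jsColumn.slice();
engineView[0] = 7;
const copyUnchanged = copied[0]; // still the value at copy time
```

If you need a snapshot that survives later changes to the underlying memory, take an explicit copy with slice() as shown.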

Use-After-Free Safety

A subtle hazard: JavaScript garbage collection may run Series destructor code after the Teide heap has been torn down during shutdown. The heap_alive_ atomic flag guards against this.
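
One way such a guard can work is sketched below. It assumes the finalizer checks the flag and skips the release once the heap is gone; that mechanism is an assumption, and heapState, releaseColumn, and finalizeSeries are hypothetical names rather than the addon's code.

```typescript
// Illustrative model only. `heapAlive` mirrors the role of heap_alive_;
// `releaseColumn` is a hypothetical stand-in for the native release call.
const heapState = { heapAlive: true, releases: 0 };

function releaseColumn(): void {
  heapState.releases += 1;
}

// What a wrapper's finalizer would do when the GC collects it.
function finalizeSeries(): "released" | "skipped" {
  if (!heapState.heapAlive) return "skipped"; // heap is gone: touch nothing
  releaseColumn();
  return "released";
}

const beforeShutdown = finalizeSeries(); // heap still alive
heapState.heapAlive = false;             // shutdown tears the heap down
const afterShutdown = finalizeSeries();  // late finalizer becomes a no-op
```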

Operation Graph

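The Teide engine represents computations as a directed acyclic graph (DAG) of operations. Each node in the DAG produces a column of data. The graph structure enables optimizations such as fusing a chain of element-wise operations into a single vectorized pass, as in the following example.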

Example: col('price').mul(lit(1.1)).gt(lit(100))

           gt
          / \
        mul  lit(100)
       / \
  col('price')  lit(1.1)

Fused into a single vectorized pass:
  for each row: result[i] = (price[i] * 1.1) > 100
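
The fused evaluation above can be imitated in TypeScript. This is a sketch of the idea only: the Expr shapes, compile, and evaluate are hypothetical and say nothing about how the C engine lays out or executes its graph nodes. It shows the tree being turned into one per-row function so that no intermediate column is materialized for the multiplication.

```typescript
// Hypothetical expression shapes; not the engine's node layout.
type Expr =
  | { kind: "col"; name: string }
  | { kind: "lit"; value: number }
  | { kind: "mul"; left: Expr; right: Expr }
  | { kind: "gt"; left: Expr; right: Expr };

type Columns = Record<string, Float64Array>;
type RowFn = (columns: Columns, row: number) => number;

// Walk the tree once and build a per-row closure for the whole expression.
function compile(expr: Expr): RowFn {
  switch (expr.kind) {
    case "col": {
      const name = expr.name;
      return (columns, row) => columns[name][row];
    }
    case "lit": {
      const value = expr.value;
      return () => value;
    }
    case "mul": {
      const left = compile(expr.left);
      const right = compile(expr.right);
      return (columns, row) => left(columns, row) * right(columns, row);
    }
    case "gt": {
      const left = compile(expr.left);
      const right = compile(expr.right);
      return (columns, row) => (left(columns, row) > right(columns, row) ? 1 : 0);
    }
  }
}

// One loop over the rows; the boolean result is stored as 0 or 1.
function evaluate(expr: Expr, columns: Columns, rowCount: number): Uint8Array {
  const rowFn = compile(expr);
  const out = new Uint8Array(rowCount);
  for (let i = 0; i < rowCount; i++) out[i] = rowFn(columns, i);
  return out;
}

const expr: Expr = {
  kind: "gt",
  left: { kind: "mul", left: { kind: "col", name: "price" }, right: { kind: "lit", value: 1.1 } },
  right: { kind: "lit", value: 100 },
};
const mask = evaluate(expr, { price: new Float64Array([50, 100, 200]) }, 3);
```

For prices 50, 100, and 200 the mask is 0, 1, 1: only the first row fails the test after the 10% markup.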

Memory Model

The C17 core engine uses a custom memory allocator designed for columnar workloads:

Component            Purpose
-------------------  ----------------------------------------------------------
Buddy Allocator      Manages large power-of-2 block allocations for column data. Minimizes external fragmentation.
Slab Cache           Fast fixed-size allocations for internal metadata structures (graph nodes, hash entries).
Thread-Local Heaps   Each thread gets its own heap for allocation-heavy paths, avoiding contention on a global lock.
Reference Counting   td_retain() / td_release() manage object lifetimes. When the refcount hits zero, memory is returned to the allocator.
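
The power-of-2 sizing used by buddy allocators in general can be shown with two small functions. This is generic buddy-allocation arithmetic, not the engine's code, and MIN_BLOCK is an assumed value chosen for the example.

```typescript
// Generic buddy-allocator arithmetic; MIN_BLOCK is an assumed value.
const MIN_BLOCK = 64;

// Round a request up to the next power-of-two block size.
function blockSizeFor(bytes: number): number {
  let size = MIN_BLOCK;
  while (size < bytes) size *= 2;
  return size;
}

// A block's buddy is the neighbor it can merge with; for power-of-two
// blocks aligned to their size, its offset differs in exactly one bit.
function buddyOffset(offset: number, size: number): number {
  return offset ^ size;
}

const requested = 1000 * 8;                      // e.g. 1000 float64 values
const block = blockSizeFor(requested);           // 8192
const buddyOfFirst = buddyOffset(0, block);      // 8192
const buddyOfSecond = buddyOffset(block, block); // 0
```

Because two buddies always combine into one aligned block of twice the size, freed blocks can be merged back into larger ones cheaply.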

The TypeScript layer never allocates or frees native memory directly. The NAPI addon calls td_retain() when wrapping C objects and td_release() when JavaScript garbage-collects the wrapper.
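
The retain/release hand-off between the engine, the addon, and the garbage collector can be traced with a toy counter. The RefCounted class is hypothetical and only mirrors the roles of td_retain() and td_release() described above; the starting count of 1 is an assumption.

```typescript
// Illustrative model only: the class and its methods are hypothetical and
// merely mirror the roles of td_retain() / td_release() described above.
class RefCounted {
  private count = 1; // assumed: the creator holds the first reference
  freed = false;

  retain(): void {
    if (this.freed) throw new Error("retain after free");
    this.count += 1;
  }

  release(): void {
    if (this.freed) throw new Error("release after free");
    this.count -= 1;
    if (this.count === 0) this.freed = true; // memory goes back to the allocator
  }
}

const column = new RefCounted(); // engine creates the object (count 1)
column.retain();                 // addon wraps it for JavaScript (count 2)
column.release();                // engine drops its own reference (count 1)
const aliveWhileWrapped = !column.freed;
column.release();                // GC finalizes the wrapper (count 0)
const freedAfterGc = column.freed;
```

The object stays alive for as long as the JavaScript wrapper exists, even after the engine has dropped its own reference.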

Build Process

Building teide-js involves three compilation stages:

┌─────────────────────────────────────────────────┐
│ 1. CMake compiles vendor/teide/ (C17)           │
│    → static library libteide.a                  │
├─────────────────────────────────────────────────┤
│ 2. node-gyp links src/*.cpp (C++17) + libteide  │
│    → build/Release/teidedb_addon.node           │
├─────────────────────────────────────────────────┤
│ 3. tsc compiles lib/*.ts                        │
│    → dist/*.js + dist/*.d.ts                    │
└─────────────────────────────────────────────────┘

C++ Header Inclusion Order

The src/compat.h header provides a C-atomic shim so that the C17 Teide headers (which use _Atomic(T)) compile in C++ mode. The shim redefines _Atomic(T) as volatile T and relies on GCC builtins for the atomic operations. Because this redefinition can conflict with the C++ <atomic> header, the order in which headers are included matters.

Build Commands

# Full build: native addon (debug) + TypeScript
npm run build

# Native addon only (debug)
npm run build:native

# Native addon with -O3 optimizations
npm run build:native:release

# TypeScript compilation only
npm run build:ts

# Clean build artifacts
npm run clean

Requirements: CMake 3.15 or later, a C17/C++17 capable compiler, and Node.js 18 or later.