SQL REFERENCE

Vector Search

Teide supports vector similarity search directly in SQL, enabling nearest-neighbor queries over embedding columns. Vectors are stored as flat F32 arrays in a columnar layout for cache-friendly computation, and an optional HNSW index accelerates K-nearest-neighbor lookups.

Embedding Columns

Embedding columns store dense F32 vectors in a flat N×D layout, where N is the number of rows and D is the dimensionality. You can create embedding columns via SQL or the programmatic API.
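The layout can be illustrated with a short sketch in plain JavaScript (independent of teide-js): the vector for row i occupies offsets i*D through i*D + D - 1 of a single flat buffer, so any row can be read back as a zero-copy view.

```javascript
// Flat N×D storage: N rows of D-dimensional F32 vectors in one buffer.
const N = 3, D = 4;
const flat = new Float32Array(N * D);

// Write row 1 (the second vector): dimension j lives at offset 1 * D + j.
const v = [0.1, 0.2, 0.3, 0.4];
for (let j = 0; j < D; j++) flat[1 * D + j] = v[j];

// Read a single row back as a zero-copy view over the flat buffer.
function rowView(buf, i, d) {
  return buf.subarray(i * d, (i + 1) * d);
}

console.log(rowView(flat, 1, D)); // view of row 1, length D
```

Because all vectors are contiguous, a scan over the column walks the buffer front to back, which is what makes the similarity functions below cache-friendly.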

Creating a Table with Embeddings via SQL

import { Context } from 'teide-js';

const ctx = new Context();

ctx.executeSync(`
  CREATE TABLE documents (
    id INTEGER,
    title VARCHAR,
    embedding REAL[384]
  )
`);

// Insert rows with vector literals
ctx.executeSync(`
  INSERT INTO documents VALUES
    (1, 'Introduction to ML', ARRAY[0.12, 0.45, 0.78, ...]),
    (2, 'Deep Learning Guide', ARRAY[0.34, 0.67, 0.91, ...]),
    (3, 'Data Engineering',    ARRAY[0.56, 0.23, 0.44, ...])
`);

COSINE_SIMILARITY

Computes the cosine similarity between an embedding column and a query vector. The formula is dot(a, b) / (||a|| * ||b||). Returns F64 values in the range [-1.0, 1.0], where 1.0 means identical direction, 0.0 means orthogonal, and -1.0 means opposite direction.
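The formula can be spelled out as a small reference implementation in plain JavaScript; this illustrates the math only and is not part of the teide-js API.

```javascript
// Reference implementation: dot(a, b) / (||a|| * ||b||).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0]));  // 1  (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1]));  // 0  (orthogonal)
console.log(cosineSimilarity([1, 0], [-1, 0])); // -1 (opposite direction)
```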

Syntax

COSINE_SIMILARITY(column, ARRAY[v1, v2, ...]) -> F64

Example

// Build a SQL ARRAY literal from a JavaScript array of numbers.
// Only interpolate trusted numeric values into SQL text this way.
const query = [0.15, 0.50, 0.80, /* ... 384 dimensions ... */];
const arrayLiteral = `ARRAY[${query.join(', ')}]`;

const similar = ctx.executeSync(`
  SELECT title, COSINE_SIMILARITY(embedding, ${arrayLiteral}) AS similarity
  FROM documents
  ORDER BY similarity DESC
  LIMIT 5
`);

// title               | similarity
// --------------------|-----------
// Introduction to ML  | 0.9823
// Deep Learning Guide | 0.8714
// Data Engineering    | 0.6201

EUCLIDEAN_DISTANCE

Computes the Euclidean (L2) distance between an embedding column and a query vector. The formula is sqrt(sum((a_i - b_i)^2)). Returns F64 values where 0.0 means identical vectors and larger values mean greater distance.
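As with cosine similarity, the formula can be written out as a short reference implementation in plain JavaScript; this illustrates the math only and is not teide-js code.

```javascript
// Reference implementation: sqrt(sum((a_i - b_i)^2)).
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

console.log(euclideanDistance([0, 0], [3, 4])); // 5 (3-4-5 triangle)
console.log(euclideanDistance([1, 2], [1, 2])); // 0 (identical vectors)
```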

Syntax

EUCLIDEAN_DISTANCE(column, ARRAY[v1, v2, ...]) -> F64

Example

const nearest = ctx.executeSync(`
  SELECT title, EUCLIDEAN_DISTANCE(embedding, ARRAY[0.15, 0.50, 0.80]) AS distance
  FROM documents
  ORDER BY distance ASC
  LIMIT 5
`);

// title               | distance
// --------------------|---------
// Introduction to ML  | 0.0871
// Deep Learning Guide | 0.3452
// Data Engineering    | 1.2103

KNN Fast-Path Optimization

Teide automatically detects the K-nearest-neighbor pattern and optimizes execution. When you write a query that orders by similarity or distance and applies a LIMIT, the engine uses a partial sort (top-K selection) instead of a full sort, reducing time complexity from O(N log N) to O(N log K).
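The top-K selection itself can be sketched in plain JavaScript with a size-bounded min-heap; the engine's actual implementation may differ, but the complexity argument is the same: each of the N candidates costs at most O(log K), and only K results are ever materialized.

```javascript
// Top-K selection via a min-heap of at most k entries: O(N log K)
// instead of sorting all N candidates (O(N log N)).
function topK(items, k, score) {
  const heap = []; // min-heap ordered by score s

  const up = (i) => {
    while (i > 0) {
      const p = (i - 1) >> 1;
      if (heap[p].s <= heap[i].s) break;
      [heap[p], heap[i]] = [heap[i], heap[p]];
      i = p;
    }
  };
  const down = (i) => {
    for (;;) {
      let m = i;
      const l = 2 * i + 1, r = 2 * i + 2;
      if (l < heap.length && heap[l].s < heap[m].s) m = l;
      if (r < heap.length && heap[r].s < heap[m].s) m = r;
      if (m === i) break;
      [heap[m], heap[i]] = [heap[i], heap[m]];
      i = m;
    }
  };

  for (const item of items) {
    const s = score(item);
    if (heap.length < k) {
      heap.push({ item, s });
      up(heap.length - 1);
    } else if (s > heap[0].s) {
      // New candidate beats the worst of the current top K: replace it.
      heap[0] = { item, s };
      down(0);
    }
  }
  // Sorting only the K survivors costs O(K log K), independent of N.
  return heap.sort((a, b) => b.s - a.s).map((e) => e.item);
}

console.log(topK([5, 1, 9, 3, 7, 2], 3, (x) => x)); // [9, 7, 5]
```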

The following patterns are automatically optimized:

-- KNN by cosine similarity (descending = most similar first)
SELECT * FROM documents
ORDER BY COSINE_SIMILARITY(embedding, ARRAY[...]) DESC
LIMIT k;

-- KNN by Euclidean distance (ascending = nearest first)
SELECT * FROM documents
ORDER BY EUCLIDEAN_DISTANCE(embedding, ARRAY[...]) ASC
LIMIT k;

When an HNSW index exists on the embedding column, the fast path is further accelerated to approximately O(log N) per query.

HNSW Index

For large datasets, you can build a Hierarchical Navigable Small World (HNSW) index on an embedding column. The index provides approximate nearest-neighbor search with tunable accuracy and speed trade-offs.

CREATE VECTOR INDEX

ctx.executeSync(`
  CREATE VECTOR INDEX idx_doc_embedding
  ON documents (embedding)
  USING HNSW (M = 16, ef_construction = 200)
`);

Parameters

Parameter        Default  Range    Description
M                16       4–64     Maximum number of connections per layer. Higher values improve recall at the cost of memory and build time.
ef_construction  200      16–1000  Search width during index construction. Higher values produce a better-quality index but take longer to build.
ef_search        64       1–1000   Search width during queries. Higher values improve recall at the cost of query latency. Set at query time.

Setting ef_search at Query Time

// Increase search accuracy (applies to all subsequent queries until changed)
ctx.executeSync(`SET ef_search = 200`);

const results = ctx.executeSync(`
  SELECT title, COSINE_SIMILARITY(embedding, ARRAY[0.15, 0.50, 0.80]) AS sim
  FROM documents
  ORDER BY sim DESC
  LIMIT 10
`);

// Reset to default
ctx.executeSync(`SET ef_search = 64`);

Dropping an Index

ctx.executeSync(`DROP INDEX idx_doc_embedding`);

DML and Vector Indexes

HNSW indexes are automatically invalidated when the underlying data changes. After an INSERT or DELETE operation on a table with a vector index, the index is marked stale and will be rebuilt on the next query that uses it.

// Index is valid
const r1 = ctx.executeSync(`
  SELECT title FROM documents
  ORDER BY COSINE_SIMILARITY(embedding, ARRAY[0.1, 0.2, 0.3]) DESC
  LIMIT 3
`);

// Insert invalidates the index
ctx.executeSync(`
  INSERT INTO documents VALUES
    (4, 'New Document', ARRAY[0.99, 0.01, 0.50])
`);

// Next query triggers automatic index rebuild
const r2 = ctx.executeSync(`
  SELECT title FROM documents
  ORDER BY COSINE_SIMILARITY(embedding, ARRAY[0.1, 0.2, 0.3]) DESC
  LIMIT 3
`);

For bulk loading scenarios, it is more efficient to insert all data first and then create the index:

// Load data first
ctx.executeSync(`INSERT INTO documents VALUES ...`);
ctx.executeSync(`INSERT INTO documents VALUES ...`);

// Build index once after all inserts
ctx.executeSync(`
  CREATE VECTOR INDEX idx_emb ON documents (embedding)
  USING HNSW (M = 16, ef_construction = 200)
`);