Vector Search
Teide supports vector similarity search directly in SQL, enabling nearest-neighbor queries over embedding columns. Vectors are stored as flat F32 arrays in a columnar layout for cache-friendly computation, and an optional HNSW index accelerates K-nearest-neighbor lookups.
Embedding Columns
Embedding columns store dense F32 vectors in a flat N×D layout, where N is the number of rows and D is the dimensionality. You can create embedding columns via SQL or the programmatic API.
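The flat layout means row i's vector occupies the contiguous slice [i × D, (i + 1) × D) of the backing array, so a single vector can be viewed without copying. A minimal sketch of the offset arithmetic — `getVector` is an illustrative helper, not part of the teide-js API:

```typescript
// Row-major flat layout: row i's vector is the slice [i * d, (i + 1) * d).
// `subarray` returns a view into the same buffer, so no data is copied.
function getVector(data: Float32Array, d: number, row: number): Float32Array {
  return data.subarray(row * d, (row + 1) * d);
}

// Three rows of 4-dimensional vectors stored contiguously (N = 3, D = 4).
const flat = new Float32Array([
  0.1, 0.2, 0.3, 0.4, // row 0
  0.5, 0.6, 0.7, 0.8, // row 1
  0.9, 1.0, 1.1, 1.2, // row 2
]);
const v1 = getVector(flat, 4, 1); // view of row 1: [0.5, 0.6, 0.7, 0.8]
```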
Creating a Table with Embeddings via SQL
import { Context } from 'teide-js';
const ctx = new Context();
ctx.executeSync(`
CREATE TABLE documents (
id INTEGER,
title VARCHAR,
embedding REAL[384]
)
`);
// Insert rows with vector literals
ctx.executeSync(`
INSERT INTO documents VALUES
(1, 'Introduction to ML', ARRAY[0.12, 0.45, 0.78, ...]),
(2, 'Deep Learning Guide', ARRAY[0.34, 0.67, 0.91, ...]),
(3, 'Data Engineering', ARRAY[0.56, 0.23, 0.44, ...])
`);
COSINE_SIMILARITY
Computes the cosine similarity between an embedding column and a query vector. The formula is dot(a, b) / (||a|| * ||b||). Returns F64 values in the range [-1.0, 1.0], where 1.0 means identical direction, 0.0 means orthogonal, and -1.0 means opposite direction.
Syntax
COSINE_SIMILARITY(column, ARRAY[v1, v2, ...]) -> F64
Example
const query = [0.15, 0.50, 0.80, /* ... 384 dimensions ... */];
const arrayLiteral = `ARRAY[${query.join(', ')}]`;
const similar = ctx.executeSync(`
SELECT title, COSINE_SIMILARITY(embedding, ${arrayLiteral}) AS similarity
FROM documents
ORDER BY similarity DESC
LIMIT 5
`);
// title | similarity
// ---------------------|----------
// Introduction to ML | 0.9823
// Deep Learning Guide | 0.8714
// Data Engineering | 0.6201
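The formula above can be written out directly as a reference implementation. This is an illustrative sketch of the math, not the engine's vectorized kernel:

```typescript
// Cosine similarity: dot(a, b) / (||a|| * ||b||).
// Accumulates the dot product and both squared norms in one pass.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([1, 0], [1, 0]);  // 1  (identical direction)
cosineSimilarity([1, 0], [0, 1]);  // 0  (orthogonal)
cosineSimilarity([1, 0], [-1, 0]); // -1 (opposite direction)
```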
EUCLIDEAN_DISTANCE
Computes the Euclidean (L2) distance between an embedding column and a query vector. The formula is sqrt(sum((a_i - b_i)^2)). Returns F64 values where 0.0 means identical vectors and larger values mean greater distance.
Syntax
EUCLIDEAN_DISTANCE(column, ARRAY[v1, v2, ...]) -> F64
Example
const nearest = ctx.executeSync(`
SELECT title, EUCLIDEAN_DISTANCE(embedding, ARRAY[0.15, 0.50, 0.80]) AS distance
FROM documents
ORDER BY distance ASC
LIMIT 5
`);
// title | distance
// ---------------------|----------
// Introduction to ML | 0.0871
// Deep Learning Guide | 0.3452
// Data Engineering | 1.2103
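As with cosine similarity, the distance formula maps directly to code. An illustrative sketch, not the engine's kernel:

```typescript
// Euclidean (L2) distance: sqrt(sum((a_i - b_i)^2)).
function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

euclideanDistance([0, 0], [3, 4]); // 5 (the classic 3-4-5 triangle)
euclideanDistance([1, 2], [1, 2]); // 0 (identical vectors)
```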
KNN Fast-Path Optimization
Teide automatically detects the K-nearest-neighbor pattern and optimizes execution. When you write a query that orders by similarity or distance and applies a LIMIT, the engine uses a partial sort (top-K selection) instead of a full sort, reducing time complexity from O(N log N) to O(N log K).
The following patterns are automatically optimized:
-- KNN by cosine similarity (descending = most similar first)
SELECT * FROM documents
ORDER BY COSINE_SIMILARITY(embedding, ARRAY[...]) DESC
LIMIT k;
-- KNN by Euclidean distance (ascending = nearest first)
SELECT * FROM documents
ORDER BY EUCLIDEAN_DISTANCE(embedding, ARRAY[...]) ASC
LIMIT k;
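The top-K selection behind this fast-path can be sketched with a bounded min-heap: keep only the k best scores, and for each new score either insert it (heap not yet full) or replace the current minimum if the score beats it. Each heap operation costs O(log K), giving O(N log K) overall. This is an illustrative sketch of the technique, not Teide's actual implementation:

```typescript
// Select the k largest scores in descending order using a min-heap of size k.
function topK(scores: number[], k: number): number[] {
  const heap: number[] = []; // min-heap holding the k largest scores seen so far
  const swap = (i: number, j: number) => { [heap[i], heap[j]] = [heap[j], heap[i]]; };
  const siftUp = (i: number) => {
    while (i > 0) {
      const p = (i - 1) >> 1; // parent index
      if (heap[p] <= heap[i]) break;
      swap(i, p); i = p;
    }
  };
  const siftDown = (i: number) => {
    for (;;) {
      let m = i;
      const l = 2 * i + 1, r = 2 * i + 2;
      if (l < heap.length && heap[l] < heap[m]) m = l;
      if (r < heap.length && heap[r] < heap[m]) m = r;
      if (m === i) break;
      swap(i, m); i = m;
    }
  };
  for (const s of scores) {
    if (heap.length < k) { heap.push(s); siftUp(heap.length - 1); }
    else if (s > heap[0]) { heap[0] = s; siftDown(0); } // beats the current minimum
  }
  return heap.sort((a, b) => b - a); // descending: most similar first
}

topK([0.62, 0.98, 0.12, 0.87, 0.45], 3); // [0.98, 0.87, 0.62]
```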
When an HNSW index exists on the embedding column, the fast-path is further accelerated to approximate O(log N) per query.
HNSW Index
For large datasets, you can build a Hierarchical Navigable Small World (HNSW) index on an embedding column. The index provides approximate nearest-neighbor search with tunable accuracy and speed trade-offs.
CREATE VECTOR INDEX
ctx.executeSync(`
CREATE VECTOR INDEX idx_doc_embedding
ON documents (embedding)
USING HNSW (M = 16, ef_construction = 200)
`);
Parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| M | 16 | 4–64 | Maximum number of connections per layer. Higher values improve recall at the cost of memory and build time. |
| ef_construction | 200 | 16–1000 | Search width during index construction. Higher values produce a better-quality index but take longer to build. |
| ef_search | 64 | 1–1000 | Search width during queries. Higher values improve recall at the cost of query latency. Set at query time. |
Setting ef_search at Query Time
// Raise ef_search for higher recall (persists until changed)
ctx.executeSync(`SET ef_search = 200`);
const results = ctx.executeSync(`
SELECT title, COSINE_SIMILARITY(embedding, ARRAY[0.15, 0.50, 0.80]) AS sim
FROM documents
ORDER BY sim DESC
LIMIT 10
`);
// Reset to default
ctx.executeSync(`SET ef_search = 64`);
Dropping an Index
ctx.executeSync(`DROP INDEX idx_doc_embedding`);
DML and Vector Indexes
HNSW indexes are automatically invalidated when the underlying data changes. After an INSERT or DELETE operation on a table with a vector index, the index is marked stale and will be rebuilt on the next query that uses it.
// Index is valid
const r1 = ctx.executeSync(`
SELECT title FROM documents
ORDER BY COSINE_SIMILARITY(embedding, ARRAY[0.1, 0.2, 0.3]) DESC
LIMIT 3
`);
// Insert invalidates the index
ctx.executeSync(`
INSERT INTO documents VALUES
(4, 'New Document', ARRAY[0.99, 0.01, 0.50])
`);
// Next query triggers automatic index rebuild
const r2 = ctx.executeSync(`
SELECT title FROM documents
ORDER BY COSINE_SIMILARITY(embedding, ARRAY[0.1, 0.2, 0.3]) DESC
LIMIT 3
`);
For bulk loading scenarios, it is more efficient to insert all data first and then create the index:
// Load data first
ctx.executeSync(`INSERT INTO documents VALUES ...`);
ctx.executeSync(`INSERT INTO documents VALUES ...`);
// Build index once after all inserts
ctx.executeSync(`
CREATE VECTOR INDEX idx_emb ON documents (embedding)
USING HNSW (M = 16, ef_construction = 200)
`);