From CSV to Insights
A practical walkthrough: load a CSV file, explore the data, filter rows, aggregate results, and extract insights — all from Node.js.
This guide takes you through a realistic data analysis workflow using TeideDB's Node.js bindings. We will start with a raw CSV file containing sales records, then progressively build up queries to filter, aggregate, rank, and export results. By the end, you will have a complete script that transforms raw data into actionable business insights.
Loading a CSV File
TeideDB can ingest CSV files directly into in-memory columnar tables using the read_csv function. This parses the file, infers column types, and stores the data in a compressed columnar format optimized for analytical queries. The entire operation runs on the dedicated Teide thread, so the V8 event loop stays responsive even for large files.
Assume we have a file called sales.csv with columns: id, date, category, product, price, quantity, and region.
import { Context } from 'teide-js';
const ctx = new Context();
// Load the CSV into a table named "sales"
ctx.executeSync(`
CREATE TABLE sales AS
SELECT * FROM read_csv('sales.csv')
`);
console.log('Table loaded successfully.');
Note: The file path passed to read_csv is resolved relative to the current working directory of the Node.js process, not relative to the script file. Use an absolute path if your script may be invoked from different directories.
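The type inference that read_csv performs happens inside the engine, but the idea is easy to sketch in plain JavaScript. The function below is an illustration of the general technique, not TeideDB's actual algorithm: scan a column's values and settle on the narrowest type that fits every one of them.

```javascript
// Illustrative type inference (not TeideDB's actual algorithm): start with the
// narrowest candidate type and widen whenever a value does not fit it.
function inferColumnType(values) {
  let type = 'integer';
  for (const raw of values) {
    const v = raw.trim();
    if (v === '') continue; // treat empty cells as nulls
    if (type === 'integer' && !/^-?\d+$/.test(v)) type = 'float';
    if (type === 'float' && Number.isNaN(Number(v))) type = 'string';
    if (type === 'string') break; // cannot widen any further
  }
  return type;
}
```

Running it on a sample column such as inferColumnType(['19.99', '5', '12.50']) yields 'float': one value rules out integer, but every value still parses as a number.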
Exploring the Data
Before diving into analysis, it is good practice to understand the shape of your data. How many rows does the table contain? What do the first few records look like? TeideDB supports standard SQL for exploration, so you can use familiar queries to get oriented. The executeSync method returns a Table object that you can inspect column by column.
-- Count total rows
SELECT COUNT(*) AS total_rows FROM sales;
-- Preview first 5 rows
SELECT * FROM sales LIMIT 5;
// Row count
const countResult = ctx.executeSync('SELECT COUNT(*) AS total_rows FROM sales');
const totalRows = countResult.getColumn('total_rows').toArray()[0];
console.log(`Total rows: ${totalRows}`);
// Preview
const preview = ctx.executeSync('SELECT * FROM sales LIMIT 5');
for (const name of preview.columnNames()) {
console.log(` ${name}: ${preview.getColumn(name).toArray()}`);
}
Filtering Rows
With a feel for the data, let us narrow down to a specific segment. Suppose we want to focus on premium electronics — items in the "Electronics" category priced above $100. SQL's WHERE clause handles this naturally. TeideDB pushes predicates down to the columnar scan, so only matching rows are materialized, keeping memory usage low even on large datasets.
SELECT product, price, quantity, price * quantity AS line_total
FROM sales
WHERE category = 'Electronics'
AND price > 100
ORDER BY line_total DESC
LIMIT 10;
const electronics = ctx.executeSync(`
SELECT product, price, quantity, price * quantity AS line_total
FROM sales
WHERE category = 'Electronics'
AND price > 100
ORDER BY line_total DESC
LIMIT 10
`);
const products = electronics.getColumn('product').toArray();
const totals = electronics.getColumn('line_total').toArray();
products.forEach((p, i) => {
console.log(` ${p}: $${totals[i].toFixed(2)}`);
});
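The predicate pushdown mentioned above is easy to picture with a plain-JavaScript sketch (plain arrays stand in for TeideDB's columnar buffers; this illustrates the technique, not the engine's actual code): scan only the filter columns, record the indices of matching rows, and materialize nothing else.

```javascript
// Columnar filter sketch: scan the filter columns once and collect matching
// row indices; only those indices are then used to read the projected columns.
function filterIndices(category, price) {
  const matches = [];
  for (let i = 0; i < category.length; i++) {
    if (category[i] === 'Electronics' && price[i] > 100) matches.push(i);
  }
  return matches;
}
```

For example, filterIndices(['Electronics', 'Toys', 'Electronics'], [250, 30, 80]) returns [0]: only the first row satisfies both predicates, so only that row would ever be materialized.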
Aggregating Results
Individual rows tell a story, but aggregate statistics reveal the bigger picture. Let us compute summary metrics for each product category: total revenue, average price, and number of transactions. The GROUP BY clause partitions rows by category, and aggregate functions (SUM, AVG, COUNT) collapse each partition into a single summary row. TeideDB evaluates aggregates in a single pass over the columnar data using vectorized operations.
SELECT
category,
COUNT(*) AS num_transactions,
SUM(price * quantity) AS total_revenue,
AVG(price) AS avg_price
FROM sales
GROUP BY category
ORDER BY total_revenue DESC;
const summary = ctx.executeSync(`
SELECT
category,
COUNT(*) AS num_transactions,
SUM(price * quantity) AS total_revenue,
AVG(price) AS avg_price
FROM sales
GROUP BY category
ORDER BY total_revenue DESC
`);
const cats = summary.getColumn('category').toArray();
const revenue = summary.getColumn('total_revenue').toArray();
const avgPrice = summary.getColumn('avg_price').toArray();
const txns = summary.getColumn('num_transactions').toArray();
cats.forEach((cat, i) => {
console.log(
` ${cat}: $${revenue[i].toFixed(2)} revenue, ` +
`${txns[i]} transactions, $${avgPrice[i].toFixed(2)} avg price`
);
});
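The single-pass evaluation can likewise be pictured procedurally. In this illustrative sketch (plain arrays in place of columnar buffers), one loop walks the rows while a Map of per-category accumulators plays the role of the aggregate functions; the averages are derived from the running sums at the end.

```javascript
// Single-pass GROUP BY sketch: one scan over the columns, one Map of accumulators.
function summarize(category, price, quantity) {
  const groups = new Map();
  for (let i = 0; i < category.length; i++) {
    let g = groups.get(category[i]);
    if (!g) {
      g = { count: 0, revenue: 0, priceSum: 0 };
      groups.set(category[i], g);
    }
    g.count += 1;
    g.revenue += price[i] * quantity[i];
    g.priceSum += price[i];
  }
  // Derive AVG(price) from the running sums, then mirror ORDER BY ... DESC.
  return [...groups]
    .map(([cat, g]) => ({
      category: cat,
      num_transactions: g.count,
      total_revenue: g.revenue,
      avg_price: g.priceSum / g.count,
    }))
    .sort((a, b) => b.total_revenue - a.total_revenue);
}
```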
Finding Top Products
To surface the most impactful products, we can combine aggregation with sorting and limiting. This is a common pattern for "top-N" reports: group, compute a metric, sort descending, and take the first few rows. The query below finds the five products generating the most total revenue across all categories.
SELECT
product,
category,
SUM(price * quantity) AS total_revenue,
SUM(quantity) AS units_sold
FROM sales
GROUP BY product, category
ORDER BY total_revenue DESC
LIMIT 5;
const topProducts = ctx.executeSync(`
SELECT
product,
category,
SUM(price * quantity) AS total_revenue,
SUM(quantity) AS units_sold
FROM sales
GROUP BY product, category
ORDER BY total_revenue DESC
LIMIT 5
`);
console.log('Top 5 products by revenue:');
const names = topProducts.getColumn('product').toArray();
const rev = topProducts.getColumn('total_revenue').toArray();
const units = topProducts.getColumn('units_sold').toArray();
names.forEach((name, i) => {
console.log(` ${i + 1}. ${name} — $${rev[i].toFixed(2)} (${units[i]} units)`);
});
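Stripped of SQL, the top-N pattern is just sort-and-slice. An illustrative plain-JavaScript version over precomputed per-product totals (the sample data here is hypothetical):

```javascript
// Top-N sketch: sort per-product totals descending by revenue and keep n rows.
function topN(totals, n) {
  return Object.entries(totals)
    .sort(([, a], [, b]) => b - a)
    .slice(0, n)
    .map(([product, revenue]) => ({ product, revenue }));
}
```

For instance, topN({ Laptop: 5000, Mouse: 300, Monitor: 1200 }, 2) keeps Laptop and Monitor, in that order.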
Combining Queries for Deeper Analysis
Real analysis often requires combining multiple operations. Subqueries let you use the result of one query as input to another. In this example, we find each region's best-selling category by first computing per-region, per-category revenue, then filtering to only the top category in each region using a correlated subquery. This kind of nested analysis is where SQL really shines compared to procedural code.
SELECT r.region, r.category, r.total_revenue
FROM (
SELECT
region,
category,
SUM(price * quantity) AS total_revenue
FROM sales
GROUP BY region, category
) r
WHERE r.total_revenue = (
SELECT MAX(sub.total_revenue)
FROM (
SELECT region, category, SUM(price * quantity) AS total_revenue
FROM sales
GROUP BY region, category
) sub
WHERE sub.region = r.region
)
ORDER BY r.total_revenue DESC;
const regionalBest = ctx.executeSync(`
SELECT r.region, r.category, r.total_revenue
FROM (
SELECT region, category, SUM(price * quantity) AS total_revenue
FROM sales
GROUP BY region, category
) r
WHERE r.total_revenue = (
SELECT MAX(sub.total_revenue)
FROM (
SELECT region, category, SUM(price * quantity) AS total_revenue
FROM sales
GROUP BY region, category
) sub
WHERE sub.region = r.region
)
ORDER BY r.total_revenue DESC
`);
console.log('Top category per region:');
const regions = regionalBest.getColumn('region').toArray();
const bestCats = regionalBest.getColumn('category').toArray();
const bestRev = regionalBest.getColumn('total_revenue').toArray();
regions.forEach((region, i) => {
console.log(` ${region}: ${bestCats[i]} ($${bestRev[i].toFixed(2)})`);
});
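To sanity-check the SQL against a small sample, the same logic can be written procedurally: one pass over the per-region, per-category aggregates (the inner query's output), keeping the best category seen so far for each region. This sketch assumes those aggregates are already available as plain row objects:

```javascript
// Procedural equivalent of the correlated subquery: best category per region.
function bestCategoryPerRegion(rows) {
  const best = new Map(); // region -> { category, revenue }
  for (const { region, category, revenue } of rows) {
    const current = best.get(region);
    if (!current || revenue > current.revenue) {
      best.set(region, { category, revenue });
    }
  }
  return best;
}
```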
Exporting Results
Once you have computed your insights, you will typically want to export the data for use elsewhere — as JSON for an API response, as CSV for a spreadsheet, or as typed arrays for further numerical processing. TeideDB's zero-copy Series accessors give you direct access to the underlying columnar buffers as JavaScript TypedArrays, making export both fast and memory-efficient.
// Export as JSON array
const result = ctx.executeSync(`
SELECT category, SUM(price * quantity) AS revenue
FROM sales
GROUP BY category
ORDER BY revenue DESC
`);
const columns = result.columnNames();
// Materialize each column once up front; calling toArray() per cell inside
// the loop would convert the same column over and over again.
const data = columns.map((col) => result.getColumn(col).toArray());
const numRows = result.getColumn(columns[0]).length;
const rows = [];
for (let i = 0; i < numRows; i++) {
const row = {};
columns.forEach((col, j) => {
row[col] = data[j][i];
});
rows.push(row);
}
console.log(JSON.stringify(rows, null, 2));
// Clean up when done
ctx.destroy();
Note: Always call ctx.destroy() when you are finished to release the native heap and shut down the Teide background thread. In TypeScript, a using declaration (using ctx = new Context()) gives you automatic cleanup via Symbol.dispose.
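For the CSV export mentioned at the start of this section, the row objects built above can be serialized with a small helper. This is a minimal sketch that handles commas, quotes, and newlines inside values; for anything beyond simple exports, a dedicated CSV library is the safer choice.

```javascript
// Minimal CSV serializer for an array of flat row objects.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const cols = Object.keys(rows[0]);
  // Quote a field only when it contains a comma, quote, or newline.
  const escape = (v) => {
    const s = String(v);
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [cols.join(',')];
  for (const row of rows) {
    lines.push(cols.map((c) => escape(row[c])).join(','));
  }
  return lines.join('\n');
}
```

Calling toCsv(rows) on the export above produces a category,revenue header followed by one line per category, ready to write to disk or stream in an HTTP response.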