| Crates.io | ruvector-node |
| lib.rs | ruvector-node |
| version | 0.1.29 |
| created_at | 2025-11-20 21:25:58.889724+00 |
| updated_at | 2025-12-29 19:17:09.964255+00 |
| description | Node.js bindings for Ruvector via NAPI-RS |
| homepage | |
| repository | https://github.com/ruvnet/ruvector |
| max_upload_size | |
| id | 1942630 |
| size | 161,512 |
Native Rust performance for Node.js vector databases via NAPI-RS
Bring the power of Ruvector's blazing-fast vector search to your Node.js and TypeScript applications. Built with NAPI-RS for zero-overhead native bindings, async/await support, and complete type safety.
Part of the Ruvector ecosystem - next-generation vector database built in Rust.
In the age of AI, Node.js applications need fast, efficient vector search for RAG systems, semantic search, and recommendation engines. But existing JavaScript solutions are slow, memory-intensive, or lack critical features.
Ruvector Node.js eliminates these limitations.
| Operation | Ruvector | Pure JS | Speedup |
|---|---|---|---|
| Insert 1M vectors | 2.1s | 45s | 21x |
| Search (k=10) | 0.4ms | 50ms | 125x |
| Memory (1M vectors) | 800MB | 3GB | 3.75x |
| HNSW Build | 1.8s | N/A | Native only |
| Product Quantization | Yes | No | 32x compression |
| SIMD Acceleration | Yes | No | 4-16x faster |
| Metric | Value | Details |
|---|---|---|
| Query Latency (p50) | <0.5ms | HNSW + SIMD optimizations |
| Throughput (QPS) | 50K+ | Single-threaded Node.js |
| Memory (1M vectors) | ~800MB | With scalar quantization |
| Recall @ k=10 | 95%+ | Depends on HNSW configuration |
| Browser Support | Via WASM | Use ruvector-wasm package |
| Offline Capable | ✅ | Embedded database |
npm install ruvector
Requirements:
Optional: Verify installation
node -e "const {version} = require('ruvector'); console.log('Ruvector v' + version())"
const { VectorDB } = require('ruvector');
async function main() {
// Create database with 384 dimensions (e.g., for sentence-transformers)
const db = VectorDB.withDimensions(384);
// Insert vectors with metadata
const id1 = await db.insert({
vector: new Float32Array(384).fill(0.1),
metadata: { text: 'Hello world', category: 'greeting' }
});
const id2 = await db.insert({
vector: new Float32Array(384).fill(0.2),
metadata: { text: 'Goodbye world', category: 'farewell' }
});
console.log(`Inserted: ${id1}, ${id2}`);
// Search for similar vectors
const results = await db.search({
vector: new Float32Array(384).fill(0.15),
k: 10,
filter: { category: 'greeting' }
});
results.forEach(result => {
console.log(`ID: ${result.id}, Score: ${result.score}`);
console.log(`Metadata:`, result.metadata);
});
// Get database stats
console.log(`Total vectors: ${await db.len()}`);
}
main().catch(console.error);
import { VectorDB, JsDbOptions, JsSearchQuery } from 'ruvector';
interface DocumentMetadata {
text: string;
category: string;
timestamp: number;
}
async function semanticSearch() {
// Advanced configuration
const options: JsDbOptions = {
dimensions: 768,
distanceMetric: 'Cosine',
storagePath: './my-vectors.db',
hnswConfig: {
m: 32, // Connections per layer
efConstruction: 200, // Build quality
efSearch: 100 // Search quality
},
quantization: {
type: 'product',
subspaces: 16,
k: 256
}
};
const db = new VectorDB(options);
// Batch insert for better performance
const embeddings = await getEmbeddings(['doc1', 'doc2', 'doc3']);
const ids = await db.insertBatch(
embeddings.map((vec, i) => ({
vector: new Float32Array(vec),
metadata: {
text: `Document ${i}`,
category: 'article',
timestamp: Date.now()
}
}))
);
console.log(`Inserted ${ids.length} vectors`);
// Semantic search with filters
const query: JsSearchQuery = {
vector: new Float32Array(await getEmbedding('search query')),
k: 10,
filter: { category: 'article' },
efSearch: 150 // Higher = more accurate but slower
};
const results = await db.search(query);
return results.map(r => ({
id: r.id,
similarity: 1 - r.score, // Convert distance to similarity
metadata: r.metadata as DocumentMetadata
}));
}
// Helper function (replace with your embedding model)
async function getEmbedding(text: string): Promise<number[]> {
// Use OpenAI, Cohere, or local model like sentence-transformers
return new Array(768).fill(0);
}
async function getEmbeddings(texts: string[]): Promise<number[][]> {
return Promise.all(texts.map(getEmbedding));
}
// Option 1: Full configuration (option shapes shown in TypeScript notation; '?' marks optional fields)
const db = new VectorDB({
dimensions: 384, // Required: Vector dimensions
distanceMetric?: 'Euclidean' | 'Cosine' | 'DotProduct' | 'Manhattan',
storagePath?: string, // Default: './ruvector.db'
hnswConfig?: {
m?: number, // Default: 32 (16-64 recommended)
efConstruction?: number, // Default: 200 (100-500)
efSearch?: number, // Default: 100 (50-500)
maxElements?: number // Default: 10,000,000
},
quantization?: {
type: 'none' | 'scalar' | 'product' | 'binary',
subspaces?: number, // For product quantization (Default: 16)
k?: number // Codebook size (Default: 256)
}
});
// Option 2: Simple factory method (uses defaults)
const db = VectorDB.withDimensions(384);
Configuration Guide:
Cosine: Best for normalized vectors (text embeddings, most ML models) - Default
Euclidean: Best for absolute distances (images, spatial data)
DotProduct: Best for positive vectors with magnitude info
Manhattan: Best for sparse vectors (L1 norm)
insert(entry): Promise<string>
Insert a single vector and return its ID.
const id = await db.insert({
id?: string, // Optional (auto-generated UUID if not provided)
vector: Float32Array, // Required: Vector data
metadata?: Record<string, any> // Optional: JSON object
});
Example:
const id = await db.insert({
vector: new Float32Array([0.1, 0.2, 0.3, ...]),
metadata: {
text: 'example document',
category: 'research',
timestamp: Date.now()
}
});
console.log(`Inserted with ID: ${id}`);
insertBatch(entries): Promise<string[]>
Insert multiple vectors efficiently in a batch (10-50x faster than sequential inserts).
const ids = await db.insertBatch([
{ vector: new Float32Array([...]) },
{ vector: new Float32Array([...]), metadata: { text: 'example' } }
]);
Example:
// Bad: Sequential inserts (slow)
for (const vector of vectors) {
await db.insert({ vector }); // Don't do this!
}
// Good: Batch insert (10-50x faster)
await db.insertBatch(
vectors.map(v => ({ vector: new Float32Array(v) }))
);
search(query): Promise<SearchResult[]>
Search for similar vectors using HNSW approximate nearest neighbor search.
const results = await db.search({
vector: Float32Array, // Required: Query vector
k: number, // Required: Number of results
filter?: Record<string, any>, // Optional: Metadata filters
efSearch?: number // Optional: HNSW search parameter (higher = more accurate)
});
// Result format:
interface SearchResult {
id: string; // Vector ID
score: number; // Distance (lower is better for most metrics)
vector?: number[]; // Original vector (optional, for debugging)
metadata?: any; // Metadata object
}
Example:
const results = await db.search({
vector: queryEmbedding,
k: 10,
filter: { category: 'research', year: 2024 },
efSearch: 150 // Higher = better recall, slower search
});
results.forEach(result => {
const similarity = 1 - result.score; // Convert distance to similarity
console.log(`${result.metadata.text}: ${similarity.toFixed(3)}`);
});
get(id): Promise<VectorEntry | null>
Retrieve a vector by ID.
const entry = await db.get('vector-id');
if (entry) {
console.log(entry.vector, entry.metadata);
}
delete(id): Promise<boolean>
Delete a vector by ID. Returns true if deleted, false if not found.
const deleted = await db.delete('vector-id');
console.log(deleted ? 'Deleted' : 'Not found');
len(): Promise<number>
Get the total number of vectors in the database.
const count = await db.len();
console.log(`Database contains ${count} vectors`);
isEmpty(): Promise<boolean>
Check if the database is empty.
if (await db.isEmpty()) {
console.log('No vectors yet');
}
version(): string
Get the Ruvector library version.
import { version } from 'ruvector';
console.log(`Ruvector v${version()}`);
Build production-ready RAG systems with fast vector retrieval for LLMs.
const { VectorDB } = require('ruvector');
const { OpenAI } = require('openai');
class RAGSystem {
constructor() {
this.db = VectorDB.withDimensions(1536); // OpenAI embedding size
this.openai = new OpenAI();
}
async indexDocument(text, metadata) {
// Split into chunks (use better chunking in production)
const chunks = this.chunkText(text, 512);
// Get embeddings from OpenAI
const embeddings = await this.openai.embeddings.create({
model: 'text-embedding-3-small',
input: chunks
});
// Insert into vector DB
const ids = await this.db.insertBatch(
embeddings.data.map((emb, i) => ({
vector: new Float32Array(emb.embedding),
metadata: { ...metadata, chunk: i, text: chunks[i] }
}))
);
return ids;
}
async query(question, k = 5) {
// Embed the question
const response = await this.openai.embeddings.create({
model: 'text-embedding-3-small',
input: [question]
});
// Search for relevant chunks
const results = await this.db.search({
vector: new Float32Array(response.data[0].embedding),
k
});
// Extract context
const context = results
.map(r => r.metadata.text)
.join('\n\n');
// Generate answer with LLM
const completion = await this.openai.chat.completions.create({
model: 'gpt-4',
messages: [
{ role: 'system', content: 'Answer based on the context provided.' },
{ role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` }
]
});
return {
answer: completion.choices[0].message.content,
sources: results.map(r => r.metadata)
};
}
chunkText(text, maxLength) {
// Simple word-based chunking
const words = text.split(' ');
const chunks = [];
let current = [];
for (const word of words) {
current.push(word);
if (current.join(' ').length > maxLength) {
chunks.push(current.join(' '));
current = [];
}
}
if (current.length > 0) {
chunks.push(current.join(' '));
}
return chunks;
}
}
// Usage
const rag = new RAGSystem();
await rag.indexDocument('Long document text...', { source: 'doc.pdf' });
const result = await rag.query('What is the main topic?');
console.log(result.answer);
console.log('Sources:', result.sources);
Find similar code patterns across your codebase.
import { VectorDB } from 'ruvector';
import { pipeline } from '@xenova/transformers';
// Use a code-specific embedding model
const embedder = await pipeline(
'feature-extraction',
'Xenova/codebert-base'
);
const db = VectorDB.withDimensions(768);
// Index code snippets
async function indexCodebase(codeFiles: Array<{ path: string, code: string }>) {
for (const file of codeFiles) {
const embedding = await embedder(file.code, {
pooling: 'mean',
normalize: true
});
await db.insert({
vector: new Float32Array(embedding.data),
metadata: {
path: file.path,
code: file.code,
language: file.path.split('.').pop()
}
});
}
}
// Search for similar code
async function findSimilarCode(query: string, k = 10) {
const embedding = await embedder(query, {
pooling: 'mean',
normalize: true
});
const results = await db.search({
vector: new Float32Array(embedding.data),
k
});
return results.map(r => ({
path: r.metadata.path,
code: r.metadata.code,
similarity: 1 - r.score
}));
}
// Example usage
await indexCodebase([
{ path: 'utils.ts', code: 'function parseJSON(str) { ... }' },
{ path: 'api.ts', code: 'async function fetchData(url) { ... }' }
]);
const similar = await findSimilarCode('parse JSON string');
console.log(similar);
Build personalized recommendation engines.
const { VectorDB } = require('ruvector');
class RecommendationEngine {
constructor() {
// User/item embeddings from collaborative filtering or content features
this.db = VectorDB.withDimensions(128);
}
async addItem(itemId, features, metadata) {
await this.db.insert({
id: itemId,
vector: new Float32Array(features),
metadata: {
...metadata,
addedAt: Date.now()
}
});
}
async recommendSimilar(itemId, k = 10, filters = {}) {
// Get the item's embedding
const item = await this.db.get(itemId);
if (!item) return [];
// Find similar items
const results = await this.db.search({
vector: item.vector,
k: k + 1, // +1 because it will include itself
filter: filters
});
// Remove the item itself from results
return results
.filter(r => r.id !== itemId)
.slice(0, k)
.map(r => ({
id: r.id,
similarity: 1 - r.score,
...r.metadata
}));
}
async recommendForUser(userVector, k = 10, category = null) {
const filter = category ? { category } : undefined;
const results = await this.db.search({
vector: new Float32Array(userVector),
k,
filter
});
return results.map(r => ({
id: r.id,
score: 1 - r.score,
...r.metadata
}));
}
}
// Usage
const engine = new RecommendationEngine();
// Add products
await engine.addItem('prod-1', new Array(128).fill(0.1), {
name: 'Laptop',
category: 'electronics',
price: 999
});
// Get similar products
const similar = await engine.recommendSimilar('prod-1', 5, {
category: 'electronics'
});
// Get recommendations for user
const userPreferences = new Array(128).fill(0.15); // From ML model
const recommended = await engine.recommendForUser(userPreferences, 10);
Find and deduplicate similar documents or records.
import { VectorDB } from 'ruvector';
class DuplicateDetector {
private db: VectorDB;
private threshold: number;
constructor(dimensions: number, threshold = 0.95) {
this.db = VectorDB.withDimensions(dimensions);
this.threshold = threshold; // Similarity threshold
}
async addDocument(id: string, embedding: Float32Array, metadata: any) {
// Check for duplicates before adding
const duplicates = await this.findDuplicates(embedding, 1);
if (duplicates.length > 0) {
return {
added: false,
duplicate: duplicates[0]
};
}
await this.db.insert({ id, vector: embedding, metadata });
return { added: true };
}
async findDuplicates(embedding: Float32Array, k = 5) {
const results = await this.db.search({
vector: embedding,
k
});
return results
.filter(r => (1 - r.score) >= this.threshold)
.map(r => ({
id: r.id,
similarity: 1 - r.score,
metadata: r.metadata
}));
}
}
// Usage
const detector = new DuplicateDetector(384, 0.95);
const result = await detector.addDocument(
'doc-1',
documentEmbedding,
{ text: 'Example document' }
);
if (!result.added) {
console.log('Duplicate found:', result.duplicate);
}
Tune the HNSW index for your specific use case:
// High-recall configuration (research, critical applications)
const highRecallDb = new VectorDB({
dimensions: 384,
hnswConfig: {
m: 64, // More connections = better recall
efConstruction: 400, // Higher quality index build
efSearch: 200 // More thorough search
}
});
// Balanced configuration (most applications) - DEFAULT
const balancedDb = new VectorDB({
dimensions: 384,
hnswConfig: {
m: 32,
efConstruction: 200,
efSearch: 100
}
});
// Speed-optimized configuration (real-time applications)
const fastDb = new VectorDB({
dimensions: 384,
hnswConfig: {
m: 16, // Fewer connections = faster
efConstruction: 100,
efSearch: 50 // Faster search, slightly lower recall
}
});
Parameter Guide:
m (16-64): Number of connections per node
efConstruction (100-500): Quality of index construction
efSearch (50-500): Search quality parameter
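Because HNSW search is approximate, the practical way to pick `efSearch` is to measure recall against exact brute-force results on a sample of queries, then choose the smallest value that meets your target. A minimal pure-JS sketch of that measurement (these helpers are illustrative and not part of the ruvector API):

```javascript
// Recall@k: fraction of the exact nearest neighbours that the ANN search returned.
function recallAtK(exactIds, annIds) {
  const exact = new Set(exactIds);
  return annIds.filter(id => exact.has(id)).length / exactIds.length;
}

// Exact top-k by squared Euclidean distance, used as ground truth on a small sample.
function bruteForceTopK(query, vectors, k) {
  return vectors
    .map((v, id) => {
      let d = 0;
      for (let i = 0; i < v.length; i++) d += (v[i] - query[i]) ** 2;
      return { id, d };
    })
    .sort((a, b) => a.d - b.d)
    .slice(0, k)
    .map(r => r.id);
}
```

In practice: run `db.search({ vector, k, efSearch })` for a sweep of `efSearch` values, compare each result's IDs against `bruteForceTopK` with `recallAtK`, and stop raising `efSearch` once recall plateaus.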
Reduce memory usage by 4-32x with quantization:
// Product Quantization: 8-32x memory compression
const pqDb = new VectorDB({
dimensions: 768,
quantization: {
type: 'product',
subspaces: 16, // 768 / 16 = 48 dims per subspace
k: 256 // 256 centroids (8-bit quantization)
}
});
// Binary Quantization: 32x compression, very fast (best for Cosine)
const binaryDb = new VectorDB({
dimensions: 384,
quantization: { type: 'binary' }
});
// Scalar Quantization: 4x compression, minimal accuracy loss
const scalarDb = new VectorDB({
dimensions: 384,
quantization: { type: 'scalar' }
});
// No Quantization: Maximum accuracy, more memory
const fullDb = new VectorDB({
dimensions: 384,
quantization: { type: 'none' }
});
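The compression ratios quoted above follow from simple per-vector arithmetic. A back-of-envelope sketch (ignoring the HNSW graph and codebook overhead, which add a roughly constant cost on top):

```javascript
// Approximate storage per vector under each quantization mode.
// float32 = 4 bytes/dim; scalar = 1 byte/dim; binary = 1 bit/dim;
// product = 1 byte per subspace code (with a 256-entry codebook).
function bytesPerVector(dims, quantization, subspaces = 16) {
  switch (quantization) {
    case 'none':    return dims * 4;
    case 'scalar':  return dims;      // 4x compression
    case 'binary':  return dims / 8;  // 32x compression
    case 'product': return subspaces; // ratio depends on subspace count
  }
}

const dims = 384;
const full = bytesPerVector(dims, 'none');
console.log(`scalar: ${full / bytesPerVector(dims, 'scalar')}x`); // 4x
console.log(`binary: ${full / bytesPerVector(dims, 'binary')}x`); // 32x
```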
Quantization Guide:
none: Maximum accuracy, full float32 memory footprint
scalar: 4x compression, minimal accuracy loss
product: 8-32x compression, tunable via subspaces and k
binary: 32x compression, very fast; works best with Cosine distance
Always use batch operations for better performance:
// ❌ Bad: Sequential inserts (slow)
for (const vector of vectors) {
await db.insert({ vector });
}
// ✅ Good: Batch insert (10-50x faster)
await db.insertBatch(
vectors.map(v => ({ vector: new Float32Array(v) }))
);
Choose the right metric for your embeddings:
// Cosine: Best for normalized vectors (text embeddings, most ML models)
const cosineDb = new VectorDB({
dimensions: 384,
distanceMetric: 'Cosine' // Range: [0, 2], lower is better
});
// Euclidean: Best for absolute distances (images, spatial data)
const euclideanDb = new VectorDB({
dimensions: 384,
distanceMetric: 'Euclidean' // Range: [0, ∞], lower is better
});
// DotProduct: Best for positive vectors with magnitude info
const dotProductDb = new VectorDB({
dimensions: 384,
distanceMetric: 'DotProduct' // Range: (-∞, ∞), higher is better
});
// Manhattan: Best for sparse vectors (L1 norm)
const manhattanDb = new VectorDB({
dimensions: 384,
distanceMetric: 'Manhattan' // Range: [0, ∞], lower is better
});
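The `score` returned by `search` follows the metric's convention above. To make the ranges concrete, here are plain-JS versions of the four metrics (a reference sketch, not ruvector's internal SIMD implementation):

```javascript
const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);
const norm = a => Math.sqrt(dot(a, a));

// Cosine distance: 1 - cos(theta), range [0, 2]; lower is better.
const cosineDistance = (a, b) => 1 - dot(a, b) / (norm(a) * norm(b));

// Euclidean (L2) distance: range [0, ∞); lower is better.
const euclidean = (a, b) =>
  Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));

// Dot product: range (-∞, ∞); higher is better.
const dotProduct = dot;

// Manhattan (L1) distance: range [0, ∞); lower is better.
const manhattan = (a, b) =>
  a.reduce((s, x, i) => s + Math.abs(x - b[i]), 0);
```

For the Cosine metric, `1 - score` recovers the familiar cosine similarity in [-1, 1], which is why the examples in this README convert scores that way.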
Build prerequisites: a C toolchain (gcc, g++)
# Clone the repository
git clone https://github.com/ruvnet/ruvector.git
cd ruvector/crates/ruvector-node
# Install dependencies
npm install
# Build native addon (development)
npm run build
# Run tests
npm test
# Build for production (optimized)
npm run build:release
# Link for local development
npm link
cd /path/to/your/project
npm link ruvector
Build for different platforms:
# Install cross-compilation tools
npm install -g @napi-rs/cli
# Build for specific platforms
npx napi build --platform --release
# Available platforms:
# - linux-x64-gnu
# - linux-arm64-gnu
# - darwin-x64 (macOS Intel)
# - darwin-arm64 (macOS Apple Silicon)
# - win32-x64-msvc (Windows)
# Format code
cargo fmt
# Lint code
cargo clippy
# Run Rust tests
cargo test
# Build and test Node.js bindings
npm run build && npm test
# Benchmark performance
cargo bench
10,000 vectors (128D):
1,000,000 vectors (128D):
10,000,000 vectors (128D):
See the examples directory for complete working examples:
Run examples:
npm run build
node examples/simple.mjs
node examples/advanced.mjs
node examples/semantic-search.mjs
| Feature | Ruvector | Pure JS | Python (Faiss) | Pinecone |
|---|---|---|---|---|
| Language | Rust (NAPI) | JavaScript | Python | Cloud API |
| Local Latency | <0.5ms | 10-100ms | 1-5ms | 20-50ms+ |
| Throughput | 50K+ QPS | 100-1K | 10K+ | 10K+ |
| Memory (1M) | 800MB | 3GB | 1.5GB | N/A |
| HNSW Index | ✅ Native | ❌ or slow | ✅ | ✅ |
| Quantization | ✅ 4-32x | ❌ | ✅ | ✅ |
| SIMD | ✅ Hardware | ❌ | ✅ | ✅ |
| TypeScript | ✅ Auto-gen | Varies | ❌ | ✅ |
| Async/Await | ✅ Native | ✅ | ✅ | ✅ |
| Offline | ✅ | ✅ | ✅ | ❌ |
| Cost | Free | Free | Free | $$$ |
| Bundle Size | ~2MB | 100KB-1MB | N/A | N/A |
Error: Cannot find module 'ruvector'
Make sure you have Rust installed:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
Error: error: linking with 'cc' failed
Install build tools:
# Linux (Ubuntu/Debian)
sudo apt-get install build-essential
# macOS
xcode-select --install
# Windows
# Install Visual Studio Build Tools
Update NAPI-RS CLI:
npm install -g @napi-rs/cli
Tune efSearch for the speed/accuracy tradeoff
Use insertBatch instead of individual insert calls
Adjust the m parameter in the HNSW config
We welcome contributions! See the main Contributing Guide.
# Format Rust code
cargo fmt --all
# Lint Rust code
cargo clippy --workspace -- -D warnings
# Run Rust tests
cargo test -p ruvector-node
# Build Node.js bindings
npm run build
# Run Node.js tests
npm test
# Benchmark performance
cargo bench -p ruvector-bench
MIT License - see LICENSE for details.
Free to use for commercial and personal projects.
Built with battle-tested technologies:
Special thanks to the Rust and Node.js communities!