| Crates.io | oxirs-vec |
| lib.rs | oxirs-vec |
| version | 0.1.0 |
| created_at | 2025-09-30 08:29:43.33543+00 |
| updated_at | 2026-01-20 21:15:26.209532+00 |
| description | Vector index abstractions for semantic similarity and AI-augmented querying |
| homepage | https://github.com/cool-japan/oxirs |
| repository | https://github.com/cool-japan/oxirs |
| max_upload_size | |
| id | 1860777 |
| size | 3,936,949 |
Status: Production Release (v0.1.0), released January 7, 2026
✨ Production-ready with API stability guarantees and comprehensive test coverage.
High-performance vector search infrastructure for semantic similarity search in RDF knowledge graphs.
Add to your Cargo.toml:
```toml
[dependencies]
oxirs-vec = "0.1.0"
```
```rust
use oxirs_vec::{VectorStore, IndexType, DistanceMetric};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a vector store with an HNSW index
    let mut store = VectorStore::builder()
        .index_type(IndexType::HNSW)
        .dimension(768) // embedding dimension
        .distance_metric(DistanceMetric::Cosine)
        .build()?;

    // Placeholder embeddings; in practice these come from an embedding model
    // such as oxirs-embed (see below)
    let embedding1 = vec![0.1_f32; 768];
    let embedding2 = vec![0.2_f32; 768];
    let query_vector = vec![0.15_f32; 768];

    // Add vectors
    store.add_vector("entity1", &embedding1)?;
    store.add_vector("entity2", &embedding2)?;

    // Build the index
    store.build_index()?;

    // Search: top 10 nearest neighbours with a minimum similarity of 0.8
    let results = store.search(&query_vector, 10, 0.8)?;
    for result in results {
        println!("ID: {}, Score: {}", result.id, result.score);
    }

    Ok(())
}
```
```rust
use oxirs_vec::sparql::VectorFunctions;

let sparql = r#"
    PREFIX vec:  <http://oxirs.org/vec/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?entity ?score WHERE {
        ?entity a foaf:Person .
        # Vector similarity search
        ?entity vec:similarTo "machine learning researcher" .
        ?entity vec:similarity ?score .
        FILTER (?score > 0.8)
    }
    ORDER BY DESC(?score)
    LIMIT 10
"#;
```
```rust
pub enum DistanceMetric {
    Cosine,     // For normalized embeddings
    Euclidean,  // For absolute distances
    Manhattan,  // For high-dimensional spaces
    DotProduct, // For similarity scores
}
```
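For intuition, each metric reduces to simple arithmetic over the raw vectors. The standalone sketch below (plain Rust, independent of the oxirs-vec API) shows how the scores are computed; note that for unit-normalized vectors, cosine similarity and dot product coincide, which is why `Cosine` is the usual choice for normalized embeddings.

```rust
/// Reference implementations of the four distance/similarity measures.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    // Dot product normalized by both vector lengths
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}
```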
Combine vector similarity with RDF constraints:
```rust
use oxirs_vec::FilteredSearch;

let filters = FilteredSearch::builder()
    .add_constraint("rdf:type", "foaf:Person")
    .add_constraint("foaf:age", |age: i32| age > 18)
    .build();

let results = store.filtered_search(&query_vector, filters, 10)?;
```
Efficient bulk indexing:
```rust
let batch = vec![
    ("entity1", embedding1),
    ("entity2", embedding2),
    ("entity3", embedding3),
];
store.add_batch(batch)?;
store.build_index()?;

// Add without a full rebuild
store.add_incremental("new_entity", &embedding)?;

// Periodic optimization
store.optimize_index()?;
```
| Dataset Size | Index Type | Build Time | Query Time (10-NN) |
|---|---|---|---|
| 10K vectors | HNSW | 2.5s | 0.5ms |
| 100K vectors | HNSW | 28s | 1.2ms |
| 1M vectors | HNSW | 320s | 2.8ms |
| 10K vectors | Flat | 0.1s | 12ms |
| 100K vectors | IVF | 15s | 3.5ms |
Benchmarked on an M1 Mac with 768-dimensional vectors.
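The query times above are for single 10-NN lookups. As a rough sketch of how such measurements can be reproduced with the API shown earlier (random placeholder vectors, illustrative parameters; exact argument types may differ, and this is not the project's official benchmark harness):

```rust
use std::time::Instant;

// Time index construction and a single 10-NN query over placeholder data.
let dim = 768;
let batch: Vec<(String, Vec<f32>)> = (0..10_000)
    .map(|i| (format!("entity{i}"), vec![(i % 97) as f32 / 97.0; dim]))
    .collect();

let t_build = Instant::now();
store.add_batch(batch)?;
store.build_index()?;
println!("build time: {:?}", t_build.elapsed());

let query = vec![0.5_f32; dim];
let t_query = Instant::now();
let _hits = store.search(&query, 10, 0.0)?;
println!("10-NN query time: {:?}", t_query.elapsed());
```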
```rust
let config = VectorStoreConfig {
    index_type: IndexType::HNSW,
    dimension: 768,
    distance_metric: DistanceMetric::Cosine,

    // HNSW-specific parameters
    hnsw_m: 16,                // number of connections per node
    hnsw_ef_construction: 200, // construction-time accuracy
    hnsw_ef_search: 100,       // search-time accuracy

    // Storage options
    persist_path: Some("./vector_index".into()),
    cache_size: 1000,
};
```
```rust
use oxirs_embed::EmbeddingModel;
use oxirs_vec::{IndexType, VectorStore};

// Generate embeddings
let model = EmbeddingModel::load("sentence-transformers/all-mpnet-base-v2")?;
let embedding = model.encode("Machine learning research")?;

// Index and search
let mut store = VectorStore::new(IndexType::HNSW, 768)?;
store.add_vector("doc1", &embedding)?;
```
```rust
use oxirs_core::Dataset;
use oxirs_vec::RdfVectorIndex;

let dataset = Dataset::from_file("knowledge_graph.ttl")?;
let mut index = RdfVectorIndex::new(&dataset)?;

// Index entities by their descriptions
for entity in dataset.subjects() {
    if let Some(description) = dataset.get_description(&entity) {
        let embedding = model.encode(&description)?;
        index.add_entity(&entity, &embedding)?;
    }
}
```
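Once the graph's entities are indexed, a similarity lookup might look like the sketch below; the `search` method name, its arguments, and the result fields are assumptions made for illustration rather than the confirmed `RdfVectorIndex` API.

```rust
// Hypothetical lookup (method name and arguments are assumed, not confirmed):
// find the 10 indexed entities most similar to a natural-language query.
let query = model.encode("researchers working on knowledge graphs")?;
let similar = index.search(&query, 10, 0.75)?;
for entity in similar {
    println!("{} ({:.2})", entity.id, entity.score);
}
```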
Feedback and contributions are welcome!
MIT OR Apache-2.0