heroindex

Version: 0.1.3
Created at: 2025-12-25 07:25:31.898596+00
Updated at: 2025-12-25 08:14:28.563863+00
Description: A Tantivy-based indexing server with OpenRPC socket interface
Homepage: https://forge.ourworld.tf/lhumina_research/hero_index_server
Repository: https://forge.ourworld.tf/lhumina_research/hero_index_server
ID: 2004292
Size: 149,365
kristof de spiegeleer (despiegk)

Documentation: https://docs.rs/heroindex

README

HeroIndex

A high-performance full-text search server built on Tantivy, exposing an OpenRPC interface over Unix sockets.

Repository: https://forge.ourworld.tf/lhumina_research/hero_index_server

Looking for the client library? See heroindex_client for easy integration into your Rust applications.

Features

  • Multiple Index Management - Create, delete, and manage multiple search indexes
  • Dynamic Schemas - Define custom schemas using 10 field types
  • Powerful Queries - Full-text match, term, fuzzy, phrase, prefix, boolean, range, and regex queries
  • OpenRPC Discovery - Self-documenting API via rpc.discover
  • Concurrent Connections - Handle multiple clients simultaneously
  • Fast Fields - Columnar storage for sorting and aggregations
  • Zero-Copy Search - Efficient memory-mapped index files

Installation

From crates.io

cargo install heroindex

From source

git clone https://forge.ourworld.tf/lhumina_research/hero_index_server.git
cd hero_index_server
cargo build --release

Quick Start

1. Start the Server

heroindex --dir /var/lib/heroindex --socket /tmp/heroindex.sock

2. Connect with the Client Library

Use heroindex_client to connect:

use heroindex_client::HeroIndexClient;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), heroindex_client::Error> {
    let mut client = HeroIndexClient::connect("/tmp/heroindex.sock").await?;
    
    // Create an index
    client.db_create("articles", json!({
        "fields": [
            {"name": "title", "type": "text", "stored": true, "indexed": true},
            {"name": "body", "type": "text", "stored": true, "indexed": true}
        ]
    })).await?;
    
    // Add documents
    client.db_select("articles").await?;
    client.doc_add(json!({"title": "Hello", "body": "World"})).await?;
    client.commit().await?;
    client.reload().await?;
    
    // Search
    let results = client.search(
        json!({"type": "match", "field": "body", "value": "world"}),
        10, 0
    ).await?;
    
    println!("Found {} results", results.total_hits);
    Ok(())
}
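The Quick Start calls both commit() and reload() before searching. The toy model below sketches why both steps matter, assuming heroindex follows Tantivy's usual behavior: a commit persists pending writes, and a searcher only sees documents that were committed before its reader was last reloaded. ToyIndex and its methods are hypothetical illustrations, not part of heroindex or heroindex_client.

```rust
/// Hypothetical toy model of commit/reload visibility (not the heroindex API).
#[derive(Default)]
pub struct ToyIndex {
    pending: Vec<String>,   // added with doc_add, not yet committed
    committed: Vec<String>, // persisted by commit
    visible: Vec<String>,   // snapshot that searches read from
}

impl ToyIndex {
    /// Queue a document; it is neither persisted nor searchable yet.
    pub fn doc_add(&mut self, doc: &str) {
        self.pending.push(doc.to_string());
    }

    /// Move pending documents into the committed set.
    /// Searches keep using the old snapshot until reload().
    pub fn commit(&mut self) {
        self.committed.append(&mut self.pending);
    }

    /// Refresh the search snapshot to the latest committed state.
    pub fn reload(&mut self) {
        self.visible = self.committed.clone();
    }

    /// Count visible documents containing `word` (case-insensitive substring match).
    pub fn search_count(&self, word: &str) -> usize {
        let needle = word.to_lowercase();
        self.visible
            .iter()
            .filter(|d| d.to_lowercase().contains(&needle))
            .count()
    }
}
```

In this model, searching after commit() alone still finds nothing; only reload() refreshes the snapshot, which is the split between the index.commit and index.reload RPC methods.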

Command Line Options

heroindex [OPTIONS]

Options:
  -d, --dir <DIR>        Base directory for all indexes
  -s, --socket <SOCKET>  Unix socket path for RPC interface
  -h, --help             Print help
  -V, --version          Print version

Schema Definition

Define your index schema with these field types:

Type    Description                        Options
text    Full-text searchable (tokenized)   stored, indexed, fast, tokenizer
str     Exact-match string (keyword)       stored, indexed, fast
u64     Unsigned 64-bit integer            stored, indexed, fast
i64     Signed 64-bit integer              stored, indexed, fast
f64     64-bit floating point              stored, indexed, fast
date    DateTime (RFC 3339)                stored, indexed, fast
bool    Boolean                            stored, indexed, fast
json    JSON object                        stored, indexed
bytes   Binary data                        stored, indexed, fast
ip      IP address                         stored, indexed, fast
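To make the Options column concrete, the sketch below models a field definition and refuses combinations the table does not list (for example fast on a json field, or a tokenizer on anything other than text) before rendering the JSON shape used in the example schema. FieldType, FieldDef and to_json are hypothetical helpers; whether the server itself rejects such combinations or silently ignores them is not stated here.

```rust
/// Field types from the schema table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FieldType {
    Text, Str, U64, I64, F64, Date, Bool, Json, Bytes, Ip,
}

impl FieldType {
    /// Type name as written in the schema JSON.
    pub fn name(self) -> &'static str {
        match self {
            FieldType::Text => "text",
            FieldType::Str => "str",
            FieldType::U64 => "u64",
            FieldType::I64 => "i64",
            FieldType::F64 => "f64",
            FieldType::Date => "date",
            FieldType::Bool => "bool",
            FieldType::Json => "json",
            FieldType::Bytes => "bytes",
            FieldType::Ip => "ip",
        }
    }

    /// Per the table, every type except json lists `fast`.
    pub fn allows_fast(self) -> bool {
        self != FieldType::Json
    }

    /// Per the table, only text lists `tokenizer`.
    pub fn allows_tokenizer(self) -> bool {
        self == FieldType::Text
    }
}

/// Hypothetical field definition mirroring the keys in the example schema.
pub struct FieldDef {
    pub name: String,
    pub ty: FieldType,
    pub stored: bool,
    pub indexed: bool,
    pub fast: bool,
    pub tokenizer: Option<String>,
}

impl FieldDef {
    /// Render the field as schema JSON, rejecting options the table does not list.
    /// Names are assumed to be plain identifiers, so no JSON escaping is done.
    pub fn to_json(&self) -> Result<String, String> {
        if self.fast && !self.ty.allows_fast() {
            return Err(format!("{}: type {} has no fast option", self.name, self.ty.name()));
        }
        if self.tokenizer.is_some() && !self.ty.allows_tokenizer() {
            return Err(format!("{}: type {} has no tokenizer option", self.name, self.ty.name()));
        }
        let mut out = format!(
            r#"{{"name": "{}", "type": "{}", "stored": {}, "indexed": {}"#,
            self.name, self.ty.name(), self.stored, self.indexed
        );
        if self.fast {
            out.push_str(r#", "fast": true"#);
        }
        if let Some(t) = &self.tokenizer {
            out.push_str(&format!(r#", "tokenizer": "{}""#, t));
        }
        out.push('}');
        Ok(out)
    }
}
```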

Example Schema

{
  "fields": [
    {"name": "id", "type": "str", "stored": true, "indexed": true},
    {"name": "title", "type": "text", "stored": true, "indexed": true, "tokenizer": "en_stem"},
    {"name": "content", "type": "text", "stored": true, "indexed": true},
    {"name": "views", "type": "u64", "stored": true, "indexed": true, "fast": true},
    {"name": "rating", "type": "f64", "stored": true, "indexed": true, "fast": true},
    {"name": "published", "type": "date", "stored": true, "indexed": true, "fast": true},
    {"name": "active", "type": "bool", "stored": true, "indexed": true},
    {"name": "metadata", "type": "json", "stored": true, "indexed": true}
  ]
}

Query Types

Match Query (Full-Text)

{"type": "match", "field": "content", "value": "search terms"}

Term Query (Exact)

{"type": "term", "field": "id", "value": "abc123"}
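A match query is meant for tokenized text fields, while a term query compares the value as-is, which is why it suits a str field such as id. The sketch below approximates that difference, assuming text fields use something like Tantivy's default tokenizer (split on non-alphanumeric characters, then lowercase; Tantivy also drops very long tokens, which is omitted here). It further assumes a match query succeeds when any query token is present; whether heroindex combines tokens with OR or AND is not documented above. All function names are hypothetical.

```rust
/// Approximation of a simple tokenizer: split on non-alphanumeric chars, lowercase.
pub fn simple_tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Term-style lookup: exact, untokenized, case-sensitive comparison.
pub fn term_matches(stored_value: &str, query_value: &str) -> bool {
    stored_value == query_value
}

/// Match-style lookup (OR semantics assumed): any query token appears in the document.
pub fn match_matches(doc_text: &str, query_text: &str) -> bool {
    let doc_tokens = simple_tokenize(doc_text);
    simple_tokenize(query_text)
        .iter()
        .any(|q| doc_tokens.contains(q))
}
```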

Fuzzy Query (Typo-Tolerant)

{"type": "fuzzy", "field": "title", "value": "serch", "distance": 2}
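The distance parameter bounds the number of single-character edits between the query value and an indexed term. The sketch below computes the classic Levenshtein distance (insertions, deletions, substitutions): "serch" is one insertion away from "search", so it matches at distance 2. Tantivy's fuzzy queries can optionally count a transposition of adjacent characters as a single edit; whether heroindex enables that is not stated, and this sketch does not. levenshtein and fuzzy_matches are illustrative names only.

```rust
/// Classic Levenshtein edit distance over chars (insert, delete, substitute cost 1 each).
pub fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j]
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        // cur[0] = i + 1: delete all i + 1 chars of a's prefix
        let mut cur = vec![i + 1; b_chars.len() + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            cur[j + 1] = (prev[j] + cost) // substitute (or keep)
                .min(prev[j + 1] + 1)     // delete from a
                .min(cur[j] + 1);         // insert into a
        }
        prev = cur;
    }
    prev[b_chars.len()]
}

/// A term matches a fuzzy query if it is within `distance` edits of the query value.
pub fn fuzzy_matches(term: &str, query: &str, distance: usize) -> bool {
    levenshtein(term, query) <= distance
}
```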

Phrase Query

{"type": "phrase", "field": "content", "value": "exact phrase match"}
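A phrase query requires its terms to appear next to each other and in order. Tantivy checks this using token positions recorded in the index; the sketch below gets the same effect for a single document by sliding a window over its tokens, using a simple split-and-lowercase tokenizer as an assumption. tokenize and phrase_matches are hypothetical helpers.

```rust
/// Simple tokenizer assumed for illustration: split on non-alphanumerics, lowercase.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_lowercase)
        .collect()
}

/// True if the phrase's tokens occur consecutively, in order, in the document.
pub fn phrase_matches(doc: &str, phrase: &str) -> bool {
    let doc_tokens = tokenize(doc);
    let phrase_tokens = tokenize(phrase);
    if phrase_tokens.is_empty() {
        return false; // windows(0) would panic; an empty phrase matches nothing here
    }
    doc_tokens
        .windows(phrase_tokens.len())
        .any(|w| w == phrase_tokens.as_slice())
}
```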

Prefix Query

{"type": "prefix", "field": "title", "value": "hel"}

Range Query

{"type": "range", "field": "views", "gte": 100, "lt": 1000}
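Read with the usual conventions, gte is an inclusive lower bound and lt an exclusive upper bound, so the example matches 100 through 999 views. The sketch maps those two keys onto Rust's std::ops::Bound; range_bounds and in_range are hypothetical, and whether the server also accepts gt or lte keys is not stated above.

```rust
use std::ops::{Bound, RangeBounds};

/// Convert the example's `gte` / `lt` keys into Rust bounds.
/// `None` means the key was absent, so that side is unbounded.
pub fn range_bounds(gte: Option<u64>, lt: Option<u64>) -> (Bound<u64>, Bound<u64>) {
    let lower = gte.map_or(Bound::Unbounded, Bound::Included);
    let upper = lt.map_or(Bound::Unbounded, Bound::Excluded);
    (lower, upper)
}

/// True if `value` falls inside the bounds (lower inclusive, upper exclusive here).
pub fn in_range(value: u64, bounds: &(Bound<u64>, Bound<u64>)) -> bool {
    bounds.contains(&value)
}
```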

Regex Query

{"type": "regex", "field": "title", "value": "test.*"}

Boolean Query

{
  "type": "boolean",
  "must": [{"type": "match", "field": "content", "value": "rust"}],
  "should": [{"type": "match", "field": "title", "value": "tutorial"}],
  "must_not": [{"type": "term", "field": "status", "value": "draft"}]
}
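The sketch below shows how the three clause lists typically combine, assuming heroindex maps them onto Tantivy's BooleanQuery, which follows Lucene-style rules: a document must match every must clause and no must_not clause; should clauses are only required (at least one) when there are no must clauses, and otherwise just raise the score. Each clause is modeled as a precomputed set of matching document IDs and scoring is ignored; boolean_filter and DocSet are hypothetical.

```rust
use std::collections::BTreeSet;

/// Set of matching document IDs for one clause (hypothetical model).
pub type DocSet = BTreeSet<u32>;

/// Which documents pass the filter; `should` only filters when `must` is empty.
pub fn boolean_filter(must: &[DocSet], should: &[DocSet], must_not: &[DocSet]) -> DocSet {
    let mut result: DocSet = if let Some((first, rest)) = must.split_first() {
        // Intersection of all must clauses.
        rest.iter()
            .fold(first.clone(), |acc, s| acc.intersection(s).copied().collect())
    } else {
        // No must clauses: at least one should clause has to match.
        should.iter().flat_map(|s| s.iter().copied()).collect()
    };
    // Drop anything matched by a must_not clause.
    for excluded in must_not {
        result.retain(|d| !excluded.contains(d));
    }
    result
}
```

With must present, a document matching "rust" but not "tutorial" still qualifies; the should clause would only affect its rank.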

RPC Methods

Method         Description
rpc.discover   Get the OpenRPC schema
server.ping    Health check
server.stats   Server statistics
db.list        List all databases
db.create      Create a database with a schema
db.delete      Delete a database
db.select      Select the database for subsequent operations
db.info        Get database info
schema.get     Get the current schema
doc.add        Add a single document
doc.add_batch  Add multiple documents
doc.delete     Delete documents by term
index.commit   Commit pending changes
index.reload   Reload so committed changes become visible to searches
search.query   Execute a search
search.count   Count matching documents
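These methods can also be called without the client library. The sketch below builds a JSON-RPC 2.0 request (the envelope OpenRPC describes) and sends it over the Unix socket. It assumes one JSON message per line in each direction; the wire framing is not documented above, so check the heroindex_client source before relying on it. build_request and call are hypothetical helpers, and the params shape for each method is not specified here.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Build a JSON-RPC 2.0 request. `params_json`, if given, must already be valid JSON.
pub fn build_request(id: u64, method: &str, params_json: Option<&str>) -> String {
    match params_json {
        Some(p) => format!(
            r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
            id, method, p
        ),
        None => format!(r#"{{"jsonrpc":"2.0","id":{},"method":"{}"}}"#, id, method),
    }
}

/// Send one request and read one response line.
/// ASSUMPTION: newline-delimited framing in both directions.
pub fn call(socket_path: &str, request: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;
    stream.write_all(request.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;
    let mut reader = BufReader::new(stream);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}
```

For example, call("/tmp/heroindex.sock", &build_request(1, "server.ping", None)) would return the raw response line, under the framing assumption above.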

Performance Tips

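  1. Use batch inserts - doc.add_batch sends many documents in one call instead of one round trip per document
  2. Commit periodically - commits persist segments to disk and are comparatively expensive, so don't commit after every document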
  3. Enable fast fields - For fields used in sorting/filtering
  4. Use appropriate tokenizers - en_stem for English, raw for keywords
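The first two tips can be combined into a simple ingest plan: send documents in fixed-size doc.add_batch calls, commit every few batches, then commit and reload once at the end so searches see everything. The sketch only computes the sequence of RPC calls; plan_ingest, Op, and the example sizes are hypothetical, and suitable batch and commit intervals depend on document size and workload.

```rust
/// Hypothetical operations a loader would send; names follow the RPC table.
#[derive(Debug, PartialEq)]
pub enum Op {
    AddBatch(usize), // doc.add_batch with this many documents
    Commit,          // index.commit
    Reload,          // index.reload
}

/// Plan an ingest of `total` documents: batches of `batch_size`, a commit every
/// `commit_every` batches, and a final commit + reload so searches see everything.
pub fn plan_ingest(total: usize, batch_size: usize, commit_every: usize) -> Vec<Op> {
    assert!(batch_size > 0 && commit_every > 0, "sizes must be positive");
    let mut ops = Vec::new();
    let mut remaining = total;
    let mut batches_since_commit = 0;
    while remaining > 0 {
        let n = remaining.min(batch_size);
        ops.push(Op::AddBatch(n));
        remaining -= n;
        batches_since_commit += 1;
        // Intermediate commit; skipped after the last batch since the final one follows.
        if batches_since_commit == commit_every && remaining > 0 {
            ops.push(Op::Commit);
            batches_since_commit = 0;
        }
    }
    ops.push(Op::Commit);
    ops.push(Op::Reload);
    ops
}
```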

Related Crates

  • heroindex_client - Rust client library for connecting to a heroindex server

License

MIT License - see LICENSE for details.

Credits

Built on the excellent Tantivy search engine library.
