heroindex

Crates.io	heroindex
lib.rs	heroindex
version	0.1.3
created_at	2025-12-25 07:25:31.898596+00
updated_at	2025-12-25 08:14:28.563863+00
description	A Tantivy-based indexing server with OpenRPC socket interface
homepage	https://forge.ourworld.tf/lhumina_research/hero_index_server
repository	https://forge.ourworld.tf/lhumina_research/hero_index_server
id	2004292
size	149,365

kristof de spiegeleer (despiegk)

documentation

https://docs.rs/heroindex

README

HeroIndex

A high-performance full-text search server built on Tantivy, exposing an OpenRPC interface over Unix sockets.

Repository: https://forge.ourworld.tf/lhumina_research/hero_index_server

Looking for the client library? See heroindex_client for easy integration into your Rust applications.

Features

Multiple Index Management - Create, delete, and manage multiple search indexes
Dynamic Schemas - Define custom schemas from the 10 field types listed under Schema Definition
Powerful Queries - Full-text, fuzzy, phrase, boolean, range, regex queries
OpenRPC Discovery - Self-documenting API via rpc.discover
Concurrent Connections - Handle multiple clients simultaneously
Fast Fields - Columnar storage for sorting and aggregations
Zero-Copy Search - Efficient memory-mapped index files

Installation

From crates.io

cargo install heroindex

From source

git clone https://forge.ourworld.tf/lhumina_research/hero_index_server.git
cd hero_index_server
cargo build --release

Quick Start

1. Start the Server

heroindex --dir /var/lib/heroindex --socket /tmp/heroindex.sock

2. Connect with the Client Library

Use heroindex_client to connect:

use heroindex_client::HeroIndexClient;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), heroindex_client::Error> {
    let mut client = HeroIndexClient::connect("/tmp/heroindex.sock").await?;
    
    // Create an index
    client.db_create("articles", json!({
        "fields": [
            {"name": "title", "type": "text", "stored": true, "indexed": true},
            {"name": "body", "type": "text", "stored": true, "indexed": true}
        ]
    })).await?;
    
    // Select the index, add a document, then commit and reload so the document becomes searchable
    client.db_select("articles").await?;
    client.doc_add(json!({"title": "Hello", "body": "World"})).await?;
    client.commit().await?;
    client.reload().await?;
    
    // Search
    let results = client.search(
        json!({"type": "match", "field": "body", "value": "world"}),
        10, 0
    ).await?;
    
    println!("Found {} results", results.total_hits);
    Ok(())
}

Command Line Options

heroindex [OPTIONS]

Options:
  -d, --dir <DIR>        Base directory for all indexes
  -s, --socket <SOCKET>  Unix socket path for RPC interface
  -h, --help             Print help
  -V, --version          Print version

Schema Definition

Define your index schema with these field types:

Type	Description	Options
`text`	Full-text searchable (tokenized)	`stored`, `indexed`, `fast`, `tokenizer`
`str`	Exact match string (keyword)	`stored`, `indexed`, `fast`
`u64`	Unsigned 64-bit integer	`stored`, `indexed`, `fast`
`i64`	Signed 64-bit integer	`stored`, `indexed`, `fast`
`f64`	64-bit floating point	`stored`, `indexed`, `fast`
`date`	DateTime (RFC 3339)	`stored`, `indexed`, `fast`
`bool`	Boolean	`stored`, `indexed`, `fast`
`json`	JSON object	`stored`, `indexed`
`bytes`	Binary data	`stored`, `indexed`, `fast`
`ip`	IP address	`stored`, `indexed`, `fast`

Example Schema

{
  "fields": [
    {"name": "id", "type": "str", "stored": true, "indexed": true},
    {"name": "title", "type": "text", "stored": true, "indexed": true, "tokenizer": "en_stem"},
    {"name": "content", "type": "text", "stored": true, "indexed": true},
    {"name": "views", "type": "u64", "stored": true, "indexed": true, "fast": true},
    {"name": "rating", "type": "f64", "stored": true, "indexed": true, "fast": true},
    {"name": "published", "type": "date", "stored": true, "indexed": true, "fast": true},
    {"name": "active", "type": "bool", "stored": true, "indexed": true},
    {"name": "metadata", "type": "json", "stored": true, "indexed": true}
  ]
}
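The schema format above is plain JSON, so it can be generated from typed values instead of hand-written strings. The following is a minimal sketch using only the standard library. The `Field` struct, `supports_fast`, and `schema_json` are hypothetical helpers for illustration and are not part of heroindex or heroindex_client. The `fast` check only mirrors the options table (which lists no `fast` option for `json`); whether the server rejects such a field is not stated in this README. Field names are assumed to be plain identifiers that need no JSON escaping.

```rust
// Hypothetical helper types: not part of heroindex or heroindex_client.
struct Field {
    name: &'static str,
    kind: &'static str, // a type name from the table above, e.g. "text" or "u64"
    stored: bool,
    indexed: bool,
    fast: bool,
}

/// Mirrors the options table: every listed type except `json` shows a `fast` option.
fn supports_fast(kind: &str) -> bool {
    kind != "json"
}

impl Field {
    /// Renders one field entry; "fast" is emitted only when it is set.
    fn to_json(&self) -> String {
        let mut out = format!(
            "{{\"name\": \"{}\", \"type\": \"{}\", \"stored\": {}, \"indexed\": {}",
            self.name, self.kind, self.stored, self.indexed
        );
        if self.fast {
            out.push_str(", \"fast\": true");
        }
        out.push('}');
        out
    }
}

/// Wraps the rendered fields in the {"fields": [...]} envelope shown above.
fn schema_json(fields: &[Field]) -> Result<String, String> {
    for f in fields {
        if f.fast && !supports_fast(f.kind) {
            return Err(format!("field '{}': type '{}' lists no fast option", f.name, f.kind));
        }
    }
    let body: Vec<String> = fields.iter().map(Field::to_json).collect();
    Ok(format!("{{\"fields\": [{}]}}", body.join(", ")))
}

fn main() {
    let fields = [
        Field { name: "id", kind: "str", stored: true, indexed: true, fast: false },
        Field { name: "views", kind: "u64", stored: true, indexed: true, fast: true },
    ];
    match schema_json(&fields) {
        Ok(json) => println!("{}", json),
        Err(e) => eprintln!("invalid schema: {}", e),
    }
}
```

The resulting string could be parsed into a `serde_json::Value` and passed to `db_create` as in the Quick Start.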

Query Types

Match Query (Full-Text)

{"type": "match", "field": "content", "value": "search terms"}

Term Query (Exact)

{"type": "term", "field": "id", "value": "abc123"}

Fuzzy Query (Typo-Tolerant)

{"type": "fuzzy", "field": "title", "value": "serch", "distance": 2}
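To make the `distance` parameter concrete: Tantivy's fuzzy term query matches indexed terms within a given Levenshtein (edit) distance of the query term. Assuming HeroIndex passes `distance` through as that bound, the sketch below computes the edit distance between the misspelled value and a term it should match. The `levenshtein` function is an illustrative standalone implementation, not code from heroindex or Tantivy.

```rust
/// Classic dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the processed prefix of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // cur[0] = i: deleting i characters turns the prefix of `a` into the empty string.
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let substitution = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            let deletion = prev[j] + 1;
            let insertion = cur[j - 1] + 1;
            cur[j] = substitution.min(deletion).min(insertion);
        }
        prev = cur;
    }
    prev[b.len()]
}

fn main() {
    let d = levenshtein("serch", "search");
    // One insertion ('a') is enough, so a bound of 2 comfortably covers this typo.
    println!("serch -> search: distance {}, within 2: {}", d, d <= 2);
}
```

A larger bound matches more terms and therefore returns more, and noisier, results.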

Phrase Query

{"type": "phrase", "field": "content", "value": "exact phrase match"}

Prefix Query

{"type": "prefix", "field": "title", "value": "hel"}

Range Query

{"type": "range", "field": "views", "gte": 100, "lt": 1000}

Regex Query

{"type": "regex", "field": "title", "value": "test.*"}

Boolean Query

{
  "type": "boolean",
  "must": [{"type": "match", "field": "content", "value": "rust"}],
  "should": [{"type": "match", "field": "title", "value": "tutorial"}],
  "must_not": [{"type": "term", "field": "status", "value": "draft"}]
}
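Because boolean queries nest other queries, building them from a small recursive type can be less error-prone than concatenating JSON by hand. The sketch below covers only three of the query types above. The `Query` enum and its helpers are hypothetical and not part of heroindex_client. It always emits all three clause arrays, and whether the server accepts empty `should` or `must_not` arrays is an assumption.

```rust
// Hypothetical query model for illustration; not part of heroindex_client.
enum Query {
    Match { field: String, value: String },
    Term { field: String, value: String },
    Boolean { must: Vec<Query>, should: Vec<Query>, must_not: Vec<Query> },
}

/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders a leaf query such as {"type": "match", "field": ..., "value": ...}.
fn leaf(kind: &str, field: &str, value: &str) -> String {
    format!(
        "{{\"type\": \"{}\", \"field\": \"{}\", \"value\": \"{}\"}}",
        kind,
        escape(field),
        escape(value)
    )
}

/// Renders a list of sub-queries as a JSON array.
fn list(queries: &[Query]) -> String {
    let parts: Vec<String> = queries.iter().map(Query::to_json).collect();
    format!("[{}]", parts.join(", "))
}

impl Query {
    fn to_json(&self) -> String {
        match self {
            Query::Match { field, value } => leaf("match", field, value),
            Query::Term { field, value } => leaf("term", field, value),
            Query::Boolean { must, should, must_not } => format!(
                "{{\"type\": \"boolean\", \"must\": {}, \"should\": {}, \"must_not\": {}}}",
                list(must),
                list(should),
                list(must_not)
            ),
        }
    }
}

fn main() {
    let query = Query::Boolean {
        must: vec![Query::Match { field: "content".into(), value: "rust".into() }],
        should: vec![Query::Match { field: "title".into(), value: "tutorial".into() }],
        must_not: vec![Query::Term { field: "status".into(), value: "draft".into() }],
    };
    println!("{}", query.to_json());
}
```

The `main` function reproduces the boolean example above: documents must match "rust" in `content`, must not have `status` equal to "draft", and rank higher when `title` matches "tutorial".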

RPC Methods

Method	Description
`rpc.discover`	Get OpenRPC schema
`server.ping`	Health check
`server.stats`	Server statistics
`db.list`	List all databases
`db.create`	Create database with schema
`db.delete`	Delete a database
`db.select`	Select the database used by subsequent operations
`db.info`	Get database info
`schema.get`	Get current schema
`doc.add`	Add single document
`doc.add_batch`	Add multiple documents
`doc.delete`	Delete documents matching a term
`index.commit`	Commit pending changes
`index.reload`	Reload the index so committed changes become visible to searches
`search.query`	Execute search
`search.count`	Count matches
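For clients that do not use heroindex_client, the methods above can be called with JSON-RPC 2.0 requests, which is the envelope OpenRPC describes. The sketch below builds a `server.ping` request and sends it over the Unix socket using only the standard library. The wire framing is an assumption: this README does not say how messages are delimited, so the example guesses one JSON object per line. The `rpc_request` and `call` helpers are hypothetical, and the empty `params` array for `server.ping` is also assumed.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Builds a JSON-RPC 2.0 request. `params_json` must already be valid JSON.
fn rpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        "{{\"jsonrpc\": \"2.0\", \"id\": {}, \"method\": \"{}\", \"params\": {}}}",
        id, method, params_json
    )
}

/// Sends one request and reads one reply line.
/// ASSUMPTION: newline-delimited framing; verify against the server before relying on it.
fn call(socket_path: &str, request: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;
    stream.write_all(request.as_bytes())?;
    stream.write_all(b"\n")?;
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;
    Ok(reply)
}

fn main() {
    let request = rpc_request(1, "server.ping", "[]");
    println!("request: {}", request);
    // The socket path matches the Quick Start; the call fails cleanly if no server is running.
    match call("/tmp/heroindex.sock", &request) {
        Ok(reply) => println!("reply: {}", reply.trim_end()),
        Err(e) => eprintln!("could not reach server: {}", e),
    }
}
```

Calling `rpc.discover` the same way should return the OpenRPC schema, which is the authoritative source for each method's parameters.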

Performance Tips

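Use batch inserts - doc.add_batch sends many documents in one RPC call, avoiding a socket round trip per document
Commit periodically - Commit after a batch of documents rather than after each one, since every commit has a fixed cost
Enable fast fields - Set fast on fields used for sorting, filtering, or aggregations
Use appropriate tokenizers - en_stem for English text, raw for keyword-style values that must match exactly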
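The first two tips can be combined into a simple ingestion plan: split the documents into fixed-size batches and commit only after every few batches, plus once at the end. The sketch below shows just that planning step with the standard library. `plan_batches` is a hypothetical helper, the batch sizes are arbitrary examples, and the actual `doc.add_batch` and `index.commit` calls are left as comments because their parameters are not documented in this README.

```rust
/// Splits `docs` into batches of at most `batch_size` items and flags the batches
/// after which a commit should follow: every `commit_every` batches and the last one.
fn plan_batches<T>(docs: &[T], batch_size: usize, commit_every: usize) -> Vec<(&[T], bool)> {
    assert!(batch_size > 0 && commit_every > 0, "sizes must be non-zero");
    // Number of batches, rounding up so a partial final batch is counted.
    let total = (docs.len() + batch_size - 1) / batch_size;
    docs.chunks(batch_size)
        .enumerate()
        .map(|(i, batch)| {
            let commit = (i + 1) % commit_every == 0 || i + 1 == total;
            (batch, commit)
        })
        .collect()
}

fn main() {
    let docs: Vec<u32> = (1..=10).collect();
    for (batch, commit) in plan_batches(&docs, 3, 2) {
        // Here a client would send `batch` with doc.add_batch (hypothetical call site).
        println!("add batch of {} documents", batch.len());
        if commit {
            // Here a client would call index.commit, then index.reload before searching.
            println!("commit");
        }
    }
}
```

Committing after the final batch ensures that a trailing partial batch is not left uncommitted.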

Related Crates

heroindex_client - Client library for connecting to HeroIndex

License

MIT License - see LICENSE for details.

Credits

Built on the excellent Tantivy search engine library.
