heroindex_client

Crates.io: heroindex_client
lib.rs: heroindex_client
version: 0.1.3
created_at: 2025-12-25 07:25:22.412351+00
updated_at: 2025-12-25 08:14:15.658556+00
description: Client library for HeroIndex search server
homepage: https://forge.ourworld.tf/lhumina_research/hero_index_server
repository: https://forge.ourworld.tf/lhumina_research/hero_index_server
max_upload_size
id: 2004291
size: 56,443
kristof de spiegeleer (despiegk)

documentation

https://docs.rs/heroindex_client

HeroIndex Client

A Rust client library for HeroIndex, a high-performance full-text search server built on Tantivy.

Repository: https://forge.ourworld.tf/lhumina_research/hero_index_server

Need the server? Install it with cargo install heroindex, or see the heroindex crate on crates.io.

Features

  • Async/Await - Built on Tokio for async operations
  • Type-Safe - Strongly typed responses
  • Simple API - Intuitive method names matching RPC calls
  • 10 Field Types - text, str, u64, i64, f64, date, bool, json, bytes, ip
  • 9 Query Types - match, term, fuzzy, phrase, prefix, range, regex, boolean, all
  • Batch Operations - Efficient bulk document insertion

Installation

Add to your Cargo.toml:

[dependencies]
heroindex_client = "0.1"
tokio = { version = "1", features = ["full"] }
serde_json = "1"

Quick Start

use heroindex_client::HeroIndexClient;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), heroindex_client::Error> {
    // Connect to HeroIndex server
    let mut client = HeroIndexClient::connect("/tmp/heroindex.sock").await?;
    
    // Create a database with schema
    client.db_create("articles", json!({
        "fields": [
            {"name": "title", "type": "text", "stored": true, "indexed": true},
            {"name": "body", "type": "text", "stored": true, "indexed": true},
            {"name": "views", "type": "u64", "stored": true, "indexed": true, "fast": true}
        ]
    })).await?;
    
    // Select database and add documents
    client.db_select("articles").await?;
    client.doc_add(json!({"title": "Hello World", "body": "Welcome to search", "views": 100})).await?;
    client.commit().await?;
    client.reload().await?;
    
    // Search
    let results = client.search(json!({"type": "match", "field": "body", "value": "search"}), 10, 0).await?;
    println!("Found {} results", results.total_hits);
    
    Ok(())
}

Field Types

When creating a schema, you can use these field types:

| Type  | Description                        | Example Value          | Use Case               |
|-------|------------------------------------|------------------------|------------------------|
| text  | Full-text searchable, tokenized    | "Hello World"          | Articles, descriptions |
| str   | Exact match keyword, not tokenized | "user-123"             | IDs, tags, status      |
| u64   | Unsigned 64-bit integer            | 42                     | Counts, ages           |
| i64   | Signed 64-bit integer              | -10                    | Scores, offsets        |
| f64   | 64-bit floating point              | 3.14                   | Prices, ratings        |
| date  | DateTime (RFC 3339 format)         | "2024-01-15T10:30:00Z" | Timestamps             |
| bool  | Boolean                            | true                   | Flags, toggles         |
| json  | Nested JSON object                 | {"key": "value"}       | Metadata, attributes   |
| bytes | Binary data (base64)               | "SGVsbG8="             | Hashes, binary         |
| ip    | IP address                         | "192.168.1.1"          | Network logs           |

Field Options

Each field can have these options:

  • stored: true - Store the value to retrieve it in search results
  • indexed: true - Index the field to make it searchable
  • fast: true - Enable fast fields for sorting and aggregations (numeric types)
  • tokenizer: "en_stem" - Use stemming tokenizer for text fields

Schema Example

client.db_create("products", json!({
    "fields": [
        {"name": "id", "type": "str", "stored": true, "indexed": true},
        {"name": "name", "type": "text", "stored": true, "indexed": true, "tokenizer": "en_stem"},
        {"name": "description", "type": "text", "stored": true, "indexed": true},
        {"name": "price", "type": "f64", "stored": true, "indexed": true, "fast": true},
        {"name": "stock", "type": "u64", "stored": true, "indexed": true, "fast": true},
        {"name": "created_at", "type": "date", "stored": true, "indexed": true},
        {"name": "active", "type": "bool", "stored": true, "indexed": true},
        {"name": "metadata", "type": "json", "stored": true, "indexed": true}
    ]
})).await?;

Query Types

1. Match Query (Full-Text Search)

Tokenizes the query and finds documents containing any of the terms.

// Simple full-text search
client.search(json!({
    "type": "match", 
    "field": "description", 
    "value": "wireless bluetooth headphones"
}), 10, 0).await?;

2. Term Query (Exact Match)

Finds documents with the exact value (no tokenization). Best for keyword fields.

// Find by exact ID
client.search(json!({
    "type": "term",
    "field": "id",
    "value": "prod-001"
}), 10, 0).await?;

// Find by status
client.search(json!({
    "type": "term",
    "field": "status",
    "value": "published"
}), 10, 0).await?;
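
The difference between match (tokenized, any term) and term (exact value) can be illustrated with a small std-only model. This is a conceptual sketch only: the tokenizer below is a toy that lowercases and splits on non-alphanumeric characters, and the function names are hypothetical. It is not the server's actual analyzer.

```rust
/// Toy tokenizer (an assumption, not the server's analyzer):
/// lowercase the text and split on anything that is not alphanumeric.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// "match"-style check on a text field: true if ANY query token
/// also occurs among the field's tokens.
fn match_any(field_value: &str, query: &str) -> bool {
    let field_tokens = tokenize(field_value);
    tokenize(query).iter().any(|q| field_tokens.contains(q))
}

/// "term"-style check on a str field: the whole value must be identical,
/// with no tokenization or case folding.
fn term_exact(field_value: &str, query: &str) -> bool {
    field_value == query
}
```

Under this model, "prod-001" stored in a tokenized field would be split into "prod" and "001", which is why identifiers and status values are usually better kept in str fields and queried with term.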

3. Fuzzy Query (Typo-Tolerant)

Finds documents even when the query contains spelling mistakes. The distance option is the maximum number of character edits allowed between the query and a matching term.

// Finds "keyboard" even when user types "keybaord"
client.search(json!({
    "type": "fuzzy",
    "field": "name",
    "value": "keybaord",
    "distance": 2  // Allow up to 2 character differences
}), 10, 0).await?;

// Finds "smartphone" from "smartfone"
client.search(json!({
    "type": "fuzzy",
    "field": "name",
    "value": "smartfone",
    "distance": 2
}), 10, 0).await?;
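
To get a feel for what a distance of 2 means, the edit distance between the two example pairs above can be computed with a plain Levenshtein function. This is a std-only sketch: it counts insertions, deletions and substitutions as one edit each, so an adjacent swap such as "ao" for "oa" costs two edits. Some engines count a swap as a single edit, and the README does not say which rule the server applies.

```rust
/// Plain Levenshtein distance: each insertion, deletion or substitution costs 1.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a`
    // and the first j characters of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: turning i characters into an empty string takes i deletions.
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        prev = curr;
    }
    prev[b.len()]
}
```

Both "keybaord" vs "keyboard" and "smartfone" vs "smartphone" come out at distance 2 under this definition, which is consistent with the distance value used in the examples.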

4. Phrase Query (Exact Phrase Match)

Finds documents containing the exact phrase in order.

// Must contain "machine learning algorithms" as an exact phrase
client.search(json!({
    "type": "phrase",
    "field": "content",
    "value": "machine learning algorithms"
}), 10, 0).await?;
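
The "exact phrase in order" rule can be sketched as a sliding-window comparison over word sequences. This is a simplified std-only model with hypothetical helper names: it splits on whitespace and lowercases, which may differ from the tokenizer the server actually applies.

```rust
/// Split on whitespace and lowercase (a simplifying assumption).
fn words(text: &str) -> Vec<String> {
    text.split_whitespace().map(|w| w.to_lowercase()).collect()
}

/// True if the words of `phrase` appear consecutively, in the same order,
/// somewhere in `field_value`.
fn contains_phrase(field_value: &str, phrase: &str) -> bool {
    let doc = words(field_value);
    let phrase = words(phrase);
    if phrase.is_empty() {
        return false; // `windows(0)` would panic, and an empty phrase matches nothing here
    }
    doc.windows(phrase.len()).any(|w| w == phrase.as_slice())
}
```

A document that contains all three words but in a different order is rejected by this check, whereas a match query would still return it.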

5. Prefix Query (Autocomplete)

Finds documents where the field starts with the given prefix.

// Autocomplete: find all products starting with "wire"
client.search(json!({
    "type": "prefix",
    "field": "name",
    "value": "wire"  // Matches "wireless", "wired", "wire-free"
}), 10, 0).await?;

6. Range Query (Numeric/Date Ranges)

Finds documents within a numeric or date range.

// Products between $50 and $100
client.search(json!({
    "type": "range",
    "field": "price",
    "gte": 50.0,   // Greater than or equal
    "lt": 100.0    // Less than
}), 10, 0).await?;

// Products with at least 10 in stock
client.search(json!({
    "type": "range",
    "field": "stock",
    "gte": 10
}), 10, 0).await?;

// All options: gt, gte, lt, lte
client.search(json!({
    "type": "range",
    "field": "views",
    "gt": 100,     // Greater than (exclusive)
    "lte": 1000    // Less than or equal
}), 10, 0).await?;
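
The four bound options differ only in whether the endpoint itself is included. The following std-only sketch models them with a hypothetical Range struct (not a type from heroindex_client) so the inclusive and exclusive cases can be checked directly.

```rust
/// Hypothetical model of the range options; unset bounds are unbounded.
#[derive(Default)]
struct Range {
    gt: Option<f64>,  // strictly greater than
    gte: Option<f64>, // greater than or equal
    lt: Option<f64>,  // strictly less than
    lte: Option<f64>, // less than or equal
}

impl Range {
    /// True if `v` satisfies every bound that is set.
    fn contains(&self, v: f64) -> bool {
        self.gt.map_or(true, |b| v > b)
            && self.gte.map_or(true, |b| v >= b)
            && self.lt.map_or(true, |b| v < b)
            && self.lte.map_or(true, |b| v <= b)
    }
}
```

With gte 50.0 and lt 100.0, a price of exactly 50.0 is inside the range and exactly 100.0 is outside. With gt 100 and lte 1000, the endpoints behave the other way round.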

7. Regex Query (Pattern Matching)

Finds documents matching a regular expression.

// Find product codes matching pattern
client.search(json!({
    "type": "regex",
    "field": "sku",
    "value": "PRD-[0-9]{4}-[A-Z]+"
}), 10, 0).await?;

// Find emails from specific domain
client.search(json!({
    "type": "regex",
    "field": "email",
    "value": ".*@company\\.com"
}), 10, 0).await?;
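
A note on the double backslash in the e-mail pattern: inside a normal Rust string literal, "\\." produces the two characters backslash and dot, which is what the regex engine needs to match a literal dot. A raw string literal gives the same pattern without doubling. The function names below are only for illustration.

```rust
/// The pattern as written in the example above (escaped string literal).
fn escaped_pattern() -> &'static str {
    ".*@company\\.com"
}

/// The same pattern as a raw string literal: no escaping of the backslash needed.
fn raw_pattern() -> &'static str {
    r".*@company\.com"
}
```

Both functions return the identical string, so either spelling can be used for the "value" of a regex query.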

8. Boolean Query (Complex Combinations)

Combines multiple queries with AND/OR/NOT logic.

// Complex search: electronics that are premium but not discontinued
client.search(json!({
    "type": "boolean",
    "must": [
        // ALL must match (AND)
        {"type": "match", "field": "category", "value": "electronics"},
        {"type": "range", "field": "price", "gte": 100.0}
    ],
    "should": [
        // At least one should match for higher score (OR)
        {"type": "match", "field": "name", "value": "premium"},
        {"type": "match", "field": "name", "value": "pro"}
    ],
    "must_not": [
        // NONE should match (NOT)
        {"type": "term", "field": "status", "value": "discontinued"},
        {"type": "term", "field": "status", "value": "out_of_stock"}
    ]
}), 10, 0).await?;
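
The must / should / must_not logic can be modelled in a few lines. This is a conceptual std-only sketch with hypothetical types (Product, BoolQuery): a document is rejected if any must clause fails or any must_not clause matches, and the number of matching should clauses serves as a toy score. The server's real relevance scoring is not described in this README and will differ.

```rust
/// Hypothetical document type for the illustration.
struct Product {
    category: &'static str,
    name: &'static str,
    price: f64,
    status: &'static str,
}

type Clause = fn(&Product) -> bool;

struct BoolQuery {
    must: Vec<Clause>,
    should: Vec<Clause>,
    must_not: Vec<Clause>,
}

impl BoolQuery {
    /// None if the document is excluded; otherwise Some(toy score),
    /// where the score is the number of matching `should` clauses.
    fn evaluate(&self, p: &Product) -> Option<usize> {
        if !self.must.iter().all(|c| c(p)) || self.must_not.iter().any(|c| c(p)) {
            return None;
        }
        Some(self.should.iter().filter(|c| c(p)).count())
    }
}

fn is_electronics(p: &Product) -> bool { p.category == "electronics" }
fn costs_at_least_100(p: &Product) -> bool { p.price >= 100.0 }
fn name_has_premium(p: &Product) -> bool { p.name.split_whitespace().any(|w| w == "premium") }
fn name_has_pro(p: &Product) -> bool { p.name.split_whitespace().any(|w| w == "pro") }
fn is_discontinued(p: &Product) -> bool { p.status == "discontinued" }
fn is_out_of_stock(p: &Product) -> bool { p.status == "out_of_stock" }

/// Mirrors the structure of the JSON query above.
fn example_query() -> BoolQuery {
    BoolQuery {
        must: vec![is_electronics as Clause, costs_at_least_100],
        should: vec![name_has_premium as Clause, name_has_pro],
        must_not: vec![is_discontinued as Clause, is_out_of_stock],
    }
}
```

In this model a product that satisfies both must clauses but matches neither should clause is still returned, just with a lower score, while a discontinued product is dropped regardless of how well it matches otherwise.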

9. All Query (Match Everything)

Returns all documents in the index.

// Get all documents
client.search(json!({"type": "all"}), 100, 0).await?;

Real-World Query Examples

E-commerce Product Search

// User searches "wireless mouse" - combine fuzzy for typo tolerance with filters
client.search(json!({
    "type": "boolean",
    "must": [
        {"type": "fuzzy", "field": "name", "value": "wireless mouse", "distance": 1}
    ],
    "should": [
        {"type": "match", "field": "description", "value": "ergonomic"},
        {"type": "range", "field": "rating", "gte": 4.0}
    ],
    "must_not": [
        {"type": "term", "field": "in_stock", "value": false}
    ]
}), 20, 0).await?;

Log Search

// Find error logs since a cutoff timestamp (hard-coded here; for "the last hour", compute the cutoff at runtime)
client.search(json!({
    "type": "boolean",
    "must": [
        {"type": "term", "field": "level", "value": "ERROR"},
        {"type": "range", "field": "timestamp", "gte": "2024-01-15T09:00:00Z"}
    ],
    "should": [
        {"type": "match", "field": "message", "value": "connection timeout"},
        {"type": "match", "field": "message", "value": "database error"}
    ]
}), 100, 0).await?;
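
The timestamp in the range clause above is fixed. For a rolling "last hour" window, the cutoff has to be produced at runtime in RFC 3339 format. The sketch below does this with the standard library only, using the well-known days-to-civil-date conversion. The function names are hypothetical, and in a real project a date/time crate would be the more usual choice.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Convert days since 1970-01-01 into (year, month, day), Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Format seconds since the Unix epoch as an RFC 3339 UTC timestamp.
fn rfc3339_utc(unix_secs: u64) -> String {
    let (y, m, d) = civil_from_days((unix_secs / 86_400) as i64);
    let rem = unix_secs % 86_400;
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        y, m, d, rem / 3_600, (rem % 3_600) / 60, rem % 60
    )
}

/// RFC 3339 timestamp for "one hour before now".
fn one_hour_ago() -> String {
    let cutoff = SystemTime::now()
        .checked_sub(Duration::from_secs(3_600))
        .unwrap_or(UNIX_EPOCH);
    let secs = cutoff.duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0);
    rfc3339_utc(secs)
}
```

The string returned by one_hour_ago() could then be placed in the "gte" position of the timestamp range clause instead of the literal value.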

Article Search with Pagination

// Search articles, page 3 (20 results per page)
let page = 3;
let per_page = 20;
let offset = (page - 1) * per_page;

let results = client.search(json!({
    "type": "match",
    "field": "content",
    "value": "rust programming"
}), per_page, offset).await?;

println!("Page {} of {}", page, (results.total_hits + per_page - 1) / per_page);
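
The offset and page-count arithmetic used above can be pulled into two small helpers. This is a std-only sketch with hypothetical names, assuming 1-based page numbers and u64 counts. The actual integer types expected by search and returned in total_hits are not stated in this README.

```rust
/// Offset of the first result on a 1-based `page`.
/// Page 0 is treated like page 1 instead of underflowing.
fn page_offset(page: u64, per_page: u64) -> u64 {
    page.saturating_sub(1) * per_page
}

/// Number of pages needed to show `total_hits` results (ceiling division).
/// Returns 0 when `per_page` is 0 to avoid dividing by zero.
fn total_pages(total_hits: u64, per_page: u64) -> u64 {
    if per_page == 0 {
        0
    } else {
        (total_hits + per_page - 1) / per_page
    }
}
```

For page 3 with 20 results per page the offset is 40, and 45 total hits span 3 pages.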

Document Operations

Adding Documents

// Single document
client.doc_add(json!({
    "id": "prod-001",
    "name": "Wireless Mouse",
    "price": 29.99
})).await?;

// Batch insert (much faster for bulk operations)
client.doc_add_batch(vec![
    json!({"id": "prod-002", "name": "Keyboard", "price": 59.99}),
    json!({"id": "prod-003", "name": "Monitor", "price": 299.99}),
    json!({"id": "prod-004", "name": "Webcam", "price": 79.99}),
]).await?;

// IMPORTANT: Commit and reload to make documents searchable
client.commit().await?;
client.reload().await?;

Deleting Documents

// Delete by field value
client.doc_delete("id", json!("prod-001")).await?;
client.commit().await?;
client.reload().await?;

Database Management

// List all databases
let list = client.db_list().await?;
for db in &list.databases {
    println!("{}: {} docs, {} bytes", db.name, db.doc_count, db.size_bytes);
}

// Create database
client.db_create("logs", json!({
    "fields": [
        {"name": "timestamp", "type": "date", "stored": true, "indexed": true, "fast": true},
        {"name": "level", "type": "str", "stored": true, "indexed": true},
        {"name": "message", "type": "text", "stored": true, "indexed": true}
    ]
})).await?;

// Select database (required before operations)
client.db_select("logs").await?;

// Get database info
let info = client.db_info().await?;
println!("Documents: {}, Segments: {}", info.doc_count, info.segment_count);

// Delete database
client.db_delete("old_logs").await?;

Error Handling

use heroindex_client::{Error, error_codes};

match client.db_select("nonexistent").await {
    Ok(_) => println!("Selected"),
    Err(Error::Rpc { code, message }) => {
        match code {
            error_codes::DATABASE_NOT_FOUND => println!("Database not found"),
            error_codes::NO_DATABASE_SELECTED => println!("Select a database first"),
            error_codes::INVALID_QUERY => println!("Invalid query syntax"),
            _ => println!("RPC error {}: {}", code, message),
        }
    }
    Err(Error::Connection(e)) => println!("Connection error: {}", e),
    Err(e) => println!("Other error: {}", e),
}

Response Types

All methods return strongly-typed responses:

| Type         | Fields                                     |
|--------------|--------------------------------------------|
| PingResponse | status, version                            |
| ServerStats  | uptime_secs, databases, total_docs         |
| DatabaseList | databases: Vec<DatabaseInfo>               |
| DatabaseInfo | name, doc_count, size_bytes, segment_count |
| SchemaInfo   | fields: Vec<FieldInfo>                     |
| SearchResult | total_hits, hits: Vec<SearchHit>, took_ms  |
| SearchHit    | score, doc: serde_json::Value              |
| CountResult  | count                                      |
| OpResult     | success, opstamp                           |

Performance Tips

  1. Use batch inserts - doc_add_batch is 10-100x faster than individual doc_add calls
  2. Commit periodically - Don't commit after every document; add documents in batches and commit once per batch
  3. Enable fast fields - For fields used in sorting/filtering/aggregations
  4. Use term queries - For exact matches on keyword fields, term is faster than match
  5. Limit results - Always set a reasonable limit and fetch further results with pagination
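
Tips 1 and 2 can be combined by splitting a large document set into fixed-size batches before sending it. The helper below is a std-only sketch with a hypothetical name. Each inner Vec would be passed to one doc_add_batch call, followed by commit and reload once the batches have been sent. A suitable batch size is not given in this README and would need to be found by measurement.

```rust
/// Split `docs` into consecutive batches of at most `batch_size` items.
/// A `batch_size` of 0 is treated as 1 so that `chunks` does not panic.
fn batches<T: Clone>(docs: &[T], batch_size: usize) -> Vec<Vec<T>> {
    docs.chunks(batch_size.max(1))
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

Five documents with a batch size of 2 yield three batches, the last one holding the single remaining document.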

License

MIT License
