rusty-page-indexer

Crates.io: rusty-page-indexer
lib.rs: rusty-page-indexer
version: 0.5.5
created_at: 2026-01-23 23:03:36.038949+00
updated_at: 2026-01-24 08:14:08.658011+00
description: A high-performance, reasoning-based RAG indexer in Rust following the PageIndex pattern.
homepage: https://github.com/Algiras/rusty-pageindex
repository: https://github.com/Algiras/rusty-pageindex
max_upload_size
id: 2065622
size: 997,048
Algimantas Krasauskas (Algiras)

README

🦀 RustyPageIndex

RustyPageIndex is a high-performance Rust implementation of the PageIndex pattern. It transforms complex documents into hierarchical "Table-of-Contents" (TOC) trees for vectorless, reasoning-based RAG.

This project is inspired by VectifyAI/PageIndex but has diverged significantly, adding multi-repo support, parallel processing, and a unified tree architecture.

🚀 Key Features

Performance

  • Parallel Indexing: Uses Rayon for parallel file parsing (238 files in ~0.04s)
  • Rust-Native Parsing: pdf-extract and pulldown-cmark for fast document processing
  • Incremental Updates: Hash-based caching skips unchanged files
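
The incremental-update idea can be illustrated with a minimal sketch. It assumes a per-file content hash kept in memory; the names `HashCache`, `content_hash`, and `needs_reindex` are hypothetical and are not taken from the crate, and `DefaultHasher` is only a stand-in for whatever hash the tool actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash file contents. DefaultHasher is a stand-in: its algorithm is not
/// guaranteed to be stable across Rust releases, so a persistent cache
/// would more likely store a stable digest such as SHA-256.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical cache: maps a file path to the hash seen at the last run.
#[derive(Default)]
struct HashCache {
    seen: HashMap<String, u64>,
}

impl HashCache {
    /// Returns true when the file is new or its content changed,
    /// and records the new hash either way.
    fn needs_reindex(&mut self, path: &str, bytes: &[u8]) -> bool {
        let hash = content_hash(bytes);
        match self.seen.insert(path.to_string(), hash) {
            Some(previous) => previous != hash,
            None => true,
        }
    }
}
```

With a cache like this, an indexer would parse only the files for which `needs_reindex` returns true, and a `--force` run would simply bypass the check.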

Multi-Repository Support

  • Index multiple repos: Each indexed folder is tracked separately
  • Query across all: Search spans all indexed repositories by default
  • Manage indices: List, filter, and clean up indices easily

Unified Tree Architecture

  • Folder → File → Section hierarchy preserves document structure
  • Single tree per repo: Efficient storage and navigation
  • Smart search: Auto-unwraps folder roots for better LLM context
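
As a rough sketch of what such a tree could look like, the following assumes a single node type tagged with its kind. `Kind`, `Node`, `outline`, and `unwrap_root` are illustrative names, not the crate's API, and `unwrap_root` shows only one possible reading of "auto-unwraps folder roots".

```rust
/// Hypothetical node kinds for a Folder → File → Section tree.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Folder,
    File,
    Section,
}

/// Hypothetical tree node; field names are illustrative only.
#[derive(Debug, Clone)]
struct Node {
    kind: Kind,
    title: String,
    children: Vec<Node>,
}

impl Node {
    fn new(kind: Kind, title: &str, children: Vec<Node>) -> Self {
        Node { kind, title: title.to_string(), children }
    }
}

/// Render the tree as an indented outline, one title per line.
fn outline(node: &Node, depth: usize, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    out.push_str(&node.title);
    out.push('\n');
    for child in &node.children {
        outline(child, depth + 1, out);
    }
}

/// One possible reading of "auto-unwraps folder roots": when the root is a
/// folder, hand its children to the search step instead of the folder itself.
fn unwrap_root(root: &Node) -> Vec<&Node> {
    match root.kind {
        Kind::Folder => root.children.iter().collect(),
        _ => vec![root],
    }
}
```

An outline like the one `outline` produces is the kind of compact view that lets a model see file names in the context of their folders.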

🔄 Divergence from Original PageIndex

| Feature | Original PageIndex | RustyPageIndex |
| --- | --- | --- |
| Language | Python | Rust |
| Indexing | Per-file indices | Unified folder tree |
| Multi-repo | Not supported | Full support with list/clean |
| Parallelism | Sequential | Rayon parallel processing |
| Storage | Cloud-based (MCP) | Local filesystem |
| Tree Structure | Flat sections | Folder → File → Section hierarchy |
| Headerless Markdown | Empty tree | Auto-creates "Document" node |

🛠️ Getting Started

Installation

One-liner Install (Unix/macOS):

curl -fsSL https://raw.githubusercontent.com/Algiras/rusty-pageindex/main/install.sh | bash

One-liner Install (Windows PowerShell):

irm https://raw.githubusercontent.com/Algiras/rusty-pageindex/main/install.ps1 | iex

Via Cargo:

cargo install rusty-page-indexer

🧙 Use as an Agent Skill

npx skills add https://github.com/Algiras/rusty-pageindex --skill rusty-page-indexer

🔑 Authentication

# For OpenAI
rusty-page-indexer auth --api-key "your-key-here"

# For Ollama (local LLM)
rusty-page-indexer auth --api-key "ollama" --api-base "http://localhost:11434/v1" --model "llama3.2"

🌲 Usage

Indexing Documents

# Index a repository
rusty-page-indexer index ./my-project

# Index with LLM-generated summaries
rusty-page-indexer index ./my-project --enrich

# Force re-index (ignores cache)
rusty-page-indexer index ./my-project --force

# Preview what would be indexed
rusty-page-indexer index ./my-project --dry-run

Managing Multiple Repositories

# Index multiple repos
rusty-page-indexer index ./repo-a
rusty-page-indexer index ./repo-b

# List all indexed repositories
rusty-page-indexer list

# Example output:
# 📋 Indexed Repositories
# ────────────────────────────────────────────────────────────
#   📁 repo-a (125.3 KB)
#      /Users/you/projects/repo-a
#   📁 repo-b (89.7 KB)
#      /Users/you/projects/repo-b
# ────────────────────────────────────────────────────────────
# Total: 2 indices

Querying

# Search across ALL indexed repositories
rusty-page-indexer query "how does authentication work"

# Search within a specific repository
rusty-page-indexer query "kafka messaging" --path repo-a

Cleanup

# Remove a specific index
rusty-page-indexer clean repo-a

# Remove all indices
rusty-page-indexer clean --all

Status Information

rusty-page-indexer info

🤖 Model Compatibility

OpenAI Models (Remote)

| Model | Cost | Speed | Notes |
| --- | --- | --- | --- |
| gpt-4o | $$$ | Fast | Best accuracy, recommended for complex queries |
| gpt-4o-mini | $ | Very Fast | Great balance of cost and quality ⭐ |
| gpt-4.1-mini | $ | Very Fast | Latest mini variant |
| gpt-4-turbo | $$ | Fast | Good for detailed reasoning |
| gpt-3.5-turbo | ¢ | Very Fast | Budget option, decent accuracy |

# Configure for OpenAI
rusty-page-indexer auth --api-key "sk-..." --model "gpt-4o-mini"

# Override model per query
rusty-page-indexer query "question" --model gpt-4o

Local Models (Ollama)

| Model | Size | Works | Notes |
| --- | --- | --- | --- |
| gemma3:1b | 1B | ✅ | Minimum recommended for local use |
| llama3.2:latest | 3B | ✅ | Good balance of speed and accuracy ⭐ |
| qwen2.5:7b | 7B | ✅ | Reliable, slightly conservative |
| llama3.1:latest | 8B | ✅ | Excellent accuracy |
| mistral:7b | 7B | ✅ | Fast and capable |
| phi3:mini | 3.8B | ✅ | Microsoft's compact model |
| qwen2.5:0.5b | 0.5B | ❌ | Too small, unreliable responses |
| tinyllama:1.1b | 1.1B | ❌ | Doesn't follow output format |

# Configure for Ollama
rusty-page-indexer auth --api-key "ollama" --api-base "http://localhost:11434/v1" --model "llama3.2"

# Make sure Ollama is running
ollama serve

OpenAI-Compatible APIs

Works with any OpenAI-compatible endpoint:

# Azure OpenAI
rusty-page-indexer auth --api-key "your-key" --api-base "https://your-resource.openai.azure.com/v1" --model "gpt-4"

# Together AI
rusty-page-indexer auth --api-key "your-key" --api-base "https://api.together.xyz/v1" --model "meta-llama/Llama-3-70b-chat-hf"

# Groq
rusty-page-indexer auth --api-key "your-key" --api-base "https://api.groq.com/openai/v1" --model "llama3-70b-8192"

Recommendation: Use gpt-4o-mini for remote or llama3.2 for local. Add --enrich during indexing for better search quality.


🔍 How Search Works

  1. Repository Selection: The query runs against all indexed repos, or only the one selected with --path
  2. Tree Navigation: The LLM navigates the Folder → File → Section hierarchy
  3. Content Retrieval: Matching leaf nodes return full text content

The unified tree structure allows the LLM to see file names within folders, making navigation more accurate than flat file lists.
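
The navigation step can be sketched as a walk from the root to a leaf. In this hedged example the LLM call is replaced by a `choose` callback, and the toy `keyword_choose` function merely matches words so the sketch runs offline; `Node`, `navigate`, and `keyword_choose` are hypothetical names, not part of the crate.

```rust
/// Hypothetical node: a title, its text, and child nodes.
struct Node {
    title: String,
    text: String,
    children: Vec<Node>,
}

/// Walk from the root to a leaf. `choose` stands in for the LLM call: it
/// receives the query and the child titles and returns the index of the
/// child to follow, or None to give up.
fn navigate<'a, F>(root: &'a Node, query: &str, choose: F) -> Option<&'a Node>
where
    F: Fn(&str, &[&str]) -> Option<usize>,
{
    let mut current = root;
    while !current.children.is_empty() {
        let titles: Vec<&str> = current.children.iter().map(|c| c.title.as_str()).collect();
        let index = choose(query, &titles)?;
        current = current.children.get(index)?;
    }
    Some(current)
}

/// Toy chooser for demonstration only: pick the first title that shares
/// a word with the query (case-insensitive substring match).
fn keyword_choose(query: &str, titles: &[&str]) -> Option<usize> {
    let query = query.to_lowercase();
    titles.iter().position(|title| {
        title
            .to_lowercase()
            .split(|c: char| !c.is_alphanumeric())
            .any(|word| !word.is_empty() && query.contains(word))
    })
}
```

Because the chooser sees only the titles at each level, the prompt stays small even for large repositories; the full text is read only from the leaf that `navigate` returns.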


📁 Storage Structure

~/.rusty-page-indexer/
├── config.toml           # API credentials and settings
├── manifest.json         # Index registry with all repos
└── indices/
    ├── {hash-a}.json     # Unified tree for repo-a
    └── {hash-b}.json     # Unified tree for repo-b
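
The layout suggests that each repository maps to one hash-named file under `indices/`. The sketch below shows one way such a mapping could work; it is purely an assumption, since the source does not say which hash the tool uses or what it hashes. `index_file` is a hypothetical helper and `DefaultHasher` is a stand-in.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical: derive the index file for a repo from a hash of its path.
/// The real tool's hash function and file naming may differ.
fn index_file(storage_root: &Path, repo_path: &Path) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    repo_path.hash(&mut hasher);
    storage_root
        .join("indices")
        .join(format!("{:016x}.json", hasher.finish()))
}
```

Keying the file name on the repository path keeps one tree per repo and lets a re-index overwrite the same file.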

📝 Supported Document Types

Markdown (.md)

  • Parses heading structure (#, ##, ###) into a hierarchical tree
  • Headerless files auto-create a "Document" node with full content
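
The two Markdown rules above can be sketched with a small stack-based parser. This is a hand-rolled line scanner used here only to keep the example dependency-free; it is not pulldown-cmark and not the crate's implementation. It handles ATX headings only and does not skip `#` lines inside fenced code blocks. `Section`, `parse_heading`, and `parse_markdown` are hypothetical names.

```rust
#[derive(Debug)]
struct Section {
    level: usize,
    title: String,
    text: String,
    children: Vec<Section>,
}

/// Returns (level, title) for an ATX heading line such as "## Usage".
fn parse_heading(line: &str) -> Option<(usize, &str)> {
    let trimmed = line.trim_start();
    let level = trimmed.chars().take_while(|&c| c == '#').count();
    if level == 0 || level > 6 {
        return None;
    }
    let rest = &trimmed[level..];
    // A heading needs a space after the hashes (or nothing at all).
    if rest.is_empty() || rest.starts_with(' ') {
        Some((level, rest.trim()))
    } else {
        None
    }
}

/// Pop the innermost open section and attach it to its parent, or to the
/// list of roots when it has no parent.
fn close_last(stack: &mut Vec<Section>, roots: &mut Vec<Section>) {
    if let Some(done) = stack.pop() {
        match stack.last_mut() {
            Some(parent) => parent.children.push(done),
            None => roots.push(done),
        }
    }
}

fn parse_markdown(source: &str) -> Vec<Section> {
    let mut roots = Vec::new();
    let mut stack: Vec<Section> = Vec::new();
    let mut preamble = String::new();

    for line in source.lines() {
        if let Some((level, title)) = parse_heading(line) {
            // Close every open section at the same or a deeper level.
            while stack.last().map_or(false, |open| open.level >= level) {
                close_last(&mut stack, &mut roots);
            }
            stack.push(Section { level, title: title.to_string(), text: String::new(), children: Vec::new() });
        } else if let Some(open) = stack.last_mut() {
            open.text.push_str(line);
            open.text.push('\n');
        } else {
            // Text before the first heading; only used by the fallback below.
            preamble.push_str(line);
            preamble.push('\n');
        }
    }
    while !stack.is_empty() {
        close_last(&mut stack, &mut roots);
    }
    // Headerless file: fall back to a single "Document" node with the full content.
    if roots.is_empty() {
        roots.push(Section { level: 0, title: "Document".to_string(), text: preamble, children: Vec::new() });
    }
    roots
}
```

The stack always holds the chain of currently open headings, so a new heading closes every section at the same or a deeper level before it is pushed.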

PDF (.pdf)

  • Extracts text using pdf-extract
  • Creates a single root node with the document text
  • Works best with text-based PDFs (not scanned images)

📄 License

MIT

Commit count: 41