minisearchtk

Crates.io: minisearchtk
lib.rs: minisearchtk
version: 0.2.0
created_at: 2026-01-13 21:30:34.809596+00
updated_at: 2026-01-13 21:30:34.809596+00
description: Small toolkit for crawling and searching web pages
repository: https://github.com/konfou/minisearch-rs
id: 2041402
size: 4,545,734
owner: Konstantinos (konfou)


minisearch-rs

A small Rust toolkit for crawling and searching web pages.

Key binaries:

  • mycrawler: fetch pages to {hash}.html, append metadata to crawl.db, persist the crawl queue (including in-flight entries) for recovery, honor nofollow/noindex flags, optionally respect robots.txt / crawl-delay and per-host throttling, and accept a custom User-Agent.
  • minisearch: load the saved corpus, build a BM25 index with optional on-disk caches (BM25 stats and token cache for incremental rebuild), then query via a REPL (/search, /df, /tf) with snippets and highlights.
  • searchsrv: serve the same search over HTTP with a minimal HTML UI, reusing the caches when available.
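
To make the indexing step more concrete, here is a minimal sketch of BM25 scoring with /df and /tf style lookups. It is not the crate's actual code: the names Bm25Index, tokenize, build, df, tf and search are hypothetical, the tokenizer is deliberately naive, and the parameters k1 = 1.2 and b = 0.75 together with the "+1 inside the logarithm" IDF variant are common defaults assumed here, not values taken from the project.

```rust
use std::collections::HashMap;

/// A tiny in-memory BM25 index (illustrative only, not the crate's API).
pub struct Bm25Index {
    /// Per-document term frequencies.
    docs: Vec<HashMap<String, usize>>,
    /// Per-document length in tokens.
    doc_len: Vec<usize>,
    /// Document frequency: number of documents containing each term.
    df: HashMap<String, usize>,
    /// Average document length across the corpus.
    avg_len: f64,
    /// Assumed BM25 parameters (common defaults).
    k1: f64,
    b: f64,
}

/// Naive tokenizer: split on non-alphanumeric characters and lowercase.
pub fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

impl Bm25Index {
    /// Build the index from raw document texts.
    pub fn build(corpus: &[&str]) -> Self {
        let mut docs = Vec::with_capacity(corpus.len());
        let mut doc_len = Vec::with_capacity(corpus.len());
        let mut df: HashMap<String, usize> = HashMap::new();
        for text in corpus {
            let tokens = tokenize(text);
            doc_len.push(tokens.len());
            let mut tf: HashMap<String, usize> = HashMap::new();
            for tok in tokens {
                *tf.entry(tok).or_insert(0) += 1;
            }
            // Each distinct term counts once per document.
            for term in tf.keys() {
                *df.entry(term.clone()).or_insert(0) += 1;
            }
            docs.push(tf);
        }
        let total: usize = doc_len.iter().sum();
        let avg_len = if docs.is_empty() {
            0.0
        } else {
            total as f64 / docs.len() as f64
        };
        Bm25Index { docs, doc_len, df, avg_len, k1: 1.2, b: 0.75 }
    }

    /// Number of documents containing `term`.
    pub fn df(&self, term: &str) -> usize {
        self.df.get(term).copied().unwrap_or(0)
    }

    /// Occurrences of `term` in document `doc`.
    pub fn tf(&self, doc: usize, term: &str) -> usize {
        self.docs[doc].get(term).copied().unwrap_or(0)
    }

    /// Score every document against the query; return matches, best first.
    pub fn search(&self, query: &str) -> Vec<(usize, f64)> {
        let n = self.docs.len() as f64;
        let terms = tokenize(query);
        let mut results = Vec::new();
        for (id, tf_map) in self.docs.iter().enumerate() {
            let mut score = 0.0;
            for term in &terms {
                let tf = tf_map.get(term).copied().unwrap_or(0) as f64;
                if tf == 0.0 {
                    continue;
                }
                let df = self.df(term) as f64;
                let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
                // Length normalisation relative to the average document.
                let norm = 1.0 - self.b + self.b * self.doc_len[id] as f64 / self.avg_len;
                score += idf * tf * (self.k1 + 1.0) / (tf + self.k1 * norm);
            }
            if score > 0.0 {
                results.push((id, score));
            }
        }
        results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        results
    }
}
```

In this sketch the document-frequency table and average length play the role of the "BM25 stats" that could be cached on disk, and the per-document term maps correspond to a token cache; how minisearch actually lays out its caches is described in the project's docs/, not here.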

See docs/ for architecture details and future improvement ideas.

This project is my own implementation of a semester assignment shared by a friend.

Commit count: 33
