nekosearch

Crates.io: nekosearch
lib.rs: nekosearch
version: 0.0.1
created_at: 2025-09-18 21:48:25.424254+00
updated_at: 2025-09-18 21:48:25.424254+00
description: A Rust toolkit for text search, fuzzy matching and intent detection: tokenization, normalization, TF-IDF, Jaccard, Levenshtein, and ranking pipelines.
homepage: https://github.com/pas2rust/
repository: https://github.com/pas2rust/nekosearch
max_upload_size
id: 1845463
size: 3,819
PAS (pas2rust)

documentation: https://docs.rs/nekosearch

README

nekosearch

A Rust toolkit for text search, fuzzy matching, and intent detection.
It ranges from minimal, dependency-free matching to full ranking pipelines built on TF-IDF, Jaccard, and Levenshtein.


📋 Feature Checklist

🔹 Core (std-only)

  • Simple tokenization (split_whitespace)
  • Basic normalization (lowercase, trim)
  • Exact equality comparison
  • Word-by-word comparison (overlap count)
  • Set similarity (basic Jaccard)
  • Character similarity (Hamming distance, defined only for equal-length strings)
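
The core items above can be sketched with the standard library alone. This is an illustration, not the crate's published API: the function names `tokenize`, `overlap_count`, `jaccard`, and `hamming` are hypothetical, and returning 1.0 for two empty inputs is an assumed convention.

```rust
use std::collections::HashSet;

/// Basic normalization (trim, lowercase) followed by whitespace tokenization.
fn tokenize(text: &str) -> Vec<String> {
    text.trim()
        .to_lowercase()
        .split_whitespace()
        .map(str::to_string)
        .collect()
}

/// Word-by-word comparison: number of distinct words present in both inputs.
fn overlap_count(a: &str, b: &str) -> usize {
    let sa: HashSet<String> = tokenize(a).into_iter().collect();
    let sb: HashSet<String> = tokenize(b).into_iter().collect();
    sa.intersection(&sb).count()
}

/// Basic Jaccard similarity |A ∩ B| / |A ∪ B| over word sets.
/// Assumption: two empty inputs count as identical (1.0).
fn jaccard(a: &str, b: &str) -> f64 {
    let sa: HashSet<String> = tokenize(a).into_iter().collect();
    let sb: HashSet<String> = tokenize(b).into_iter().collect();
    let union = sa.union(&sb).count();
    if union == 0 {
        return 1.0;
    }
    sa.intersection(&sb).count() as f64 / union as f64
}

/// Hamming distance in characters; `None` when the lengths differ.
fn hamming(a: &str, b: &str) -> Option<usize> {
    if a.chars().count() != b.chars().count() {
        return None;
    }
    Some(a.chars().zip(b.chars()).filter(|(x, y)| x != y).count())
}
```

Returning `Option` from `hamming` makes the "if lengths match" condition explicit at the call site instead of panicking or silently truncating.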

🔹 Normalization & Preprocessing

  • Remove punctuation
  • Remove stopwords (customizable list)
  • Unicode normalization (NFC/NFD)
  • Accent stripping (configurable)
  • Stemming or lemmatization (at least English/Portuguese)
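
A minimal preprocessing sketch, assuming a caller-supplied stopword set and a small hand-written accent table. The names `strip_accent` and `preprocess` are hypothetical and are not taken from the crate. The accent table only covers a few precomposed Latin letters; full Unicode normalization (NFC/NFD) and stemming need larger tables or an external crate and are not shown here.

```rust
use std::collections::HashSet;

/// Fold a few precomposed accented letters to ASCII.
/// Illustrative only: the table is far from complete.
fn strip_accent(c: char) -> char {
    match c {
        'á' | 'à' | 'â' | 'ã' | 'ä' => 'a',
        'é' | 'è' | 'ê' | 'ë' => 'e',
        'í' | 'ì' | 'î' | 'ï' => 'i',
        'ó' | 'ò' | 'ô' | 'õ' | 'ö' => 'o',
        'ú' | 'ù' | 'û' | 'ü' => 'u',
        'ç' => 'c',
        other => other,
    }
}

/// Lowercase, optionally fold accents, replace punctuation with spaces,
/// then drop every token found in the stopword set.
fn preprocess(text: &str, stopwords: &HashSet<&str>, fold_accents: bool) -> Vec<String> {
    let cleaned: String = text
        .to_lowercase()
        .chars()
        .map(|c| if fold_accents { strip_accent(c) } else { c })
        // Anything that is neither alphanumeric nor whitespace becomes a space.
        .map(|c| if c.is_alphanumeric() || c.is_whitespace() { c } else { ' ' })
        .collect();
    cleaned
        .split_whitespace()
        .filter(|w| !stopwords.contains(w))
        .map(str::to_string)
        .collect()
}
```

Replacing punctuation with a space rather than deleting it keeps `fast,search` as two tokens, at the cost of splitting contractions such as `don't`.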

🔹 Similarity Metrics

  • Levenshtein distance
  • Damerau–Levenshtein (transpositions)
  • Sørensen–Dice coefficient
  • Cosine similarity (with TF-IDF vectors)
  • Advanced Jaccard (n-grams)
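
Two of the metrics above, sketched with the standard library only. The names `levenshtein`, `bigrams`, and `dice` are hypothetical. The Levenshtein function uses the usual two-row dynamic program, and the Dice coefficient is computed over sets of character bigrams, which is one common variant and not necessarily the one the crate will use.

```rust
use std::collections::HashSet;

/// Levenshtein distance (insertions, deletions, substitutions) over chars,
/// keeping only two rows of the dynamic-programming table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Set of adjacent character pairs in `s`.
fn bigrams(s: &str) -> HashSet<(char, char)> {
    let chars: Vec<char> = s.chars().collect();
    chars.windows(2).map(|w| (w[0], w[1])).collect()
}

/// Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over bigram sets.
/// Assumption: strings too short to have bigrams score 1.0 only if equal.
fn dice(a: &str, b: &str) -> f64 {
    let (x, y) = (bigrams(a), bigrams(b));
    if x.is_empty() && y.is_empty() {
        return if a == b { 1.0 } else { 0.0 };
    }
    2.0 * x.intersection(&y).count() as f64 / (x.len() + y.len()) as f64
}
```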

🔹 Indexing & Search

  • Basic inverted index (word → docs)
  • Ranking by TF (Term Frequency)
  • Ranking by TF-IDF
  • Approximate search (configurable threshold)
  • Typo-tolerant search
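
The inverted index and TF-IDF ranking listed above could look roughly like the sketch below. The `Index` type and its `build` and `search` methods are hypothetical names, and the weighting (raw term frequency times `ln(N / df)`) is one of several common TF-IDF variants, chosen here as an assumption.

```rust
use std::collections::HashMap;

/// Inverted index: term -> (document id -> term frequency).
struct Index {
    postings: HashMap<String, HashMap<usize, usize>>,
    doc_count: usize,
}

impl Index {
    /// Index each document by lowercased, whitespace-separated words.
    fn build(docs: &[&str]) -> Self {
        let mut postings: HashMap<String, HashMap<usize, usize>> = HashMap::new();
        for (id, doc) in docs.iter().enumerate() {
            for word in doc.to_lowercase().split_whitespace() {
                *postings
                    .entry(word.to_string())
                    .or_default()
                    .entry(id)
                    .or_insert(0) += 1;
            }
        }
        Index { postings, doc_count: docs.len() }
    }

    /// Score each matching document by the sum of tf * idf over query terms,
    /// with idf = ln(N / df). Highest score first, ties broken by id.
    fn search(&self, query: &str) -> Vec<(usize, f64)> {
        let mut scores: HashMap<usize, f64> = HashMap::new();
        for term in query.to_lowercase().split_whitespace() {
            if let Some(docs) = self.postings.get(term) {
                let idf = (self.doc_count as f64 / docs.len() as f64).ln();
                for (&id, &tf) in docs {
                    *scores.entry(id).or_insert(0.0) += tf as f64 * idf;
                }
            }
        }
        let mut ranked: Vec<(usize, f64)> = scores.into_iter().collect();
        ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
        ranked
    }
}
```

With this weighting, a term that appears in every document has an idf of zero and contributes nothing to the ranking, which is the intended effect of the IDF factor.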

🔹 Fuzzy Matching

  • N-grams (2-gram, 3-gram, etc.)
  • Fast Levenshtein approximation
  • Fuzzy ranking (normalized score 0–1)
  • Partial matching (relevant substrings)
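
A sketch of n-gram based fuzzy ranking with a normalized score between 0 and 1. The names `ngrams`, `similarity`, and `fuzzy_rank` are hypothetical, and scoring by Jaccard similarity over character bigram sets is an assumed choice rather than the crate's documented behaviour.

```rust
use std::collections::HashSet;

/// Set of lowercase character n-grams of `s`; empty if `s` is shorter than `n`.
fn ngrams(s: &str, n: usize) -> HashSet<String> {
    let chars: Vec<char> = s.to_lowercase().chars().collect();
    if n == 0 || chars.len() < n {
        return HashSet::new();
    }
    chars
        .windows(n)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Jaccard similarity over n-gram sets, always in the range 0.0..=1.0.
fn similarity(a: &str, b: &str, n: usize) -> f64 {
    let (x, y) = (ngrams(a, n), ngrams(b, n));
    let union = x.union(&y).count();
    if union == 0 {
        return 0.0;
    }
    x.intersection(&y).count() as f64 / union as f64
}

/// Keep candidates whose bigram similarity to `query` reaches `threshold`,
/// best match first.
fn fuzzy_rank<'a>(query: &str, candidates: &[&'a str], threshold: f64) -> Vec<(&'a str, f64)> {
    let mut hits: Vec<(&'a str, f64)> = candidates
        .iter()
        .map(|&c| (c, similarity(query, c, 2)))
        .filter(|&(_, score)| score >= threshold)
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits
}
```

For example, the misspelled query `serch` shares three of its four bigrams with `search`, so it still ranks that candidate first.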

🔹 Advanced Features

  • Compound queries (AND, OR, NOT)
  • Custom weighting support
  • Query expansion (synonyms, related terms)
  • Intent detection pipeline
  • Parallel indexing/search (optional rayon feature)
  • Serialization of index (optional serde feature)
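
Compound queries can be modelled as a small expression tree evaluated against a document's term set. The sketch below is an assumption about one possible design: the `Query` enum, its `term` and `matches` methods, and the `terms` helper are hypothetical names, not part of the crate.

```rust
use std::collections::HashSet;

/// Boolean query tree over single terms.
enum Query {
    Term(String),
    And(Box<Query>, Box<Query>),
    Or(Box<Query>, Box<Query>),
    Not(Box<Query>),
}

impl Query {
    /// Build a leaf node; the word is lowercased to match `terms`.
    fn term(word: &str) -> Query {
        Query::Term(word.to_lowercase())
    }

    /// Evaluate the query against the set of terms found in one document.
    fn matches(&self, doc_terms: &HashSet<String>) -> bool {
        match self {
            Query::Term(t) => doc_terms.contains(t),
            Query::And(l, r) => l.matches(doc_terms) && r.matches(doc_terms),
            Query::Or(l, r) => l.matches(doc_terms) || r.matches(doc_terms),
            Query::Not(q) => !q.matches(doc_terms),
        }
    }
}

/// Lowercased, whitespace-separated term set of a document.
fn terms(doc: &str) -> HashSet<String> {
    doc.to_lowercase()
        .split_whitespace()
        .map(str::to_string)
        .collect()
}
```

A query such as `rust AND NOT web` is then built as `Query::And(Box::new(Query::term("rust")), Box::new(Query::Not(Box::new(Query::term("web")))))`.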

Commit count: 1