| Crates.io | sff |
| lib.rs | sff |
| version | 0.4.0 |
| created_at | 2025-06-13 16:16:54.334403+00 |
| updated_at | 2025-11-14 12:43:39.883003+00 |
| description | SemanticFileFinder (sff): Fast semantic file finder using sentence embeddings. Searches .txt, .md, .mdx files. |
| homepage | |
| repository | https://github.com/your_username/sff |
| max_upload_size | |
| id | 1711627 |
| size | 228,086 |
sff (SemanticFileFinder) is a command-line tool that quickly searches the files in a given directory based on the semantic meaning of your query. It leverages sentence embeddings through model2vec-rs to understand content, not just keywords. By default it reads .txt, .md, .mdx, and .org files, splits their content into chunks, and ranks the chunks by similarity to the query to find the most relevant text snippets.
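To make the chunk-and-rank idea concrete, here is a minimal sketch in plain Rust. It is not sff's actual code: the function names (`chunk_words`, `cosine`, `rank_by_similarity`) are hypothetical, cosine similarity is assumed as the similarity measure, and the embeddings are assumed to have been computed already by some model (in sff that job belongs to model2vec-rs, which is deliberately left out here).

```rust
/// Split text into chunks of at most `words_per_chunk` whitespace-separated words.
/// (Hypothetical helper; sff's real chunking may differ.)
fn chunk_words(text: &str, words_per_chunk: usize) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(words_per_chunk.max(1)) // guard against a chunk size of 0
        .map(|chunk| chunk.join(" "))
        .collect()
}

/// Cosine similarity of two vectors; returns 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every chunk embedding against the query embedding and return the
/// `limit` best (chunk index, score) pairs, highest score first.
/// Assumption: the embeddings were produced elsewhere by an embedding model.
fn rank_by_similarity(
    query: &[f32],
    chunk_embeddings: &[Vec<f32>],
    limit: usize,
) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = chunk_embeddings
        .iter()
        .enumerate()
        .map(|(index, embedding)| (index, cosine(query, embedding)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```

The `limit` parameter plays the same role as the `-l, --limit` option: only the top results are kept after sorting.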
You can install sff using Cargo:
cargo install sff
sff "project ideas for rust"
Ensure ~/.cargo/bin is in your system's PATH. By default, sff searches the current working directory (equivalent to --path .).
To build and install from source:
git clone https://github.com/do-me/sff.git
cd sff
cargo build --release
cargo install --path .
The binary will be available in target/release/sff and installed to ~/.cargo/bin/sff.
I use this tool myself to scan my personal notes. In the past these were simple .txt files in a folder until I migrated everything to iCloud + Obsidian. Here is some sample output from some random notes:

tl;dr: English-only models take under 250 ms on ~2500 files and 10k chunks (20 words per chunk) on an M3 Max. If you need the best possible results and good multilingual retrieval, go for minishlab/potion-multilingual-128M.
Otherwise, stick with the default, minishlab/potion-retrieval-32M. Keep an eye on new model2vec models here: https://huggingface.co/minishlab.
| Command | Model | Query | Files | Chunks | Time (ms) |
|---|---|---|---|---|---|
| sff -m "minishlab/potion-base-8M" "javascript" | potion-base-8M | javascript | 2537 | 10000 | 209.34 |
| sff -m "minishlab/potion-retrieval-32M" "javascript" | potion-retrieval-32M | javascript | 2537 | 10000 | 249.95 |
| sff -m "minishlab/potion-multilingual-128M" "javascript" | potion-multilingual-128M | javascript | 2537 | 10000 | 1001.69 |
By default, sff scans .txt, .md, .mdx, and .org files; use -e to target other extensions.
It uses model2vec-rs to generate text embeddings. Models are typically downloaded from Hugging Face Hub.
The basic command structure is:
sff [OPTIONS] <QUERY>...
Examples:
Search in the current directory for "machine learning techniques":
sff "machine learning techniques"
Search recursively in ~/Documents/notes for "project ideas for rust":
sff -p ~/Documents/notes -r "project ideas for rust"
Use a different model and limit results to 5:
sff -m "minishlab/potion-multilingual-128M" -l 5 "benefits of parallel computing"
Format as JSON:
sff "javascript" --json
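The difference between a plain search and the `-r` search in the examples above comes down to whether subdirectories are walked. The following is a rough sketch of that distinction using only the standard library. The function name `collect_files` is hypothetical and this is not taken from sff's source, so treat the details (iterative walk, sorted output) as assumptions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect non-directory entries under `dir`.
/// With `recursive == false` only the top level is listed;
/// with `recursive == true` every subdirectory is visited as well.
/// Note: `is_dir()` follows symlinks, so a symlink cycle would loop forever
/// in this simplified sketch.
fn collect_files(dir: &Path, recursive: bool) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    let mut pending = vec![dir.to_path_buf()];

    while let Some(current) = pending.pop() {
        for entry in fs::read_dir(&current)? {
            let path = entry?.path();
            if path.is_dir() {
                if recursive {
                    pending.push(path);
                }
            } else {
                files.push(path);
            }
        }
    }

    files.sort(); // stable, predictable order for display or testing
    Ok(files)
}
```

Calling `collect_files(Path::new("."), false)` mirrors the default behaviour described here (current directory, no recursion), while passing `true` corresponds to adding `-r`.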
All Options:
You can view all available options with sff --help:
sff: Fast semantic file finder
Usage: sff [OPTIONS] <QUERY>...
Arguments:
<QUERY>...
The semantic search query
Options:
-p, --path <PATH>
The directory to search in
[default: .]
-m, --model <MODEL>
Model to use for embeddings, from Hugging Face Hub or local path
[default: minishlab/potion-retrieval-32M]
-l, --limit <LIMIT>
Number of top results to display
[default: 10]
-r, --recursive
Search recursively through all subdirectories
-v, --verbose
Enable verbose mode to print detailed timings for nerds
-h, --help
Print help (see more with '--help')
-V, --version
Print version
--json
Instead of table return JSON formatted output
-e, --extension <EXTENSION>
Choose file extension to target,
or multiple extensions delimited with "," (e.g. "-e md,org"),
or with separate arguments (e.g. "-e md -e org")
[default: txt md mdx org]
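The help text says `-e` accepts either a comma-delimited list or repeated arguments. One way such values could be flattened and then applied to file paths is sketched below. The helper names `normalize_extensions` and `has_allowed_extension` are hypothetical, and the trimming, lowercasing, de-duplication, and leading-dot handling are assumptions of this sketch rather than documented sff behaviour.

```rust
use std::path::Path;

/// Flatten raw `-e` values such as ["md,org", "txt"] into ["md", "org", "txt"].
/// Assumptions: whitespace and a leading '.' are ignored, values are
/// lowercased, and duplicates are dropped while keeping first-seen order.
fn normalize_extensions(args: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for arg in args {
        for part in arg.split(',') {
            let ext = part.trim().trim_start_matches('.').to_lowercase();
            if !ext.is_empty() && !out.contains(&ext) {
                out.push(ext);
            }
        }
    }
    out
}

/// True if the path has an extension that appears in `allowed`
/// (compared case-insensitively, which is an assumption of this sketch).
fn has_allowed_extension(path: &Path, allowed: &[String]) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| allowed.contains(&ext.to_lowercase()))
        .unwrap_or(false)
}
```

With this sketch, `-e md,org` and `-e md -e org` both normalize to the same list, so the two spellings from the help text select the same files.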
sff uses model2vec-rs, which typically downloads models from the Hugging Face Hub. The default model is minishlab/potion-retrieval-32M. You can specify any compatible sentence transformer model available on the Hub or a local path to a model. The first time you use a new model, it will be downloaded, which might take some time.
PRs are always welcome!
If you want to search a folder on iCloud (e.g. your Obsidian vault), you need to grant Full Disk Access to your terminal application (e.g. iTerm2) in the macOS system settings.
Reopen the shell and the problem should be fixed.
This repo follows semantic versioning in the format MAJOR.MINOR.PATCH, where MAJOR marks breaking changes, MINOR marks backward-compatible changes and features, and PATCH is used only for bug fixes. Most releases should be MINOR, so you can simply update the sff crate from time to time to see what's new.
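As a small illustration of the MAJOR.MINOR.PATCH scheme, the sketch below parses two version strings and reports which component changed. It is only an illustration: the names `Version`, `Change`, `parse_version`, and `classify` are hypothetical, and pre-release or build suffixes are intentionally not handled.

```rust
/// A plain MAJOR.MINOR.PATCH version (no pre-release or build metadata).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u32,
    minor: u32,
    patch: u32,
}

/// Which component differs between two versions.
#[derive(Debug, PartialEq, Eq)]
enum Change {
    Major, // breaking changes
    Minor, // backward-compatible changes/features
    Patch, // bug fixes
    Same,
}

/// Parse "0.4.0" or "v0.4.0"; returns None for anything else.
fn parse_version(input: &str) -> Option<Version> {
    let trimmed = input.trim();
    let trimmed = trimmed.strip_prefix('v').unwrap_or(trimmed);
    let mut parts = trimmed.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(Version { major, minor, patch })
}

/// Report the most significant component that differs.
fn classify(old: Version, new: Version) -> Change {
    if new.major != old.major {
        Change::Major
    } else if new.minor != old.minor {
        Change::Minor
    } else if new.patch != old.patch {
        Change::Patch
    } else {
        Change::Same
    }
}
```

Because the derived ordering compares `major`, then `minor`, then `patch`, two parsed versions can also be compared directly with `<` and `>`.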
How to publish a new release on crates.io (mainly notes to myself):
1. Update cli.rs & Cargo.toml for the new version
2. git add ., git commit -m "Prepare for vMAJOR.MINOR.PATCH", git push
3. cargo publish
4. git tag vMAJOR.MINOR.PATCH, git push origin vMAJOR.MINOR.PATCH

Built by Dominik Weckmüller. If you like semantic search, check out my other work on GitHub, e.g. SemanticFinder!