| Crates.io | smgrep |
| lib.rs | smgrep |
| version | 0.6.0 |
| created_at | 2025-11-27 16:32:55.261185+00 |
| updated_at | 2025-11-29 03:04:49.705202+00 |
| description | Semantic code search tool with GPU acceleration |
| homepage | https://github.com/can1357/smgrep |
| repository | https://github.com/can1357/smgrep |
| max_upload_size | |
| id | 1954077 |
| size | 915,607 |
Natural-language search that works like grep. Fast, local, GPU-accelerated, and built for coding agents.
Install
cargo install smgrep
Or build from source:
git clone https://github.com/can1357/smgrep
cd smgrep
cargo build --release
For CPU-only builds (no CUDA):
cargo build --release --no-default-features
Setup (Recommended)
smgrep setup
Downloads embedding models (~500MB) and tree-sitter grammars upfront. If you skip this, models download automatically on first use.
Search
cd my-repo
smgrep "where do we handle authentication?"
The first search in a repository indexes it automatically. Each repository gets its own isolated index, so switching between repos needs no extra setup.
smgrep claude-install
Then open Claude Code (claude) and ask questions about your codebase. The integration uses the smgrep serve daemon and provides semantic search.
smgrep includes a built-in MCP (Model Context Protocol) server:
smgrep mcp
This exposes a sem_search tool that agents can use for semantic code search. The server auto-starts the background daemon if needed.
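For orientation, MCP clients invoke tools with a JSON-RPC 2.0 tools/call request. The sketch below only builds such a message for sem_search. The query argument name is an assumption, since the tool's input schema is not documented here, and a real client must complete the MCP initialize handshake before sending it.

```python
import json


def build_tool_call(request_id: int, query: str) -> str:
    """Serialize an MCP `tools/call` request as a single line of JSON.

    The tool name `sem_search` comes from the README. The `query`
    argument name is hypothetical: check the schema the server
    reports via `tools/list` before relying on it.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "sem_search",
            "arguments": {"query": query},  # assumed argument name
        },
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(build_tool_call(1, "where do we handle authentication?"))
```

In practice an MCP-aware agent handles this exchange for you, so the message format matters mainly when debugging the server by hand.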
smgrep [query]
The default command. Searches the current directory using semantic meaning.
smgrep "how is the database connection pooled?"
Options:
| Flag | Description | Default |
|---|---|---|
| -m <n> | Max total results to return | 10 |
| --per-file <n> | Max matches per file | 1 |
| -c, --content | Show full chunk content | false |
| --compact | Show file paths only | false |
| --scores | Show relevance scores | false |
| -s, --sync | Force re-index before search | false |
| --dry-run | Show what would be indexed | false |
| --json | JSON output format | false |
| --no-rerank | Skip ColBERT reranking | false |
| --plain | Disable ANSI colors | false |
Examples:
# General concept search
smgrep "API rate limiting logic"
# Deep dive (more matches per file)
smgrep "error handling" --per-file 5
# Just the file paths
smgrep "user validation" --compact
# JSON for scripting
smgrep "config parsing" --json
smgrep index
Manually indexes the repository.
smgrep index # Index current dir
smgrep index --dry-run # See what would be indexed
smgrep index --reset # Delete and re-index from scratch
smgrep serve
Runs a background daemon with file watching for instant searches.
smgrep serve # Start daemon for current repo
smgrep serve --path /repo # Start for specific path
smgrep stop / smgrep stop-all
Stop running daemons.
smgrep stop # Stop daemon for current repo
smgrep stop-all # Stop all smgrep daemons
smgrep clean
Remove index data and metadata for a store.
smgrep clean # Clean current directory's store
smgrep clean my-store # Clean specific store by ID
smgrep clean --all # Clean all stores
smgrep status
Show status of running daemons.
smgrep list
Lists all indexed repositories and their metadata.
smgrep doctor
Checks installation health, model availability, and grammar status.
smgrep doctor
smgrep uses candle for ML inference with optional CUDA support.
With CUDA (default):
Requires CUDA toolkit installed with environment configured:
export CUDA_ROOT=/usr/local/cuda # or your CUDA installation path
export PATH="$CUDA_ROOT/bin:$PATH"
cargo build --release
Embedding speed is significantly faster on NVIDIA GPUs.
CPU-only:
cargo build --release --no-default-features
Environment variables:
SMGREP_DISABLE_GPU=1 - Force CPU even when CUDA is available
SMGREP_BATCH_SIZE=N - Override batch size (auto-adapts on OOM)
smgrep combines several techniques for high-quality semantic search:
Smart Chunking: Tree-sitter parses code by function/class boundaries, ensuring embeddings capture complete logical blocks. Grammars download on-demand as WASM modules.
Hybrid Search: Dense embeddings (sentence-transformers) for broad recall, ColBERT reranking for precision.
Quantized Storage: ColBERT embeddings are quantized to int8 for efficient storage in LanceDB.
Automatic Repository Isolation: Stores are named by git remote URL or directory hash.
Incremental Indexing: File watcher detects changes and updates only affected chunks.
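The hybrid search item above describes a two-stage pipeline: a cheap dense-vector pass for recall, then ColBERT-style late interaction for precision. The toy sketch below illustrates that shape with hand-written 2-D vectors. It is not smgrep's implementation: the function names, the recall_k and top_k parameters, and the data are all made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: each query token keeps its best-matching
    document token, and the per-token maxima are summed.
    Assumes doc_tokens is non-empty."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


def hybrid_search(query_dense, query_tokens, chunks, recall_k=2, top_k=2):
    # Stage 1: broad recall using one dense vector per chunk.
    candidates = sorted(
        chunks, key=lambda c: cosine(query_dense, c["dense"]), reverse=True
    )[:recall_k]
    # Stage 2: rerank only the survivors with per-token vectors.
    reranked = sorted(
        candidates, key=lambda c: maxsim(query_tokens, c["tokens"]), reverse=True
    )
    return [c["id"] for c in reranked[:top_k]]


# Toy index: "a" is closest in dense space, but "b" covers both query tokens.
CHUNKS = [
    {"id": "a", "dense": [1.0, 0.0], "tokens": [[1.0, 0.0], [0.9, 0.1]]},
    {"id": "b", "dense": [0.9, 0.1], "tokens": [[1.0, 0.0], [0.0, 1.0]]},
    {"id": "c", "dense": [0.0, 1.0], "tokens": [[0.0, 1.0]]},
]

if __name__ == "__main__":
    print(hybrid_search([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], CHUNKS))  # ['b', 'a']
```

Here the dense pass drops chunk c, and the rerank then promotes b over a because b matches both query tokens. That reordering is the precision gain the reranking stage is meant to deliver, and it is the step skipped by --no-rerank or fast_mode.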
Supported languages (37): TypeScript, TSX, JavaScript, Python, Go, Rust, C, C++, C#, Java, Kotlin, Scala, Ruby, PHP, Elixir, Haskell, OCaml, Julia, Zig, Lua, Odin, Objective-C, Verilog, HTML, CSS, XML, Markdown, JSON, YAML, TOML, Bash, Make, Starlark, HCL, Terraform, Diff, Regex
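The quantized storage technique listed above can be pictured with a small example. The sketch below uses symmetric per-vector quantization, which is one common int8 scheme. Whether smgrep uses this exact scheme (or per-dimension scales, offsets, and so on) is not stated in this README, so treat it as an assumption.

```python
def quantize_int8(vector):
    """Map floats to integer codes in [-127, 127] with one scale per vector.

    Illustrative scheme only; not confirmed to match smgrep's storage format.
    """
    peak = max(abs(x) for x in vector)
    scale = peak / 127.0 if peak else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vector]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    original = [0.5, -1.0, 0.25, 0.0]
    codes, scale = quantize_int8(original)
    restored = dequantize_int8(codes, scale)
    worst = max(abs(a - b) for a, b in zip(original, restored))
    print(codes, scale, worst)
```

Each value shrinks from a 4-byte float to a 1-byte code, at the cost of a rounding error of at most half the scale per component.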
smgrep uses a TOML config file at ~/.smgrep/config.toml. All options can also be set via environment variables with the SMGREP_ prefix.
# ~/.smgrep/config.toml
# ============================================================================
# Models
# ============================================================================
# Dense embedding model (HuggingFace model ID)
# Used for initial semantic similarity search
dense_model = "ibm-granite/granite-embedding-small-english-r2"
# ColBERT reranking model (HuggingFace model ID)
# Used for precise reranking of search results
colbert_model = "answerdotai/answerai-colbert-small-v1"
# Model dimensions (must match the models above)
dense_dim = 384
colbert_dim = 96
# Query prefix (some models require a prefix like "query: ")
query_prefix = ""
# Maximum sequence lengths for tokenization
dense_max_length = 256
colbert_max_length = 256
# ============================================================================
# Performance
# ============================================================================
# Batch size for embedding computation
# Higher = faster but more memory. Auto-reduces on OOM.
default_batch_size = 48
max_batch_size = 96
# Maximum threads for parallel processing
max_threads = 32
# Force CPU inference even when CUDA is available
disable_gpu = false
# Low-impact mode: reduces resource usage for background indexing
low_impact = false
# Fast mode: skip ColBERT reranking for quicker (but less precise) results
fast_mode = false
# ============================================================================
# Server
# ============================================================================
# TCP port for daemon communication
port = 4444
# Idle timeout: shutdown daemon after this many seconds of inactivity
idle_timeout_secs = 1800 # 30 minutes
# How often to check for idle timeout
idle_check_interval_secs = 60
# Timeout for embedding worker operations (milliseconds)
worker_timeout_ms = 60000
# ============================================================================
# Debug
# ============================================================================
# Enable model loading debug output
debug_models = false
# Enable embedding debug output
debug_embed = false
# Enable profiling
profile_enabled = false
# Skip saving metadata (for testing)
skip_meta_save = false
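The performance settings above note that the batch size auto-reduces on OOM. A minimal sketch of that idea follows, assuming a halving strategy and using MemoryError as a stand-in for a GPU out-of-memory error. The embed_with_backoff helper and its parameters are hypothetical, and the README does not say how smgrep actually shrinks the batch.

```python
def embed_with_backoff(items, embed_batch, batch_size=48, min_batch_size=1):
    """Embed `items` in batches, halving the batch size after each OOM.

    `embed_batch` is any callable taking a list and returning one result
    per item. Halving is an assumed policy chosen for illustration.
    """
    results = []
    position = 0
    while position < len(items):
        batch = items[position:position + batch_size]
        try:
            results.extend(embed_batch(batch))
            position += len(batch)  # advance only after a successful batch
        except MemoryError:
            if batch_size <= min_batch_size:
                raise  # cannot shrink further; surface the error
            batch_size = max(min_batch_size, batch_size // 2)
    return results


if __name__ == "__main__":
    def fake_embed(batch):
        # Pretend the device can hold at most 10 items at once.
        if len(batch) > 10:
            raise MemoryError("simulated OOM")
        return [len(text) for text in batch]

    print(len(embed_with_backoff(["chunk"] * 100, fake_embed)))  # 100
```

The failed batch is retried at the smaller size, so no items are skipped. Starting from default_batch_size and never exceeding max_batch_size would be the natural way to tie this to the settings above.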
Any config option can be set via environment variable with the SMGREP_ prefix:
# Examples
export SMGREP_DISABLE_GPU=true
export SMGREP_DEFAULT_BATCH_SIZE=24
export SMGREP_IDLE_TIMEOUT_SECS=3600
| Variable | Description | Default |
|---|---|---|
| SMGREP_STORE | Override store name | auto-detected |
| SMGREP_DISABLE_GPU | Force CPU inference | false |
| SMGREP_DEFAULT_BATCH_SIZE | Embedding batch size | 48 |
| SMGREP_LOW_IMPACT | Reduce resource usage | false |
| SMGREP_FAST_MODE | Skip reranking | false |
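The naming rule for these variables (SMGREP_ plus the upper-cased config key) can be sketched as a small loader. This is an illustration of the convention, not smgrep's parser: the load_config helper is hypothetical, only three keys are modeled, and the accepted boolean spellings are an assumption (the README itself uses both 1 and true).

```python
import os

# Defaults copied from the config file shown earlier (subset only).
DEFAULTS = {
    "disable_gpu": False,
    "default_batch_size": 48,
    "idle_timeout_secs": 1800,
}


def parse_like(default, raw):
    """Convert the raw string to the same type as the default value."""
    if isinstance(default, bool):  # check bool first: bool is a subclass of int
        return raw.strip().lower() in ("1", "true", "yes", "on")  # assumed spellings
    if isinstance(default, int):
        return int(raw)
    return raw


def load_config(env=None, prefix="SMGREP_"):
    """Overlay environment overrides on top of the defaults."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        raw = env.get(prefix + key.upper())
        if raw is not None:
            config[key] = parse_like(default, raw)
    return config


if __name__ == "__main__":
    print(load_config({"SMGREP_DISABLE_GPU": "true", "SMGREP_DEFAULT_BATCH_SIZE": "24"}))
```

Passing a plain dict instead of the real environment keeps the sketch easy to test.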
smgrep respects .gitignore and .smignore files.
Create .smignore in your repository root:
# Ignore generated files
dist/
*.min.js
# Ignore test fixtures
test/fixtures/
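To make the effect of these patterns concrete, here is a deliberately simplified matcher for the example file above. It handles only comments, directory patterns ending in /, and filename globs. Real gitignore semantics (negation with !, **, and more) are richer, and smgrep's own matcher is not documented here, so the is_ignored helper is purely illustrative.

```python
from fnmatch import fnmatchcase

SMIGNORE = """\
# Ignore generated files
dist/
*.min.js
# Ignore test fixtures
test/fixtures/
"""


def load_patterns(text):
    """Drop blank lines and comments, keep the remaining patterns."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path, patterns):
    """Return True if a /-separated relative path matches any pattern."""
    parts = path.split("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            directory = pattern.rstrip("/")
            if "/" in directory:
                # A pattern with an inner slash is anchored to the root.
                if path.startswith(directory + "/"):
                    return True
            elif directory in parts[:-1]:
                # A bare directory name matches at any depth.
                return True
        elif fnmatchcase(parts[-1], pattern):
            # Otherwise treat the pattern as a glob on the file name.
            return True
    return False


if __name__ == "__main__":
    patterns = load_patterns(SMIGNORE)
    for candidate in ["dist/app.js", "src/app.min.js", "src/main.rs"]:
        print(candidate, is_ignored(candidate, patterns))
```

Running smgrep index --dry-run remains the authoritative way to see which files are actually picked up.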
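smgrep list shows all indexed stores.
smgrep --store custom-name "query" searches a named store instead of the auto-detected one.
smgrep keeps its data under ~/.smgrep/.
Run smgrep index to refresh a stale index.
Run smgrep doctor to verify models and grammars.
Run smgrep index --reset or delete ~/.smgrep/ to start from scratch.
Set SMGREP_DISABLE_GPU=1 to force CPU inference.
To build from source and run the tests:
git clone https://github.com/can1357/smgrep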
cd smgrep
cargo build --release
# Run tests
cargo test
smgrep is inspired by osgrep and mgrep by MixedBread.
Licensed under the Apache License, Version 2.0. See LICENSE for details.