| Field | Value |
|---|---|
| Crates.io | shimmy |
| lib.rs | shimmy |
| version | 1.4.2 |
| created_at | 2025-09-04 19:16:09.102735+00 |
| updated_at | 2025-09-23 02:54:19.384287+00 |
| description | Lightweight sub-20MB Ollama alternative with native SafeTensors support. No Python dependencies, 2x faster loading. Now with GitHub Spec-Kit integration for systematic development. |
| homepage | https://github.com/Michael-A-Kuykendall/shimmy |
| repository | https://github.com/Michael-A-Kuykendall/shimmy |
| max_upload_size | |
| id | 1824656 |
| size | 6,205,524 |
Shimmy will be free forever. No asterisks. No "free for now." No pivot to paid.
If Shimmy helps you, consider sponsoring: 100% of support goes to keeping it free forever.
Become a Sponsor | See our amazing sponsors
Shimmy is a 5.1MB single binary that provides 100% OpenAI-compatible endpoints for GGUF models. Point your existing AI tools to Shimmy and they just work: locally, privately, and free.
New developer tools and specifications included! Whether you're forking Shimmy for your application or integrating it as a service, we now provide:
Building something cool with Shimmy? These tools help you do it systematically and reliably.
Shimmy now includes GitHub's brand-new Spec-Kit methodology, specification-driven development that just launched in September 2025! Get professional-grade development workflows:
/specify → /plan → /tasks → implement

Complete Developer Guide • Learn GitHub Spec-Kit
# 1) Install + run
cargo install shimmy --features huggingface
shimmy serve &
# 2) See models and pick one
shimmy list
# 3) Smoke test the OpenAI API
curl -s http://127.0.0.1:11435/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model":"REPLACE_WITH_MODEL_FROM_list",
"messages":[{"role":"user","content":"Say hi in 5 words."}],
"max_tokens":32
}' | jq -r '.choices[0].message.content'
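If your tool expects token streaming, the same endpoint can be asked to stream. A minimal sketch, assuming Shimmy's OpenAI-compatible endpoint honors the standard `stream` flag (the model name is still a placeholder):

```bash
# Stream the completion as server-sent events instead of waiting for the full response
curl -N http://127.0.0.1:11435/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model":"REPLACE_WITH_MODEL_FROM_list",
    "messages":[{"role":"user","content":"Say hi in 5 words."}],
    "max_tokens":32,
    "stream":true
  }'
```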
No code changes needed - just change the API endpoint:
http://localhost:11435

import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://127.0.0.1:11435/v1",
apiKey: "sk-local", // placeholder, Shimmy ignores it
});
const resp = await openai.chat.completions.create({
model: "REPLACE_WITH_MODEL",
messages: [{ role: "user", content: "Say hi in 5 words." }],
max_tokens: 32,
});
console.log(resp.choices[0].message?.content);
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:11435/v1", api_key="sk-local")
resp = client.chat.completions.create(
model="REPLACE_WITH_MODEL",
messages=[{"role": "user", "content": "Say hi in 5 words."}],
max_tokens=32,
)
print(resp.choices[0].message.content)
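Either client needs a real model name in place of `REPLACE_WITH_MODEL`. Besides `shimmy list`, you can ask the server itself; a minimal sketch assuming the response follows the standard OpenAI list shape (`data[].id`):

```bash
# Print the IDs of all models the server exposes
curl -s http://127.0.0.1:11435/v1/models | jq -r '.data[].id'
```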
# RECOMMENDED: Use pre-built binary (no build dependencies required)
curl -L https://github.com/Michael-A-Kuykendall/shimmy/releases/latest/download/shimmy.exe -o shimmy.exe
# OR: Install from source (requires LLVM/Clang)
# First install build dependencies:
winget install LLVM.LLVM
# Then install shimmy:
cargo install shimmy --features huggingface
Windows notes:
- Pre-built binary recommended to avoid build dependency issues
- If Windows Defender flags the binary, add an exclusion or use `cargo install`
- For `cargo install`: install LLVM first to resolve `libclang.dll` errors
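If you went with the pre-built binary, it can be run in place with no install step; a minimal sketch, assuming the download landed in the current directory:

```bash
# Run the downloaded binary directly and bind to the port used in the examples above
./shimmy.exe serve --bind 127.0.0.1:11435
```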
# Install from crates.io
cargo install shimmy --features huggingface
Shimmy supports multiple GPU backends for accelerated inference:
| Backend | Hardware | Installation |
|---|---|---|
| CUDA | NVIDIA GPUs | cargo install shimmy --features llama-cuda |
| Vulkan | Cross-platform GPUs | cargo install shimmy --features llama-vulkan |
| OpenCL | AMD/Intel/Others | cargo install shimmy --features llama-opencl |
| MLX | Apple Silicon | cargo install shimmy --features mlx |
| All GPUs | Everything | cargo install shimmy --features gpu |
# Show detected GPU backends
shimmy gpu-info
Use `--gpu-backend <backend>` to force a specific backend.

Shimmy auto-discovers models from:
- `~/.cache/huggingface/hub/`
- `~/.ollama/models/`
- `./models/`
- `SHIMMY_BASE_GGUF=path/to/model.gguf`

# Download models that work out of the box
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf --local-dir ./models/
huggingface-cli download bartowski/Llama-3.2-1B-Instruct-GGUF --local-dir ./models/
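To pin Shimmy to one specific GGUF instead of relying on auto-discovery, set the `SHIMMY_BASE_GGUF` variable shown in the list above; the exact file name below is only an illustration:

```bash
# Point Shimmy at a single GGUF file (path is a placeholder for one of the downloads above)
SHIMMY_BASE_GGUF=./models/Phi-3-mini-4k-instruct-q4.gguf shimmy serve
```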
# Auto-allocates port to avoid conflicts
shimmy serve
# Or use manual port
shimmy serve --bind 127.0.0.1:11435
Point your AI tools to the displayed port: VSCode Copilot, Cursor, and Continue.dev all work instantly.
- `cargo install shimmy`
- `npm install -g shimmy-js` (coming soon)
- `pip install shimmy` (coming soon)
- `docker pull shimmy/shimmy:latest` (coming soon)

Full compatibility confirmed! Shimmy works flawlessly on macOS with Metal GPU acceleration.
# Install dependencies
brew install cmake rust
# Install shimmy
cargo install shimmy
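A quick sanity check after installing, reusing the same commands as the quick start:

```bash
# Start the server in the background and confirm models are discovered
shimmy serve &
shimmy list
```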
Verified working:
{
"github.copilot.advanced": {
"serverUrl": "http://localhost:11435"
}
}
{
"models": [{
"title": "Local Shimmy",
"provider": "openai",
"model": "your-model-name",
"apiBase": "http://localhost:11435/v1"
}]
}
Works out of the box - just point to http://localhost:11435/v1
I built Shimmy to keep privacy-first control over my AI development and to keep things local and lean.
This is my commitment: Shimmy stays MIT licensed, forever. If you want to support development, sponsor it. If you don't, just build something cool with it.
Shimmy saves you time and money. If it's useful, consider sponsoring for $5/month: less than your Netflix subscription, infinitely more useful for developers.
- `GET /health` - Health check
- `POST /v1/chat/completions` - OpenAI-compatible chat
- `GET /v1/models` - List available models
- `POST /api/generate` - Shimmy native API
- `GET /ws/generate` - WebSocket streaming

shimmy serve # Start server (auto port allocation)
shimmy serve --bind 127.0.0.1:8080 # Manual port binding
shimmy list # Show available models
shimmy discover # Refresh model discovery
shimmy generate --name X --prompt "Hi" # Test generation
shimmy probe model-name # Verify model loads
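For a bare liveness probe of a running server, the `/health` endpoint listed above is enough; a minimal sketch using the same local port as the earlier examples:

```bash
# Returns immediately if the server is up
curl -s http://127.0.0.1:11435/health
```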
- Sub-20MB single binary (34x smaller than Ollama)
- GitHub stars climbing fast
- <1s startup
- 100% Rust, no Python

Hacker News • Front Page Again • IPE Newsletter
Companies: Need invoicing? Email michaelallenkuykendall@gmail.com
| Tool | Binary Size | Startup Time | Memory Usage | OpenAI API |
|---|---|---|---|---|
| Shimmy | 10-20MB | <100ms | 50MB | 100% |
| Ollama | 680MB | 5-10s | 200MB+ | Partial |
| llama.cpp | 89MB | 1-2s | 100MB | None |
Shimmy maintains high code quality through comprehensive testing:
See our testing approach for technical details.
MIT License - forever and always.
Philosophy: Infrastructure should be invisible. Shimmy is infrastructure.
Testing Philosophy: Reliability through comprehensive validation and property-based testing.
Forever maintainer: Michael A. Kuykendall
Promise: This will never become a paid product
Mission: Making local AI development frictionless