| Crates.io | encoderfile |
| lib.rs | encoderfile |
| version | 0.4.0-rc.1 |
| created_at | 2025-11-20 20:06:10.194633+00 |
| updated_at | 2026-01-24 16:44:22.288874+00 |
| description | Distribute and run transformer encoders with a single file. |
| homepage | |
| repository | |
| max_upload_size | |
| id | 1942532 |
| size | 446,020 |
Encoderfile packages transformer encoders—optionally with classification heads—into a single, self-contained executable. No Python runtime, no dependencies, no network calls. Just a fast, portable binary that runs anywhere.
While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures with optional classification heads. It supports embedding, sequence classification, and token classification models—covering most encoder-based NLP tasks, from text similarity to classification and tagging—all within one compact binary.
Under the hood, Encoderfile uses ONNX Runtime for inference, ensuring compatibility with a wide range of transformer architectures.
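For a sense of what that means in practice, here is a rough sketch of the same inference path using onnxruntime and transformers directly from Python. It is an illustration of the underlying mechanics, not Encoderfile's internal code, and the model name and ONNX path are assumptions for the example:
# Illustration only: roughly the inference path an ONNX-exported encoder follows.
# The model name and ONNX path below are assumptions for this example.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
session = ort.InferenceSession("./sentiment-model/model.onnx")

enc = tokenizer(["This is the cutest cat ever!"], padding=True, return_tensors="np")
outputs = session.run(None, {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
})
print(outputs[0].argmax(axis=-1))  # predicted class index per input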
Why?
Encoderfiles can run as:

- HTTP servers
- gRPC servers
- one-off CLI inference tools
- Model Context Protocol (MCP) servers
Encoderfile supports the following Hugging Face model classes (and their ONNX-exported equivalents):
| Task | Supported classes | Example models |
|---|---|---|
| Embeddings / Feature Extraction | AutoModel, AutoModelForMaskedLM | bert-base-uncased, distilbert-base-uncased |
| Sequence Classification | AutoModelForSequenceClassification | distilbert-base-uncased-finetuned-sst-2-english, roberta-large-mnli |
| Token Classification | AutoModelForTokenClassification | dslim/bert-base-NER, bert-base-cased-finetuned-conll03-english |
Your model must meet a few requirements:

- It must be exported to ONNX (e.g., path/to/your/model/model.onnx).
- Its inputs must be input_ids and optionally attention_mask.
- XLNet, Transformer XL, and derivative architectures are not yet supported.

Download the encoderfile CLI tool to build your own model binaries:
curl -fsSL https://raw.githubusercontent.com/mozilla-ai/encoderfile/main/install.sh | sh
Note for Windows users: Pre-built binaries are not available for Windows. Please see our guide on building from source for instructions.
Move the binary to a location in your PATH:
# Linux/macOS
sudo mv encoderfile /usr/local/bin/
# Or add to your user bin
mkdir -p ~/.local/bin
mv encoderfile ~/.local/bin/
See our guide on building from source for detailed instructions on building the CLI tool.
Quick build:
cargo build --bin encoderfile --release
./target/release/encoderfile --help
First, you need an ONNX-exported model. Export any HuggingFace model:
# Install optimum for ONNX export
pip install optimum[exporters]
# Export a sentiment analysis model
optimum-cli export onnx \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --task text-classification \
  ./sentiment-model
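Before building, you can double-check that the exported model exposes the inputs Encoderfile expects (input_ids, and optionally attention_mask). A quick sanity check with the onnx Python package, assuming the export path from the command above:
# Print the exported model's input names (path taken from the export step above).
import onnx

model = onnx.load("./sentiment-model/model.onnx")
print([inp.name for inp in model.graph.input])
# Expect something like: ['input_ids', 'attention_mask']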
Create sentiment-config.yml:
encoderfile:
  name: sentiment-analyzer
  path: ./sentiment-model
  model_type: sequence_classification
  output_path: ./build/sentiment-analyzer.encoderfile
Use the downloaded encoderfile CLI tool:
encoderfile build -f sentiment-config.yml
This creates a self-contained binary at ./build/sentiment-analyzer.encoderfile.
Start the server:
./build/sentiment-analyzer.encoderfile serve
The server will start on http://localhost:8080 by default.
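If you script against the server, you may want to wait for the port to accept connections before sending requests. A small sketch using only the Python standard library, assuming the default host and port above:
# Block until the server's TCP port accepts connections (default localhost:8080 assumed).
import socket
import time

def wait_for_port(host="localhost", port=8080, timeout=30.0):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False

print("server ready:", wait_for_port())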
Sentiment Analysis:
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      "This is the cutest cat ever!",
      "Boring video, waste of time",
      "These cats are so funny!"
    ]
  }'
Response:
{
  "results": [
    {
      "logits": [0.00021549065, 0.9997845],
      "scores": [0.00021549074, 0.9997845],
      "predicted_index": 1,
      "predicted_label": "POSITIVE"
    },
    {
      "logits": [0.9998148, 0.00018516644],
      "scores": [0.9998148, 0.0001851664],
      "predicted_index": 0,
      "predicted_label": "NEGATIVE"
    },
    {
      "logits": [0.00014975034, 0.9998503],
      "scores": [0.00014975043, 0.9998503],
      "predicted_index": 1,
      "predicted_label": "POSITIVE"
    }
  ],
  "model_id": "sentiment-analyzer"
}
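The same request can of course be made from code. A minimal Python client sketch using the requests library, relying only on the endpoint and response fields shown above:
# Minimal client for the /predict endpoint and response shape shown above.
import requests

resp = requests.post(
    "http://localhost:8080/predict",
    json={"inputs": ["This is the cutest cat ever!", "Boring video, waste of time"]},
    timeout=30,
)
resp.raise_for_status()
for item in resp.json()["results"]:
    print(item["predicted_label"], item["scores"])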
Embeddings:
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": ["Hello world"],
    "normalize": true
  }'
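A typical next step with embeddings is comparing two inputs by cosine similarity. The embedding response schema is not shown above, so the way the vectors are pulled out of the JSON below is a placeholder assumption you may need to adjust for your binary:
# Cosine similarity between two embeddings; the response field access is a
# placeholder (the embedding response schema is not documented above).
import numpy as np
import requests

resp = requests.post(
    "http://localhost:8080/predict",
    json={"inputs": ["Hello world", "Hi there"], "normalize": True},
    timeout=30,
)
data = resp.json()
vec_a = np.asarray(data["results"][0], dtype=float)  # adjust to the actual schema
vec_b = np.asarray(data["results"][1], dtype=float)  # adjust to the actual schema
print(float(vec_a @ vec_b / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))))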
Token Classification (NER):
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": ["Apple Inc. is located in Cupertino, California"]
  }'
Start an HTTP server (default port 8080):
./my-model.encoderfile serve
Custom configuration:
./my-model.encoderfile serve \
  --http-port 3000 \
  --http-hostname 0.0.0.0
Disable gRPC (HTTP only):
./my-model.encoderfile serve --disable-grpc
Start with default gRPC server (port 50051):
./my-model.encoderfile serve
gRPC only (no HTTP):
./my-model.encoderfile serve --disable-http
Custom gRPC configuration:
./my-model.encoderfile serve \
  --grpc-port 50052 \
  --grpc-hostname localhost
Run one-off inference without starting a server:
# Single input
./my-model.encoderfile infer "This is a test sentence"
# Multiple inputs
./my-model.encoderfile infer "First text" "Second text" "Third text"
# Save output to file
./my-model.encoderfile infer "Test input" -o results.json
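The -o flag writes JSON, so the results are easy to post-process. A quick way to load and inspect the file from Python:
# Load and pretty-print the JSON written by `infer ... -o results.json`.
import json

with open("results.json") as f:
    results = json.load(f)
print(json.dumps(results, indent=2))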
Run as a Model Context Protocol server:
./my-model.encoderfile mcp --hostname 0.0.0.0 --port 9100
# Custom HTTP port
./my-model.encoderfile serve --http-port 3000
# Custom gRPC port
./my-model.encoderfile serve --grpc-port 50052
# Both
./my-model.encoderfile serve --http-port 3000 --grpc-port 50052
./my-model.encoderfile serve \
  --http-hostname 127.0.0.1 \
  --grpc-hostname localhost
# HTTP only
./my-model.encoderfile serve --disable-grpc
# gRPC only
./my-model.encoderfile serve --disable-http
Once you have the encoderfile CLI tool installed, you can build binaries from any compatible HuggingFace model.
See our guide on building from source for detailed instructions.
Quick workflow:
1. Export your model to ONNX: optimum-cli export onnx ...
2. Create a config.yml
3. Build the binary: encoderfile build -f config.yml
4. Run it: ./build/my-model.encoderfile serve

We welcome contributions! See CONTRIBUTING.md for guidelines.
# Clone the repository
git clone https://github.com/mozilla-ai/encoderfile.git
cd encoderfile
# Set up development environment
make setup
# Run tests
make test
# Build documentation
make docs
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.