| Crates.io | ctranslate2-server |
|---|---|
| lib.rs | ctranslate2-server |
| version | 0.1.2 |
| created_at | 2026-01-13 12:15:34.331982+00 |
| updated_at | 2026-01-14 07:21:22.134275+00 |
| description | A high-performance inference server for CTranslate2 models, compatible with OpenAI's API. |
| homepage | |
| repository | https://github.com/any35/ctranslate2-server |
| max_upload_size | |
| id | 2040014 |
| size | 109,922 |
A high-performance, OpenAI-compatible HTTP server for CTranslate2 models, built with Rust and Axum.
## Features

- OpenAI-compatible `/v1/chat/completions` endpoint for text generation.
- Built on the [ct2rs](https://crates.io/crates/ct2rs) CTranslate2 bindings.
- Model aliases (e.g. `nllb`) that map to specific model folders.

## Installation

```bash
cargo install ctranslate2-server
```

## Quick Start
Download your CTranslate2-converted models into a `models/` directory. Example structure:

```text
models/
├── nllb-200-distilled-600M/
│   ├── model.bin
│   ├── sentencepiece.model
│   └── shared_vocabulary.txt
└── t5-small/
    ├── model.bin
    └── ...
```
Run the built-in config generator to scan your `models/` directory and create a `config.toml`.

Using Docker:

```bash
docker run --rm -v $(pwd):/app -w /app any35/ctranslate2-server:latest /app/server --config-gen
# Note: You might need to adjust permissions or run the generator locally if using the provided binary directly.
# Alternatively, copy the example config below.
```

Using Cargo (local):

```bash
cargo run --bin config_generator
```
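To illustrate what a generator like this produces, here is a hedged Python sketch (not the crate's actual generator code): it scans a models directory for folders containing `model.bin` and renders a minimal `config.toml` using the keys from the example config in this README.

```python
# Illustrative sketch only: mimics what a config generator for this model
# layout might emit. Keys follow the example config.toml in this README;
# this is NOT the crate's real config_generator implementation.
from pathlib import Path

def generate_config(models_dir: str) -> str:
    """Scan models_dir for CTranslate2 model folders (those containing
    model.bin) and render a minimal config.toml as a string."""
    lines = ['[server]', 'host = "0.0.0.0"', 'port = 8080', '', '[models]']
    for entry in sorted(Path(models_dir).iterdir()):
        if entry.is_dir() and (entry / "model.bin").exists():
            lines.append(f'[models."{entry.name}"]')
            lines.append(f'path = "{models_dir}/{entry.name}"')
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(generate_config("models"))
```

Only directories that actually contain a `model.bin` are listed, which mirrors the directory structure shown above.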
Create a `docker-compose.yml` (see the Docker Compose section) and run:

```bash
docker-compose up -d
```
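A minimal `docker-compose.yml` sketch to start from — the service name is arbitrary, and the image tag and mounts are assumptions based on the `docker run` commands later in this README; adjust them to your setup:

```yaml
# Hedged example compose file; image tag and volume paths assume the
# CPU image and the models/config layout shown elsewhere in this README.
services:
  ctranslate2-server:
    image: any35/ctranslate2-server:cpu
    ports:
      - "8080:8080"
    volumes:
      - ./models:/app/models
      - ./config.toml:/app/config.toml
```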
Generate text (NLLB translation):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nllb",
    "messages": [
      {"role": "user", "content": "Hello world"}
    ],
    "target_lang": "zho_Hans"
  }'
```
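The same request can be made from Python with only the standard library. This is a hedged sketch: it assumes the server is running at `localhost:8080` and mirrors the curl payload above (OpenAI-style `messages` plus `target_lang`).

```python
# Minimal Python client sketch for the endpoint shown above.
# Assumes a server already running at localhost:8080; the payload
# mirrors the curl example in this README.
import json
import urllib.request

def build_request(model: str, text: str, target_lang: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request with target_lang."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": text}],
        "target_lang": target_lang,
    }
    return urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    with urllib.request.urlopen(build_request("nllb", "Hello world", "zho_Hans")) as resp:
        print(json.load(resp))
```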
## Configuration (`config.toml`)

```toml
# Global Defaults
default_model = "nllb"
target_lang = "eng_Latn"
device = "cpu"           # "cpu" or "cuda"
device_indices = [0]     # GPU IDs
beam_size = 5
repetition_penalty = 1.2

[server]
host = "0.0.0.0"
port = 8080

[aliases]
"nllb" = "nllb-200-distilled-600M"

[models]
[models."nllb-200-distilled-600M"]
path = "./models/nllb-200-distilled-600M"
model_type = "nllb"
target_lang = "fra_Latn" # Per-model default
```
## API

### `/v1/chat/completions`

Parameters:

- `model`: (string) Model alias or directory name.
- `messages`: (array) List of messages. The last user message is used as the prompt.
- `target_lang`: (string, optional) Target language code (e.g., `fra_Latn`, `zho_Hans`). Overrides the config.
- `beam_size`: (int, optional) Beam size for search (default: 5).
- `repetition_penalty`: (float, optional) Penalty for repeated tokens (default: 1.2).
- `no_repeat_ngram_size`: (int, optional) Prevent repeating n-grams of this size.

## Docker

Build the image:

```bash
./scripts/build_docker.sh
```
Run (CPU):

```bash
docker run -p 8080:8080 -v $(pwd)/models:/app/models -v $(pwd)/config.toml:/app/config.toml any35/ctranslate2-server:cpu
```
Run (GPU, requires the NVIDIA Container Toolkit):

```bash
docker run --gpus all -p 8080:8080 -v $(pwd)/models:/app/models -v $(pwd)/config.toml:/app/config.toml any35/ctranslate2-server:gpu
```
## Model Conversion

```bash
# For NLLB 3.3B, quantize with int8_float16
ct2-transformers-converter --model ./ --output_dir ./quant_16 --quantization int8_float16

# For NLLB 1.3B, quantize with int8
ct2-transformers-converter --model ./ --output_dir ./quant_8 --quantization int8
```
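After conversion, it can be useful to sanity-check the output directory before pointing the server at it. This small helper is illustrative only (not part of the crate) and checks just for `model.bin`, the one file this README's example layout always shows:

```python
# Illustrative check (not part of the crate): verify a converted model
# directory contains the files the example layout in this README expects.
from pathlib import Path

def check_model_dir(path: str, required=("model.bin",)) -> list:
    """Return the list of required files missing from a model directory."""
    root = Path(path)
    return [name for name in required if not (root / name).exists()]

if __name__ == "__main__":
    missing = check_model_dir("./quant_8")
    print("OK" if not missing else f"missing: {missing}")
```

Tokenizer files (e.g. `sentencepiece.model`) vary by model family, so they are not in the default `required` tuple; pass them explicitly if your model needs them.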