| Crates.io | active-call |
| lib.rs | active-call |
| version | 0.3.31 |
| created_at | 2025-12-22 12:13:57.710042+00 |
| updated_at | 2026-01-24 12:33:33.954602+00 |
| description | A SIP/WebRTC voice agent |
| homepage | |
| repository | https://github.com/restsend/active-call |
| max_upload_size | |
| id | 1999628 |
| size | 8,440,532 |
active-call is a standalone Rust crate for building AI voice agents. It provides high-performance infrastructure for bridging AI models with real-world telephony and web communications.
For comprehensive guides and tutorials, see the documentation in the project repository.
active-call supports a wide range of communication protocols, ensuring compatibility with both legacy and modern systems:
Choose the pipeline that fits your latency and cost requirements:
The Playbook system is our recommended way to build complex, stateful voice agents:
Enhance conversational fluidity with intelligent context handling:
Privacy-First & Cost-Effective: Run ASR and TTS locally without cloud APIs
We recommend using Docker to run active-call with offline models. The easiest way is to download the models first using a temporary container.
⚠️ Docker must have access to the internet to download models from HuggingFace.
⚠️ Docker must run with `--net host` to ensure proper SIP/RTP functionality.
1. Download Models
```bash
# Create a local directory for models
mkdir -p $(pwd)/data/models

# Download all models (SenseVoice & Supertonic)
docker run --rm \
  -v $(pwd)/data/models:/models \
  ghcr.io/restsend/active-call:latest \
  --download-models all --models-dir /models \
  --exit-after-download
```
Note for users in mainland China: if downloads from HuggingFace are slow, use the mirror site by setting the `HF_ENDPOINT` environment variable:

```bash
docker run --rm \
  -e HF_ENDPOINT=https://hf-mirror.com \
  -v $(pwd)/data/models:/models \
  ghcr.io/restsend/active-call:latest \
  --download-models all --models-dir /models
```
2. Run with Models Mounted
Once downloaded, mount the models directory when running the service:
```bash
docker run -d \
  --net host \
  -p 8080:8080 \
  -p 13050:13050/udp \
  -v $(pwd)/data/models:/app/models \
  -v $(pwd)/config:/app/config \
  ghcr.io/restsend/active-call:latest
```
```yaml
# In your playbook .md file
---
asr:
  provider: "sensevoice"  # Offline ASR
  language: "zh"          # auto, zh, en, ja, ko, yue
tts:
  provider: "supertonic"  # Offline TTS
  speaker: "M1"           # M1, M2, F1, F2
  speed: 1.0
  language: "en"          # en, ko, es, pt, fr
vad:
  provider: "silero"      # Uses built-in TinySilero
---
```
Benefits: privacy-first and cost-effective — ASR and TTS run entirely locally, without cloud API calls.
For developers who need full control, active-call provides a raw audio-over-websocket interface. This is ideal for custom web apps or integration with existing AI pipelines where you want to handle the audio stream manually.
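When handling the stream manually, raw audio is typically sent as fixed-size binary frames. The sketch below shows only generic PCM16 chunking — the 16 kHz sample rate and 20 ms frame size are assumptions for illustration, not active-call's actual websocket message format (see its API documentation for that):

```python
# Sketch: chunking raw PCM16 audio into fixed frames for a websocket sender.
SAMPLE_RATE = 16_000       # assumed sample rate
FRAME_MS = 20              # assumed frame duration
BYTES_PER_SAMPLE = 2       # 16-bit PCM
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE  # 640

def frames(pcm: bytes):
    """Yield fixed-size frames; a real client would send each as one binary
    websocket message. Any trailing partial frame is dropped here."""
    for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        yield pcm[i:i + FRAME_BYTES]

one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # 1 s of silence
print(FRAME_BYTES, sum(1 for _ in frames(one_second)))  # 640 50
```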
active-call can be integrated into existing corporate telephony:
Register active-call to your RustPBX, FreeSWITCH, or Asterisk PBX like any other VoIP phone. AI agents can then receive calls from internal extensions or external trunks.

active-call provides flexible handling of incoming SIP invitations through configurable handlers:
For rapid setup without editing config files, use CLI parameters:
```bash
# Webhook handler
./active-call --handler https://example.com/webhook

# Playbook handler (default playbook)
./active-call --handler default.md

# Make an outgoing SIP call using a playbook
./active-call --call sip:1001@127.0.0.1:5060 --handler greeting.md

# Set external IP and supported codecs
./active-call --handler default.md --external-ip 1.2.3.4 --codecs pcmu,pcma,opus
```
The handler type is automatically detected:
- URLs starting with `http://` or `https://` become webhook handlers
- Files ending in `.md` become playbook handlers (set as the default)

Additional CLI options:
- `--call <SIP_URI>`: Initiate an outgoing SIP call immediately
- `--external-ip <IP>`: Set the external IP address for SIP/RTP
- `--codecs <CODECS>`: Comma-separated list of supported codecs (pcmu, pcma, g722, g729, opus, telephone_event)

Forward incoming SIP invitations to an HTTP endpoint for custom call routing logic:
```toml
[handler]
type = "webhook"
url = "http://localhost:8090/webhook"
method = "POST"
```
Automatically route calls to specific playbooks based on caller/callee patterns using regex matching:
```toml
[handler]
type = "playbook"
default = "default.md" # Optional: fallback when no rules match

[[handler.rules]]
caller = "^\\+1\\d{10}$"   # Match US numbers
callee = "^sip:support@.*" # Match support line
playbook = "support.md"    # Use support playbook

[[handler.rules]]
caller = "^\\+86\\d+"      # Match Chinese numbers
playbook = "chinese.md"    # Language-specific playbook

[[handler.rules]]
callee = "^sip:sales@.*"   # Match sales line
playbook = "sales.md"      # Use sales playbook
```
How it works:
- Incoming calls are matched against the `caller` and/or `callee` regex patterns
- Matched playbooks are loaded from the `config/playbook/` directory
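The matching logic above can be sketched in Python. This is a hypothetical mirror of the `[[handler.rules]]` configuration — the assumption that rules are evaluated top-to-bottom with the first full match winning is for illustration, not taken from active-call's source:

```python
import re

# Rules mirroring the [[handler.rules]] TOML example; a rule matches only if
# every pattern it defines (caller and/or callee) matches the incoming call.
rules = [
    {"caller": r"^\+1\d{10}$", "callee": r"^sip:support@.*", "playbook": "support.md"},
    {"caller": r"^\+86\d+", "playbook": "chinese.md"},
    {"callee": r"^sip:sales@.*", "playbook": "sales.md"},
]

def route(caller: str, callee: str, default: str = "default.md") -> str:
    """Return the playbook of the first rule whose patterns all match."""
    for rule in rules:
        if "caller" in rule and not re.match(rule["caller"], caller):
            continue
        if "callee" in rule and not re.match(rule["callee"], callee):
            continue
        return rule["playbook"]
    return default

print(route("+15551234567", "sip:support@example.com"))  # support.md
print(route("+8613800138000", "sip:anything@example.com"))  # chinese.md
print(route("+4420123", "sip:info@example.com"))  # default.md (no rule matched)
```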
Here's a complete example of setting up active-call to handle incoming SIP calls with playbooks:
Create config/playbook/greeting.md:
```markdown
---
asr:
  provider: "sensevoice"
tts:
  provider: "supertonic"
  speaker: "F1"
llm:
  provider: "openai"
  model: "gpt-4o-mini"
---
# Scene: greeting
You are a friendly AI assistant. Greet the caller warmly and ask how you can help them today.
```
Option A: Using CLI (Quick Start):
```bash
# Start with playbook handler
./active-call --handler config/playbook/greeting.md --sip 0.0.0.0:5060 --external-ip your-server-ip
```
Option B: Using Configuration File (config.toml):
```toml
[handler]
type = "playbook"
default = "greeting.md"

# Optional: Add routing rules
[[handler.rules]]
callee = "^sip:sales@.*"
playbook = "sales.md"

[[handler.rules]]
callee = "^sip:support@.*"
playbook = "support.md"
```
Using any SIP client (like linphone, zoiper, or sipbot):
```bash
# Using pjsua (PJSIP command line tool)
pjsua --use-ice --null-audio sip:agent@your-server-ip:5060

# Or using sipbot (for testing)
sipbot -target sip:agent@your-server-ip:5060 -duration 30
```
active-call receives the SIP INVITE. Check the logs to see the call flow:
```bash
RUST_LOG=info ./active-call --handler greeting.md

# You'll see:
# INFO active_call::useragent::playbook_handler: matched playbook for invite
# INFO active_call::handler::handler: Playbook runner started for greeting.md
# INFO active_call::handler::handler: new call started
```
If the playbook file doesn't exist:
- A `503 Service Unavailable` response is returned to the caller
- The log shows `Playbook file not found, rejecting SIP call`

This prevents accepting calls that can't be properly handled.

Benchmarked on 60 seconds of 16kHz audio (Release mode):
| VAD Engine | Implementation | Time (60s) | RTF (Ratio) | Note |
|---|---|---|---|---|
| TinySilero | Rust (Optimized) | ~60.0 ms | 0.0010 | >2.5x faster than ONNX |
| ONNX Silero | ONNX Runtime | ~158.3 ms | 0.0026 | Standard baseline |
| WebRTC VAD | C/C++ (Bind) | ~3.1 ms | 0.00005 | Legacy, less accurate |
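The RTF (Real-Time Factor) column is processing time divided by audio duration; values far below 1.0 mean the engine runs much faster than real time. A quick check of the table's numbers:

```python
# Real-Time Factor (RTF) = processing time / audio duration.
def rtf(processing_seconds: float, audio_seconds: float) -> float:
    return processing_seconds / audio_seconds

# TinySilero from the table above: ~60.0 ms to process 60 s of audio.
print(round(rtf(0.060, 60.0), 4))   # 0.001
# ONNX Silero: ~158.3 ms for the same 60 s.
print(round(rtf(0.1583, 60.0), 4))  # 0.0026
```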
For detailed information on REST endpoints and WebSocket protocols, please refer to the API Documentation.
active-call provides a rich set of features for building natural and responsive voice agents:
Scene Management (Playbook):
- Define scenes with `# Scene: name` headers.
- Inject variables with `{{ variable }}` syntax using minijinja for personalized prompts.
- Play audio files (`.wav`/`.pcm`) by adding `<play file="path/to/audio.wav" />` at the beginning of a scene.
- Call summaries: `short` (one- or two-sentence summary), `detailed` (comprehensive summary with key points and decisions), `intent` (extracted user intent), `json` (structured JSON summary), `custom` (fully custom summary prompt).
- Feature snippets: `intent_clarification` automatically guides the LLM to ask for clarification when user intent is ambiguous; `emotion_resonance` enables the LLM to sense and respond to user emotions (anger, joy, etc.) with empathy; `voice_emotion` instructs the LLM to generate emotion tags (e.g., `[happy]`) to drive expressive TTS.
- Built-in snippets follow the `language` setting (`zh` or `en`).

Advanced Voice Interaction:
- Interruption strategies (`vad`, `asr`, or both), filler word filtering, and protection periods.

Playbooks are defined in Markdown files with YAML frontmatter. The frontmatter configures the voice capabilities, while the body defines the agent's persona and instructions.
```markdown
---
asr:
  provider: "sensevoice"
tts:
  provider: "supertonic"
  speaker: "F1"
llm:
  provider: "openai"
  model: "gpt-4-turbo"
  language: "zh"
  features: ["intent_clarification", "emotion_resonance"]
dtmf:
  "0": { action: "hangup" }
followup:
  timeout: 10000
  max: 3
posthook:
  url: "https://api.example.com/webhook"
  summary: "detailed"
  includeHistory: true
---
# Scene: greeting
<dtmf digit="1" action="goto" scene="tech_support" />

You are a friendly AI for {{ company_name }}.
Greet the caller and ask if they need technical support (Press 1) or billing help.

# Scene: tech_support
You are now in tech support. How can I help with your system?
To speak to a human, I can transfer you: <refer to="sip:human@domain.com" />
```
| Section | Field | Description |
|---|---|---|
| asr | provider | Provider name (e.g., aliyun, openai, tencent). |
| tts | provider | Provider name. |
| llm | provider | LLM provider. |
| llm | model | Model name (e.g., gpt-4, qwen-plus). |
| llm | language | Language code for built-in snippets (zh or en). |
| llm | features | List of enabled feature snippets (e.g., ["intent_clarification", "voice_emotion"]). |
| dtmf | digit | Mapping of keys (0-9, *, #) to actions. |
| dtmf | action | goto (scene), transfer (SIP), or hangup. |
| interruption | strategy | both, vad, asr, or none. |
| interruption | fillerWordFilter | Enable filler word filtering (true/false). |
| vad | provider | VAD provider (e.g., silero). |
| realtime | provider | openai or azure. Enables the low-latency streaming pipeline. |
| realtime | model | Specific realtime model (e.g., gpt-4o-realtime). |
| ambiance | path | Path to background audio file. |
| ambiance | duckLevel | Volume level while the agent is speaking (0.0-1.0). |
| followup | timeout | Silence timeout in ms before triggering a follow-up. |
| followup | max | Maximum number of follow-up attempts. |
| recorder | recorderFile | Path template for recordings (e.g., call_{id}.wav). |
| denoise | - | Enable/disable noise suppression (true/false). |
| greeting | - | Initial greeting message. |
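Several of the fields from the table can be combined in a single playbook frontmatter. The sketch below is hypothetical — the field names come from the table, but the exact nesting and defaults should be checked against the project documentation:

```yaml
---
interruption:
  strategy: "both"        # vad, asr, both, or none
  fillerWordFilter: true
ambiance:
  path: "audio/office.wav"
  duckLevel: 0.3          # quieter while the agent speaks
recorder:
  recorderFile: "recordings/call_{id}.wav"
denoise: true
greeting: "Hello! How can I help you today?"
---
```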
```bash
docker pull ghcr.io/restsend/active-call:latest
```
Copy the example config and customize it:
```bash
cp active-call.example.toml config.toml
```
```bash
docker run -d \
  --net host \
  --name active-call \
  -v $(pwd)/config.toml:/app/config.toml:ro \
  -v $(pwd)/config:/app/config \
  -v $(pwd)/models:/app/models \
  ghcr.io/restsend/active-call:latest
```
You can override configuration with CLI parameters:
```bash
docker run -d \
  --net host \
  --name active-call \
  -p 13050:13050/udp \
  -v $(pwd)/config:/app/config \
  ghcr.io/restsend/active-call:latest \
  --handler https://api.example.com/webhook
```
Supported CLI options:
- `--conf <path>`: Path to config file
- `--http <addr:port>`: HTTP server address
- `--sip <addr:port>`: SIP server address
- `--handler <url|file.md>`: Quick handler setup (webhook URL or playbook file)

active-call supports various environment variables for configuration. If you have API keys or credentials, save them in an `.env` file:
```bash
# OpenAI/Azure OpenAI
OPENAI_API_KEY=sk-...
AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/

# Aliyun (DashScope)
DASHSCOPE_API_KEY=sk-...

# Tencent Cloud
TENCENT_APPID=your_app_id
TENCENT_SECRET_ID=your_secret_id
TENCENT_SECRET_KEY=your_secret_key
```
When llm.provider is specified in playbooks, active-call automatically reads default API keys from environment:
- OpenAI: `OPENAI_API_KEY`
- Azure: `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT`
- Aliyun (DashScope): `DASHSCOPE_API_KEY`
- Tencent: `TENCENT_APPID`, `TENCENT_SECRET_ID`, `TENCENT_SECRET_KEY`

```bash
# Model directory (default: ./models)
OFFLINE_MODELS_DIR=/path/to/models

# HuggingFace Hub settings (optional)
HF_ENDPOINT=https://hf-mirror.com # Use mirror if needed
HF_MIRROR=https://hf-mirror.com
```
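The provider-to-environment-variable mapping can be sketched as follows. This is a hypothetical illustration of the lookup — only the variable names come from the list above; active-call's actual fallback logic is not shown here:

```python
import os

# Default credential env vars per LLM provider (names from the README).
PROVIDER_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "aliyun": ["DASHSCOPE_API_KEY"],
    "tencent": ["TENCENT_APPID", "TENCENT_SECRET_ID", "TENCENT_SECRET_KEY"],
}

def missing_credentials(provider: str) -> list:
    """Return the env vars a provider needs that are not currently set."""
    return [k for k in PROVIDER_ENV.get(provider, []) if not os.environ.get(k)]

os.environ["OPENAI_API_KEY"] = "sk-test"  # simulate a configured key
print(missing_credentials("openai"))  # []
```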
Mount your .env file when running containers:
docker run -d \
--net host \
--name active-call \
-v $(pwd)/.env:/app/.env \
-v $(pwd)/config:/app/config \
-v $(pwd)/models:/app/models \
active-call:latest
This project is licensed under the MIT License.