adk-realtime

Crates.ioadk-realtime
lib.rsadk-realtime
version0.2.1
created_at2025-12-08 09:55:12.937089+00
updated_at2026-01-22 03:45:36.661994+00
descriptionReal-time bidirectional audio/video streaming for Rust Agent Development Kit (ADK-Rust) agents
homepage
repositoryhttps://github.com/zavora-ai/adk-rust
max_upload_size
id1973129
size208,276
James Karanja Maina (jkmaina)

documentation

https://docs.rs/adk-realtime

README

adk-realtime

Real-time bidirectional audio streaming for Rust Agent Development Kit (ADK-Rust) agents.

Overview

adk-realtime provides a unified interface for building voice-enabled AI agents using real-time streaming APIs from various providers. It follows the OpenAI Agents SDK pattern with a separate, decoupled implementation that integrates seamlessly with the ADK agent ecosystem.

Features

  • RealtimeAgent: Implements adk_core::Agent with full callback/tool/instruction support
  • Multiple Providers: Support for OpenAI Realtime API and Gemini Live API
  • Audio Streaming: Bidirectional audio with PCM16, G711, and other formats
  • Voice Activity Detection: Server-side VAD for natural conversation flow
  • Tool Calling: Real-time function/tool execution during voice conversations
  • Agent Handoff: Transfer between agents using sub_agents

Architecture

              ┌─────────────────────────────────────────┐
              │              Agent Trait                │
              │  (name, description, run, sub_agents)   │
              └────────────────┬────────────────────────┘
                               │
       ┌───────────────────────┼───────────────────────┐
       │                       │                       │
┌──────▼──────┐      ┌─────────▼─────────┐   ┌─────────▼─────────┐
│  LlmAgent   │      │  RealtimeAgent    │   │  SequentialAgent  │
│ (text-based)│      │  (voice-based)    │   │   (workflow)      │
└─────────────┘      └───────────────────┘   └───────────────────┘

RealtimeAgent shares the same features as LlmAgent:

  • Static and dynamic instructions (instruction, instruction_provider)
  • Tool registration and execution
  • Callbacks (before_agent, after_agent, before_tool, after_tool)
  • Sub-agent handoffs via transfer_to_agent

Supported Providers

Provider Model Feature Flag Description
OpenAI gpt-4o-realtime-preview-2024-12-17 openai Stable realtime model
OpenAI gpt-realtime openai Latest model with improved speech & function calling
Google gemini-2.0-flash-live-preview-04-09 gemini Gemini Live API

Quick Start

Add to your Cargo.toml:

[dependencies]
adk-realtime = { version = "0.2.1", features = ["openai"] }

Using RealtimeAgent (Recommended)

use adk_realtime::{RealtimeAgent, openai::OpenAIRealtimeModel};
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api_key = std::env::var("OPENAI_API_KEY")?;
    let model = Arc::new(OpenAIRealtimeModel::new(&api_key, "gpt-4o-realtime-preview-2024-12-17"));

    let agent = RealtimeAgent::builder("voice_assistant")
        .model(model)
        .instruction("You are a helpful voice assistant.")
        .voice("alloy")
        .server_vad()  // Enable server-side voice activity detection
        .build()?;

    // RealtimeAgent implements the Agent trait
    // Use with ADK runner or directly via agent.run(ctx)
    Ok(())
}

Using Low-Level Session API

use adk_realtime::{RealtimeModel, RealtimeConfig, ServerEvent};
use adk_realtime::openai::OpenAIRealtimeModel;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model = OpenAIRealtimeModel::new(
        std::env::var("OPENAI_API_KEY")?,
        "gpt-4o-realtime-preview-2024-12-17",
    );

    let config = RealtimeConfig::default()
        .with_instruction("You are a helpful voice assistant.")
        .with_voice("alloy");

    let session = model.connect(config).await?;

    // Send text or audio
    session.send_text("Hello!").await?;
    session.create_response().await?;

    // Process events
    while let Some(event) = session.next_event().await {
        match event? {
            ServerEvent::AudioDelta { delta, .. } => {
                // Play audio (delta is base64-encoded PCM)
            }
            ServerEvent::TextDelta { delta, .. } => {
                print!("{}", delta);
            }
            ServerEvent::FunctionCallDone { name, arguments, call_id, .. } => {
                // Execute tool and send response
                let result = execute_tool(&name, &arguments);
                session.send_tool_response(ToolResponse {
                    call_id,
                    output: result,
                }).await?;
            }
            _ => {}
        }
    }

    Ok(())
}

RealtimeAgent Features

Shared with LlmAgent

Feature Description
instruction(str) Static system instruction
instruction_provider(fn) Dynamic instruction based on context
global_instruction(str) Global instruction (prepended)
tool(Arc<dyn Tool>) Register a tool
sub_agent(Arc<dyn Agent>) Register sub-agent for handoffs
before_agent_callback Called before agent runs
after_agent_callback Called after agent completes
before_tool_callback Called before tool execution
after_tool_callback Called after tool execution

Realtime-Specific

Feature Description
voice(str) Voice selection ("alloy", "coral", "sage", etc.)
server_vad() Enable server-side VAD with defaults
vad(VadConfig) Custom VAD configuration
modalities(vec) Output modalities (["text", "audio"])
on_audio(callback) Callback for audio output events
on_transcript(callback) Callback for transcript events
on_speech_started(callback) Callback when speech detected
on_speech_stopped(callback) Callback when speech ends

Event Types

Server Events

Event Description
SessionCreated Connection established
AudioDelta Audio chunk (base64 PCM)
TextDelta Text response chunk
TranscriptDelta Input audio transcript
FunctionCallDone Tool call request
ResponseDone Response completed
SpeechStarted VAD detected speech
SpeechStopped VAD detected silence
Error Error occurred

Client Events

Event Description
AudioAppend Send audio chunk
AudioCommit Commit audio buffer
ItemCreate Send text or tool response
ResponseCreate Request a response
ResponseCancel Interrupt response
SessionUpdate Update configuration

Audio Formats

Format Sample Rate Bits Channels Provider
PCM16 24000 Hz 16 Mono OpenAI
PCM16 16000 Hz 16 Mono Gemini (input)
PCM16 24000 Hz 16 Mono Gemini (output)
G711 u-law 8000 Hz 8 Mono OpenAI
G711 A-law 8000 Hz 8 Mono OpenAI

Voice Activity Detection

Server VAD (Recommended)

let agent = RealtimeAgent::builder("assistant")
    .model(model)
    .server_vad()  // Uses default settings
    .build()?;

Custom VAD

use adk_realtime::{VadConfig, VadMode};

let agent = RealtimeAgent::builder("assistant")
    .model(model)
    .vad(VadConfig {
        mode: VadMode::ServerVad,
        threshold: Some(0.5),
        prefix_padding_ms: Some(300),
        silence_duration_ms: Some(500),
        interrupt_response: Some(true),
        eagerness: None,
    })
    .build()?;

Agent Handoffs

let booking_agent = Arc::new(/* ... */);
let support_agent = Arc::new(/* ... */);

let agent = RealtimeAgent::builder("receptionist")
    .model(model)
    .instruction("You are a receptionist. Transfer to booking_agent for reservations.")
    .sub_agent(booking_agent)
    .sub_agent(support_agent)
    .build()?;

// Agent can now call transfer_to_agent("booking_agent") during conversation

Examples

Run the included examples to see realtime agents in action:

# Basic text-only realtime session
cargo run --example realtime_basic --features realtime-openai

# Voice assistant with server-side VAD
cargo run --example realtime_vad --features realtime-openai

# Tool calling during voice conversations
cargo run --example realtime_tools --features realtime-openai

# Multi-agent handoffs (receptionist routing to specialists)
cargo run --example realtime_handoff --features realtime-openai

Example Descriptions

Example Description
realtime_basic Simple text-based realtime session demonstrating connection and streaming
realtime_vad Voice assistant with Voice Activity Detection for natural conversations
realtime_tools Real-time tool calling (weather lookup) during conversations
realtime_handoff Multi-agent system with receptionist routing to booking, support, and sales agents

Feature Flags

Flag Description
openai Enable OpenAI Realtime API
gemini Enable Gemini Live API
full Enable all providers

License

Apache-2.0

Part of ADK-Rust

This crate is part of the ADK-Rust framework for building AI agents in Rust.

Commit count: 227

cargo fmt