livespeech-sdk

Version: 0.1.13
Description: Real-time speech-to-speech AI conversation SDK
Repository: https://github.com/DrawDream-incorporated/LiveSpeechSDK
Documentation: https://docs.rs/livespeech-sdk
Publisher: Y-JayKim

README

LiveSpeech SDK for Rust


A Rust SDK for real-time speech-to-speech AI conversations.

Features

  • 🎙️ Real-time Voice Conversations - Natural, low-latency voice interactions
  • 🌐 Multi-language Support - Korean, English, Japanese, Chinese, and more
  • 🔊 Streaming Audio - Send and receive audio in real-time
  • âšī¸ Barge-in Support - Interrupt AI mid-speech by talking or programmatically
  • 🔄 Auto-reconnection - Automatic recovery from network issues

Installation

Add to your Cargo.toml:

[dependencies]
livespeech-sdk = "0.1"
tokio = { version = "1.35", features = ["full"] }

Quick Start (5 minutes)

use livespeech_sdk::{Config, LiveSpeechClient, LiveSpeechEvent, Region, SessionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create client
    let config = Config::builder()
        .region(Region::ApNortheast2)
        .api_key("your-api-key")
        .build()?;
    let client = LiveSpeechClient::new(config);

    // 2. Handle events (only 4 essential events!)
    let mut events = client.subscribe();
    tokio::spawn(async move {
        while let Ok(event) = events.recv().await {
            match event {
                // Play AI audio (audio_player = your app's playback layer, not SDK API)
                LiveSpeechEvent::Audio(e) => {
                    audio_player.queue(&e.data);  // PCM16 @ 24kHz
                }
                // User interrupted - CLEAR BUFFER!
                LiveSpeechEvent::Interrupted(_) => {
                    audio_player.clear();
                }
                // AI finished speaking
                LiveSpeechEvent::TurnComplete(_) => {
                    println!("AI finished");
                }
                // Handle errors
                LiveSpeechEvent::Error(e) => {
                    eprintln!("Error: {}", e.message);
                }
                _ => {}
            }
        }
    });

    // 3. Connect and start
    client.connect().await?;
    client.start_session(Some(SessionConfig::new("You are a helpful assistant."))).await?;

    // 4. Send audio (audio_chunks = PCM16 frames from your mic capture)
    client.audio_start().await?;
    for chunk in audio_chunks {
        client.send_audio_chunk(&chunk).await?;  // PCM16 @ 16kHz
    }
    client.audio_end().await?;

    // 5. Cleanup
    client.end_session().await?;
    client.disconnect().await;
    Ok(())
}

Core API

Everything you need for basic voice conversations.

Methods

Method                  Description
connect()               Establish connection
disconnect()            Close connection
start_session(config)   Start conversation with system prompt
end_session()           End conversation
send_audio_chunk(data)  Send PCM16 audio (16kHz)

Events

Event         Description           Action Required
Audio         AI's audio output     Play audio (PCM16 @ 24kHz)
TurnComplete  AI finished speaking  Ready for next input
Interrupted   User barged in        Clear audio buffer!
Error         Error occurred        Handle/log error

⚠️ Critical: Handle Interrupted

When the user speaks while AI is responding, you must clear your audio buffer:

LiveSpeechEvent::Interrupted(_) => {
    audio_player.clear();  // Stop buffered audio immediately
    audio_player.stop();
}

Without this, 2-3 seconds of buffered audio continues playing after the user interrupts.
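
The audio_player used throughout this README is not part of the SDK; it stands for your app's playback layer. A minimal sketch of one possible shape, with the clearable buffer that Interrupted requires (all names here are hypothetical):

use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

// Hypothetical playback layer: an audio-output callback (e.g. cpal)
// would drain `buffer`; the SDK only needs you to queue and clear.
#[derive(Clone, Default)]
struct AudioPlayer {
    buffer: Arc<Mutex<VecDeque<Vec<u8>>>>,
}

impl AudioPlayer {
    // Buffer one PCM16 @ 24kHz chunk from LiveSpeechEvent::Audio.
    fn queue(&self, data: &[u8]) {
        self.buffer.lock().unwrap().push_back(data.to_vec());
    }

    // Drop every unplayed chunk; the crucial call on Interrupted.
    fn clear(&self) {
        self.buffer.lock().unwrap().clear();
    }

    // Halt the output stream; a no-op in this sketch.
    fn stop(&self) {}
}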

Audio Format

Direction    Format  Sample Rate
Input (mic)  PCM16   16,000 Hz
Output (AI)  PCM16   24,000 Hz
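
If your capture stack yields f32 samples, the SDK's conversion helpers (see Audio Utilities below) produce the PCM16 bytes that send_audio_chunk expects. A sketch assuming 16kHz mono capture; capture_frame_from_mic stands in for your own capture code:

use livespeech_sdk::{float32_to_int16, int16_to_bytes};

// f32 samples in [-1.0, 1.0], already 16kHz mono (resample first if not).
let float_samples: Vec<f32> = capture_frame_from_mic();
let pcm = float32_to_int16(&float_samples);
let bytes = int16_to_bytes(&pcm);
client.send_audio_chunk(&bytes).await?;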

Configuration

let config = Config::builder()
    .region(Region::ApNortheast2)    // Required
    .api_key("your-api-key")          // Required
    .build()?;

let session = SessionConfig::new("You are a helpful assistant.")
    .with_language("ko-KR");          // Optional: ko-KR, en-US, ja-JP, etc.

Advanced API

Optional features for power users.

Additional Methods

Method                          Description
audio_start() / audio_end()     Manual audio stream control
interrupt()                     Explicitly stop AI response (for Stop button)
send_system_message(text)       Inject context during conversation
send_tool_response(id, result)  Reply to function calls
update_user_id(user_id)         Migrate guest to authenticated user

Additional Events

Event                          Description
Connected / Disconnected       Connection lifecycle
SessionStarted / SessionEnded  Session lifecycle
Ready                          Session ready for audio
UserTranscript                 User's speech transcribed
Response                       AI's response text
ToolCall                       AI wants to call a function
UserIdUpdated                  Guest-to-user migration complete

Explicit Interrupt (Stop Button)

For UI "Stop" buttons or programmatic control:

// User clicks Stop button
client.interrupt().await?;

Note: Voice barge-in works automatically via Gemini's VAD. This method is for explicit control.


System Messages

Inject text context during live sessions (game events, app state, etc.):

// AI responds immediately
client.send_system_message("User completed level 5. Congratulate them!").await?;

// Context only, no response
client.send_system_message_with_options("User is browsing", false).await?;

Requires active live session (audio_start() called). Max 500 characters.


Function Calling (Tool Use)

Let AI call functions in your app:

1. Define Tools

use livespeech_sdk::{FunctionParameters, Tool};

let tools = vec![Tool {
    name: "get_price".to_string(),
    description: "Gets product price by ID".to_string(),
    parameters: Some(FunctionParameters {
        r#type: "OBJECT".to_string(),
        properties: serde_json::json!({
            "productId": { "type": "string" }
        }),
        required: vec!["productId".to_string()],
    }),
}];

let session = SessionConfig::new("You are helpful.")
    .with_tools(tools);

2. Handle ToolCall Events

LiveSpeechEvent::ToolCall(e) => {
    let result = match e.name.as_str() {
        "get_price" => {
            // lookup_price = your app's own lookup; e.args holds the JSON arguments
            let price = lookup_price(&e.args["productId"]);
            serde_json::json!({ "price": price })
        }
        _ => serde_json::json!({ "error": "Unknown" })
    };
    client.send_tool_response(&e.id, result).await.ok();
}

Conversation Memory

Enable persistent memory across sessions:

let config = Config::builder()
    .region(Region::ApNortheast2)
    .api_key("your-api-key")
    .user_id("user-123")  // Enables memory
    .build()?;

Mode             Memory
With user_id     Permanent (entities, summaries)
Without user_id  Session only (guest)

Guest-to-User Migration

// User logs in during session
client.update_user_id("authenticated-user-123").await?;

// Listen for confirmation
LiveSpeechEvent::UserIdUpdated(e) => {
    println!("Migrated {} messages", e.migrated_messages);
}

AI Speaks First

AI initiates the conversation:

let session = SessionConfig::new("Greet the customer warmly.")
    .with_ai_speaks_first(true);

client.start_session(Some(session)).await?;
client.audio_start().await?;  // AI speaks immediately

Session Options

Option               Default  Description
prePrompt            -        System prompt
language             "en-US"  Language code
pipeline_mode        Live     Live (~300ms) or Composed (~1-2s)
ai_speaks_first      false    AI initiates (Live mode only)
allow_harm_category  false    Disable safety filters
tools                []       Function definitions
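
Combining options: with_language, with_ai_speaks_first, and with_tools appear elsewhere in this README; the remaining options presumably have matching with_* setters (an assumption, check docs.rs):

let session = SessionConfig::new("You are a friendly shop assistant.")
    .with_language("ko-KR")
    .with_ai_speaks_first(true)  // Live mode only
    .with_tools(tools);          // see Function Calling above

client.start_session(Some(session)).await?;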

Audio Utilities

use livespeech_sdk::{float32_to_int16, int16_to_bytes, wrap_pcm_in_wav};

let pcm = float32_to_int16(&float_samples);       // f32 samples in [-1.0, 1.0] -> i16
let bytes = int16_to_bytes(&pcm);                 // i16 samples -> raw PCM bytes
let wav = wrap_pcm_in_wav(&bytes, 16000, 1, 16);  // sample rate, channels, bits per sample

Error Handling

use livespeech_sdk::LiveSpeechError;

match client.connect().await {
    Ok(()) => println!("Connected"),
    Err(LiveSpeechError::ConnectionTimeout) => eprintln!("Timed out"),
    Err(LiveSpeechError::NotConnected) => eprintln!("Not connected"),
    Err(e) => eprintln!("Error: {}", e),
}
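
Auto-reconnection (see Features) covers drops after a connection is established; retrying the initial connect is up to you. A minimal sketch, not SDK API:

use std::time::Duration;

let mut attempts: u64 = 0;
loop {
    match client.connect().await {
        Ok(()) => break,
        Err(e) if attempts < 3 => {
            attempts += 1;
            eprintln!("connect failed ({e}), retrying in {attempts}s...");
            tokio::time::sleep(Duration::from_secs(attempts)).await;
        }
        Err(e) => return Err(e.into()),
    }
}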

Regions

Region         Code
Seoul (Korea)  Region::ApNortheast2

License

MIT
