# Kalosm Sound

Kalosm Sound is a collection of audio models and utilities for the Kalosm framework. It supports several [voice activity detection models](crate::VoiceActivityDetectorExt) and provides utilities for [transcribing audio into text](crate::AsyncSourceTranscribeExt).

## Sound Streams

Models in Kalosm Sound work with any [`AsyncSource`]. You can use [`MicInput::stream`] to stream audio from the microphone, or any synchronous audio source that implements [`rodio::Source`], such as an MP3 or WAV file.

You can transform the audio streams with:
- [`VoiceActivityDetectorExt::voice_activity_stream`]: Detect voice activity in the audio data
- [`DenoisedExt::denoise_and_detect_voice_activity`]: Denoise the audio data and detect voice activity
- [`AsyncSourceTranscribeExt::transcribe`]: Chunk an audio stream based on voice activity and then transcribe the chunked audio data
- [`VoiceActivityStreamExt::rechunk_voice_activity`]: Chunk an audio stream based on voice activity
- [`VoiceActivityStreamExt::filter_voice_activity`]: Filter chunks of audio data based on voice activity
- [`TranscribeChunkedAudioStreamExt::transcribe`]: Transcribe a chunked audio stream

## Voice Activity Detection

Voice activity detection (VAD) models detect when someone is speaking in an audio stream.
The simplest way to use a VAD model is to create an audio stream and call [`VoiceActivityDetectorExt::voice_activity_stream`], which yields audio chunks along with the probability that each one contains speech:

```rust, no_run
use kalosm::sound::*;

#[tokio::main]
async fn main() {
    // Get the default microphone input
    let mic = MicInput::default();
    // Stream the audio from the microphone
    let stream = mic.stream().unwrap();
    // Detect voice activity in the audio stream
    let mut vad = stream.voice_activity_stream();
    while let Some(input) = vad.next().await {
        println!("Probability: {}", input.probability);
    }
}
```

Kalosm also provides [`VoiceActivityStreamExt::rechunk_voice_activity`] to collect runs of consecutive audio samples with a high VAD probability into chunks. This is useful for applications like speech recognition, where context between consecutive audio samples is important.

```rust, no_run
use kalosm::sound::*;
use rodio::Source;

#[tokio::main]
async fn main() {
    // Get the default microphone input
    let mic = MicInput::default();
    // Stream the audio from the microphone
    let stream = mic.stream().unwrap();
    // Chunk the audio into chunks of speech
    let vad = stream.voice_activity_stream();
    let mut audio_chunks = vad.rechunk_voice_activity();
    // Print the chunks as they are streamed in
    while let Some(input) = audio_chunks.next().await {
        println!("New voice activity chunk with duration {:?}", input.total_duration());
    }
}
```

## Transcription

You can use the [`Whisper`] model to transcribe audio into text.
Kalosm can transcribe any [`AsyncSource`] into a transcription stream with the [`AsyncSourceTranscribeExt::transcribe`] method:

```rust, no_run
use kalosm::sound::*;

#[tokio::main]
async fn main() {
    // Get the default microphone input
    let mic = MicInput::default();
    // Stream the audio from the microphone
    let stream = mic.stream().unwrap();
    // Transcribe the audio into text with the default Whisper model
    let mut transcribe = stream.transcribe(Whisper::new().await.unwrap());
    // Print the text as it is streamed in
    transcribe.to_std_out().await.unwrap();
}
```
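
As noted under Sound Streams, [`AsyncSourceTranscribeExt::transcribe`] chunks the audio by voice activity before transcribing it. The following dependency-free sketch illustrates that grouping idea only. It is not Kalosm's implementation: the `Frame` type, the `rechunk` function, and the `0.5` threshold are all hypothetical names and values chosen for illustration.

```rust
/// One slice of audio paired with a speech probability.
/// Hypothetical type for illustration; not part of Kalosm.
#[derive(Debug, Clone)]
struct Frame {
    samples: Vec<f32>,
    probability: f32,
}

/// Concatenate consecutive frames whose probability is at or above
/// `threshold` into one chunk. A frame below the threshold closes the
/// chunk that is currently being built, if any.
fn rechunk(frames: &[Frame], threshold: f32) -> Vec<Vec<f32>> {
    let mut chunks = Vec::new();
    let mut current: Vec<f32> = Vec::new();
    for frame in frames {
        if frame.probability >= threshold {
            current.extend_from_slice(&frame.samples);
        } else if !current.is_empty() {
            // Silence ends the chunk; `take` leaves `current` empty for reuse
            chunks.push(std::mem::take(&mut current));
        }
    }
    // Flush a chunk that runs to the end of the input
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let frames = vec![
        Frame { samples: vec![0.0, 0.0], probability: 0.1 },
        Frame { samples: vec![0.3, 0.4], probability: 0.9 },
        Frame { samples: vec![0.5, 0.2], probability: 0.8 },
        Frame { samples: vec![0.0, 0.0], probability: 0.2 },
        Frame { samples: vec![0.6, 0.1], probability: 0.7 },
    ];
    // The two adjacent high-probability frames merge into a single chunk
    for (index, chunk) in rechunk(&frames, 0.5).iter().enumerate() {
        println!("chunk {index}: {} samples", chunk.len());
    }
}
```

Keeping adjacent speech frames together is what gives a transcription model the surrounding context; a real pipeline would likely also tolerate short pauses inside a chunk rather than splitting on every low-probability frame.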