| Crates.io | loqa-voice-dsp |
| lib.rs | loqa-voice-dsp |
| version | 0.5.0 |
| created_at | 2025-11-21 04:19:38.408332+00 |
| updated_at | 2025-12-13 00:02:17.135149+00 |
| description | Shared DSP library for voice analysis (pitch, formants, spectral features) |
| homepage | https://github.com/loqalabs/loqa |
| repository | https://github.com/loqalabs/loqa |
| max_upload_size | |
| id | 1943118 |
| size | 358,458 |
Shared DSP library for voice analysis, providing core digital signal processing functionality for both Loqa backend and VoiceFind mobile app.
The v0.3.0 release adds a VoiceAnalyzer API for streaming analysis.

iOS (CocoaPods) - add to your Podfile:
```ruby
pod 'LoqaVoiceDSP', '~> 0.3.0'
```
Then run:
```sh
pod install
```
In Xcode, add the package from https://github.com/loqalabs/loqa and select version 0.3.0 or later. Or add to Package.swift:
```swift
dependencies: [
    .package(url: "https://github.com/loqalabs/loqa", from: "0.3.0")
]
```
Add to your Cargo.toml:
```toml
[dependencies]
loqa-voice-dsp = "0.3.0"
```
Pitch detection algorithms analyze buffers in frames. For best results, use frame sizes of 2048-4096 samples with roughly 50% overlap between consecutive frames; shorter buffers cannot capture enough pitch periods for low frequencies and may fall below the minimum sizes validated in the parameter tables below.

Frame-based analysis for long audio (v0.3.0+):
For buffers larger than 4096 samples, use the new VoiceAnalyzer API:
```rust
use loqa_voice_dsp::{VoiceAnalyzer, AnalysisConfig};

let long_audio_buffer: Vec<f32> = /* your audio data */;

let config = AnalysisConfig::default()
    .with_frame_size(2048)
    .with_hop_size(1024); // 50% overlap

let mut analyzer = VoiceAnalyzer::new(config)?;
let results = analyzer.process_stream(&long_audio_buffer);
```
Legacy approach (v0.2.x):
```rust
use loqa_voice_dsp::{detect_pitch, PitchResult};

fn analyze_long_buffer(buffer: &[f32], sample_rate: u32) -> Vec<PitchResult> {
    const FRAME_SIZE: usize = 2048;
    const HOP_SIZE: usize = 1024; // 50% overlap

    let mut results = Vec::new();
    if buffer.len() < FRAME_SIZE {
        return results;
    }
    // Slide a FRAME_SIZE window across the buffer, advancing by HOP_SIZE.
    for i in (0..=buffer.len() - FRAME_SIZE).step_by(HOP_SIZE) {
        let frame = &buffer[i..i + FRAME_SIZE];
        if let Ok(pitch) = detect_pitch(frame, sample_rate, 80.0, 400.0) {
            results.push(pitch);
        }
    }
    results
}
```
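A quick usage sketch of the helper above (the 16 kHz rate is just an example):

```rust
let pitches = analyze_long_buffer(&long_audio_buffer, 16_000);
println!("Analyzed {} frames", pitches.len());
```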
New in v0.3.0 - Stateful API:
```rust
use loqa_voice_dsp::{VoiceAnalyzer, AnalysisConfig, PitchAlgorithm};

let audio_samples: Vec<f32> = /* your audio data */;

// Create analyzer with pYIN algorithm
let config = AnalysisConfig::default()
    .with_sample_rate(16000)
    .with_frame_size(2048)
    .with_algorithm(PitchAlgorithm::PYIN);

let mut analyzer = VoiceAnalyzer::new(config)?;

// Process single frame
let pitch = analyzer.process_frame(&audio_samples)?;
println!("Frequency: {} Hz", pitch.frequency);
println!("Confidence: {}", pitch.confidence);
println!("Voiced Probability: {}", pitch.voiced_probability);

// Or process a stream
let results = analyzer.process_stream(&long_audio_buffer);
for (i, pitch) in results.iter().enumerate() {
    println!("Frame {}: {} Hz (conf: {})", i, pitch.frequency, pitch.confidence);
}
```
Legacy API (still supported):
```rust
use loqa_voice_dsp::{detect_pitch, extract_formants, compute_fft, calculate_hnr, calculate_h1h2};

let audio_samples: Vec<f32> = /* your audio data */;
let sample_rate = 16000;

// Pitch detection (single-shot)
let pitch = detect_pitch(&audio_samples, sample_rate, 80.0, 400.0)?;
println!("Frequency: {} Hz, Confidence: {}", pitch.frequency, pitch.confidence);

// Formant extraction
let formants = extract_formants(&audio_samples, sample_rate, 14)?;
println!("F1: {} Hz, F2: {} Hz", formants.f1, formants.f2);

// HNR (breathiness)
let hnr = calculate_hnr(&audio_samples, sample_rate, 75.0, 500.0)?;
println!("HNR: {} dB, Voiced: {}", hnr.hnr, hnr.is_voiced);

// H1-H2 (vocal weight)
let h1h2 = calculate_h1h2(&audio_samples, sample_rate, Some(pitch.frequency))?;
println!("H1-H2: {} dB", h1h2.h1h2);

// FFT
let fft_result = compute_fft(&audio_samples, sample_rate, 2048)?;
```
New in v0.3.0 - Stateful Analyzer (Swift/iOS FFI):
```swift
// Create analyzer configuration
var config = loqa_analysis_config_default()
config.algorithm = 1 // 0=Auto, 1=PYIN, 2=YIN, 3=Autocorr
config.frame_size = 2048
config.sample_rate = 16000

// Create analyzer
let analyzer = loqa_voice_analyzer_new(config)
defer { loqa_voice_analyzer_free(analyzer) } // Always free

// Process single frame
let pitchResult = samples.withUnsafeBufferPointer { buffer in
    loqa_voice_analyzer_process_frame(
        analyzer,
        buffer.baseAddress!,
        buffer.count
    )
}

if pitchResult.success {
    print("Pitch: \(pitchResult.frequency)Hz")
    print("Confidence: \(pitchResult.confidence)")
    print("Voiced Probability: \(pitchResult.voiced_probability)")
}

// Or process stream
var results = [PitchResultFFI](repeating: PitchResultFFI(), count: 100)
let count = samples.withUnsafeBufferPointer { buffer in
    results.withUnsafeMutableBufferPointer { resultsBuffer in
        loqa_voice_analyzer_process_stream(
            analyzer,
            buffer.baseAddress!,
            buffer.count,
            resultsBuffer.baseAddress!,
            100
        )
    }
}
print("Got \(count) pitch results")
```
Legacy API (still supported):
```swift
// Call C-compatible FFI functions
let samples: [Float] = /* your audio data */

// Pitch detection
let pitchResult = samples.withUnsafeBufferPointer { buffer in
    loqa_detect_pitch(
        buffer.baseAddress!,
        buffer.count,
        16000, // sample rate
        80.0,  // min freq
        400.0  // max freq
    )
}
if pitchResult.success {
    print("Pitch: \(pitchResult.frequency)Hz, Confidence: \(pitchResult.confidence)")
}

// HNR (breathiness)
let hnrResult = samples.withUnsafeBufferPointer { buffer in
    loqa_calculate_hnr(
        buffer.baseAddress!,
        buffer.count,
        16000, // sample rate
        75.0,  // min freq
        500.0  // max freq
    )
}
if hnrResult.success {
    print("HNR: \(hnrResult.hnr) dB, Voiced: \(hnrResult.is_voiced)")
}

// H1-H2 (vocal weight) - pass 0.0 for f0 to auto-detect
let h1h2Result = samples.withUnsafeBufferPointer { buffer in
    loqa_calculate_h1h2(
        buffer.baseAddress!,
        buffer.count,
        16000, // sample rate
        pitchResult.frequency // use detected pitch, or 0.0 to auto-detect
    )
}
if h1h2Result.success {
    print("H1-H2: \(h1h2Result.h1h2) dB")
}
```
```java
// Build with --features android-jni
import com.voicefind.VoiceFindDSP;

float[] audioSamples = /* your audio data */;

VoiceFindDSP.PitchResult pitch = VoiceFindDSP.detectPitch(
    audioSamples,
    16000,  // sample rate
    80.0f,  // min freq
    400.0f  // max freq
);
System.out.println("Frequency: " + pitch.frequency + " Hz");
```
Note: Android JNI requires building with `--features android-jni`
Critical: All FFI structs use #[repr(C)] to ensure C-compatible memory layout. Failure to maintain this can cause alignment issues and incorrect values (see historical issues #1, #2, #3).
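As an illustration, a result struct like the PitchResultFFI used in the Swift examples below would be declared with an explicit C layout. The field names mirror this README's examples; the crate's actual definition may order or type them differently:

```rust
// Illustrative sketch only - fields follow the README's Swift examples;
// the crate's real definition may differ.
#[repr(C)] // guarantees C-compatible field ordering and alignment
pub struct PitchResultFFI {
    pub frequency: f32,
    pub confidence: f32,
    pub voiced_probability: f32,
    pub success: bool, // bool is one byte and C-compatible (_Bool)
}
```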
Memory safety: FFT functions (loqa_compute_fft) allocate memory that must be freed by calling loqa_free_fft_result, exactly once per result.

Swift/iOS example with proper cleanup:
```swift
// Must be `var`: loqa_free_fft_result takes the result by inout reference.
var fftResult = loqa_compute_fft(buffer, count, sampleRate, fftSize)
defer { loqa_free_fft_result(&fftResult) } // Always free, exactly once

if fftResult.success {
    let spectral = loqa_analyze_spectrum(&fftResult)
    // Use spectral features...
}
```
Important: All validation happens in the Rust core. Higher-level layers (Swift/TypeScript) should trust Rust validation rather than implementing their own rules.
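A minimal sketch of what that centralized validation might look like, using the ranges from the tables below (the function name and error type are hypothetical, not the crate's actual internals):

```rust
// Hypothetical helper - illustrates Rust-side validation, not real crate code.
fn validate_pitch_params(
    buffer_len: usize,
    sample_rate: u32,
    min_frequency: f32,
    max_frequency: f32,
) -> Result<(), String> {
    if buffer_len < 100 {
        return Err("buffer_size must be at least 100 samples".into());
    }
    if !(8_000..=96_000).contains(&sample_rate) {
        return Err("sample_rate must be 8000-96000 Hz".into());
    }
    if !(20.0..=4_000.0).contains(&min_frequency)
        || !(40.0..=8_000.0).contains(&max_frequency)
        || min_frequency >= max_frequency
    {
        return Err("invalid min/max frequency range".into());
    }
    Ok(())
}
```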
Pitch Detection (loqa_detect_pitch):

| Parameter | Valid Range | Recommended | Notes |
|---|---|---|---|
| `buffer_size` | ≥ 100 samples | 2048-4096 samples | See "Buffer Size Recommendations" above |
| `sample_rate` | 8000-96000 Hz | 16000-44100 Hz | Higher rates support higher frequency ranges |
| `min_frequency` | 20-4000 Hz | 80 Hz (male voice) | Must be < `max_frequency` |
| `max_frequency` | 40-8000 Hz | 400 Hz (voice) | Must be > `min_frequency` |
Formant Extraction (loqa_extract_formants):

| Parameter | Valid Range | Recommended | Notes |
|---|---|---|---|
| `buffer_size` | ≥ 2048 samples | 2048-4096 samples | Larger buffers improve formant resolution |
| `sample_rate` | 8000-96000 Hz | 16000-44100 Hz | Higher rates capture higher formants |
| `lpc_order` | 8-24 | 12-16 | NOT `sample_rate / 1000` - use fixed range instead |
Historical Note: In loqa-expo-dsp#8, the TypeScript layer calculated `lpc_order = sample_rate / 1000 + 2`, which gave 46 at 44.1 kHz; the Swift layer rejected this as out of range, causing all calls to fail. Solution: use the fixed range 8-24 for all sample rates.
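A sketch of the fix on the calling side - clamp any configured order into the supported range instead of deriving it from the sample rate (extract_formants is the crate function shown earlier; the clamp itself is illustrative):

```rust
use loqa_voice_dsp::extract_formants;

// Illustrative: never derive lpc_order from sample_rate; clamp into 8-24.
let requested_order: usize = 14;
let lpc_order = requested_order.clamp(8, 24);
let formants = extract_formants(&audio_samples, sample_rate, lpc_order)?;
```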
FFT (loqa_compute_fft):

| Parameter | Valid Range | Recommended | Notes |
|---|---|---|---|
| `buffer_size` | ≥ `fft_size` | = `fft_size` | Larger buffers are truncated |
| `sample_rate` | 8000-96000 Hz | 16000-48000 Hz | Affects frequency bin resolution |
| `fft_size` | Power of 2: 512-8192 | 2048 or 4096 | Non-power-of-2 may fail (implementation-specific) |
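Since non-power-of-2 sizes may fail, a defensive check before calling compute_fft is cheap; the assertion style here is just a suggestion:

```rust
use loqa_voice_dsp::compute_fft;

let fft_size: usize = 2048;
assert!(fft_size.is_power_of_two(), "fft_size must be a power of 2");
let fft_result = compute_fft(&audio_samples, sample_rate, fft_size)?;
```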
HNR (loqa_calculate_hnr):

| Parameter | Valid Range | Recommended | Notes |
|---|---|---|---|
| `buffer_size` | ≥ 2048 samples | 2048-4096 | Needs multiple pitch periods |
| `sample_rate` | 8000-96000 Hz | 16000 Hz | Standard voice analysis rate |
| `min_frequency` | 50-300 Hz | 75 Hz | Lowest expected F0 |
| `max_frequency` | 200-600 Hz | 500 Hz | Highest expected F0 |
H1-H2 (loqa_calculate_h1h2):

| Parameter | Valid Range | Recommended | Notes |
|---|---|---|---|
| `buffer_size` | ≥ 2048 samples | 4096 samples | Needs good spectral resolution |
| `sample_rate` | 8000-96000 Hz | 16000-44100 Hz | Higher rates improve harmonic resolution |
| `f0` | 0.0 or 50-800 Hz | Detected pitch | Pass 0.0 for auto-detect, or provide known F0 |
Auto-detect F0: Pass 0.0 (or any negative value) for f0 parameter to automatically detect pitch before calculating H1-H2.
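In the Rust API the f0 parameter is an Option (see the Some(pitch.frequency) call earlier), so auto-detection presumably corresponds to passing None - a sketch, assuming that mapping holds:

```rust
use loqa_voice_dsp::calculate_h1h2;

// Assumption: None triggers the same auto-detect path as passing 0.0 over FFI.
let h1h2 = calculate_h1h2(&audio_samples, sample_rate, None)?;
println!("H1-H2 (auto-detected F0): {} dB", h1h2.h1h2);
```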
1. Struct Alignment Issues (Fixed in v0.2.1): missing #[repr(C)] caused field misalignment; all FFI structs now use #[repr(C)], verified by CI tests.
2. Parameter Validation Mismatches (Fixed in v0.2.2)
3. Buffer Size Confusion (Documented in v0.2.2)
4. Memory Leaks with FFT (Prevented by design): use defer or RAII patterns to ensure cleanup.

Validated Performance (2025-11-07) - All targets exceeded ✅
| Operation | Target | Actual (mean) | Result | Speedup |
|---|---|---|---|---|
| Pitch detection (100ms audio) | <20ms | 0.125ms | ✅ PASS | 160x faster |
| Formant extraction (500ms audio) | <50ms | 0.134ms | ✅ PASS | 373x faster |
| FFT (2048 points) | <10ms | ~0.020ms | ✅ PASS | 500x faster |
| Spectral analysis | <5ms | ~0.003ms | ✅ PASS | 1667x faster |
| HNR calculation (100ms window) | <30ms | <1ms | ✅ PASS | >30x faster |
| H1-H2 with F0 provided | <20ms | <1ms | ✅ PASS | >20x faster |
Note: Benchmarks run on Apple M-series silicon. All latency targets easily met with significant performance headroom for real-time voice processing.
Starting in v0.4.0, we use a custom pYIN implementation optimized for voice analysis, removing the external pyin crate dependency.
What is pYIN?
pYIN (Mauch & Dixon, 2014) extends the YIN pitch detection algorithm to produce probabilistic pitch estimates, making it more robust for noisy or breathy voice signals.
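For intuition, here is the cumulative mean normalized difference (CMND) function at the heart of YIN, in textbook form. This illustrates the algorithm family only; it is not the crate's internal code:

```rust
/// Textbook YIN CMND, for illustration only (not the crate's internals).
/// Returns d'(tau) for tau in 0..max_lag; pitch-period candidates are the
/// lags where d'(tau) dips low.
fn cmnd(frame: &[f32], max_lag: usize) -> Vec<f32> {
    assert!(max_lag < frame.len(), "frame must be longer than max_lag");

    // Difference function: d(tau) = sum_i (x[i] - x[i+tau])^2
    let mut d = vec![0.0f32; max_lag];
    for tau in 1..max_lag {
        for i in 0..frame.len() - max_lag {
            let diff = frame[i] - frame[i + tau];
            d[tau] += diff * diff;
        }
    }

    // Cumulative mean normalization: d'(tau) = d(tau) * tau / sum_{j<=tau} d(j)
    let mut out = vec![1.0f32; max_lag]; // d'(0) is defined as 1
    let mut running_sum = 0.0f32;
    for tau in 1..max_lag {
        running_sum += d[tau];
        out[tau] = if running_sum > 0.0 {
            d[tau] * tau as f32 / running_sum
        } else {
            1.0
        };
    }
    out
}
```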
Key Differences from Standard YIN: rather than committing to a single hard threshold on the difference function, pYIN evaluates a distribution of thresholds, producing weighted pitch candidates and a per-frame voiced probability that are then smoothed over time.
Our Voice-Optimized Implementation:
Two-Stage Process:
Voice-Specific Optimizations:
Benefits:
Performance:
References:
Mauch, M. & Dixon, S. (2014). "pYIN: A fundamental frequency estimator using probabilistic threshold distributions." Proc. IEEE ICASSP 2014.
de Cheveigné, A. & Kawahara, H. (2002). "YIN, a fundamental frequency estimator for speech and music." J. Acoust. Soc. Am. 111(4).
Measures the ratio of harmonic (periodic) to noise (aperiodic) energy in voice - the primary acoustic indicator of breathiness.
| HNR Range | Interpretation |
|---|---|
| 18-25+ dB | Clear, less breathy voice |
| 12-18 dB | Moderate breathiness |
| <10 dB | Very breathy or pathological voice |
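A small helper that maps an HNR value onto the bands above (the table leaves 10-12 dB unlabeled; this sketch folds that region into the breathy band):

```rust
// Band edges follow the HNR interpretation table above.
fn describe_hnr(hnr_db: f32) -> &'static str {
    if hnr_db >= 18.0 {
        "clear, less breathy voice"
    } else if hnr_db >= 12.0 {
        "moderate breathiness"
    } else {
        // Covers <10 dB plus the 10-12 dB region the table leaves unlabeled.
        "very breathy or pathological voice"
    }
}
```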
Measures the amplitude difference between the fundamental and second harmonic - indicates vocal weight.
| H1-H2 Range | Interpretation |
|---|---|
| >5 dB | Lighter, breathier vocal quality |
| 0-5 dB | Balanced vocal weight |
| <0 dB | Fuller, heavier vocal quality |
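And the equivalent mapping for H1-H2, mirroring the table above directly:

```rust
// Band edges follow the H1-H2 interpretation table above.
fn describe_h1h2(h1h2_db: f32) -> &'static str {
    if h1h2_db > 5.0 {
        "lighter, breathier vocal quality"
    } else if h1h2_db >= 0.0 {
        "balanced vocal weight"
    } else {
        "fuller, heavier vocal quality"
    }
}
```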
This library uses samples from the Saarbrücken Voice Database for consistency validation testing.
License: CC BY 4.0
Attribution: Pützer, M. & Barry, W.J., Former Institute of Phonetics, Saarland University. Available at Zenodo.
The SVD provides lab-quality voice recordings across varied speakers and voice qualities.
```sh
# 1. Download SVD from Zenodo (CC BY 4.0 license)
#    https://zenodo.org/records/16874898

# 2. Install conversion dependencies
pip install scipy numpy

# 3. Convert SVD files to test format
python scripts/download_svd.py /path/to/extracted/svd
```
For comprehensive validation, the library needs test samples with these characteristics:
| Function | Sample Requirements | Recommended Datasets |
|---|---|---|
| Pitch Detection | Male (80-180 Hz), Female (160-300 Hz), varied intonation | Saarbrücken Voice Database, PTDB-TUG |
| Formant Extraction | Sustained vowels /a/, /i/, /u/, /e/, /o/ from multiple speakers | Hillenbrand Vowel Database, VTR-TIMIT |
| HNR | Breathy, modal, and clear voice qualities | Saarbrücken Voice Database |
| H1-H2 | Light to full voice qualities, different phonation types | UCLA Voice Quality Database, VoiceSauce reference recordings |
| Spectral | Dark to bright voice qualities | Voice quality databases with perceptual labels |
```sh
# Build
cargo build --release

# Test
cargo test

# Benchmark
cargo bench

# Documentation
cargo doc --open
```
License: MIT