Spectrograms
High-performance spectrogram computation with Rust and Python bindings.
Features
- Multiple Frequency Scales: Linear, Mel, ERB, Log, and CQT
- Multiple Amplitude Scales: Power, Magnitude, and Decibels
- Advanced Audio Features: MFCC, Chromagram, and raw STFT
- Plan-Based Computation: Reuse FFT plans for 2-10x speedup on batch processing
- Two FFT Backends: FFTW (fastest) or pure-Rust RealFFT
- Streaming Support: Frame-by-frame processing for real-time applications
- Type-Safe Rust API: Compile-time guarantees for spectrogram types
- Python Bindings: Fast computation with NumPy integration and GIL-free execution
Why Choose Spectrograms?
- Cross-Language: Use from Rust or Python with consistent APIs
- High Performance: Rust implementation, Python bindings with minimal overhead
- Not Limited to One Type: Multiple frequency scales in a unified API
- Production Ready: Efficient batch processing and streaming support
- Well Documented: Comprehensive integration guide, examples, and API docs
Installation
| Rust |
Python |
[dependencies]
spectrograms = "0.1"
For pure-Rust FFT (no system dependencies):
[dependencies]
spectrograms = { version = "0.1", default-features = false, features = ["realfft"] }
|
pip install spectrograms
For FFTW-accelerated version (requires system FFTW library):
pip install spectrograms-fftw
|
Quick Start
Generate a Test Signal
| Rust |
Python |
use std::f64::consts::PI;
// 1 second of 440 Hz sine wave
let sample_rate = 16000.0;
let samples: Vec<f64> = (0..16000)
    .map(|i| {
        let t = i as f64 / sample_rate;
        (2.0 * PI * 440.0 * t).sin()
    })
    .collect();
|
import numpy as np
# 1 second of 440 Hz sine wave
sample_rate = 16000
t = np.arange(sample_rate, dtype=np.float64) / sample_rate  # t[i] = i / sample_rate, matching the Rust version
samples = np.sin(2 * np.pi * 440 * t)
|
Compute a Basic Spectrogram
| Rust |
Python |
use spectrograms::*;
// Configure parameters
let stft = StftParams::new(
    512,                 // FFT size
    256,                 // hop size
    WindowType::Hanning, // window
    true                 // centre frames
)?;
let params = SpectrogramParams::new(
    stft,
    sample_rate
)?;
// Compute power spectrogram
let spec = LinearPowerSpectrogram::compute(
    &samples,
    &params,
    None
)?;
println!("Shape: {} bins × {} frames",
    spec.n_bins(), spec.n_frames());
|
import spectrograms as sg
# Configure parameters
stft = sg.StftParams(
n_fft=512,
hop_size=256,
window=sg.WindowType.hanning(),
centre=True
)
params = sg.SpectrogramParams(
stft,
sample_rate=sample_rate
)
# Compute power spectrogram
spec = sg.compute_linear_power_spectrogram(
samples,
params
)
print(f"Shape: {spec.n_bins} bins × {spec.n_frames} frames")
|
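As a sanity check on the printed shape, the expected dimensions can be estimated from the parameters alone. The sketch below assumes the common convention of `n_fft // 2 + 1` one-sided bins and, with centring, `1 + n_samples // hop_size` frames. The library's exact padding rule may differ, and `expected_shape` is a hypothetical helper, not part of the spectrograms API.

```python
def expected_shape(n_samples, n_fft, hop_size, centre=True):
    """Estimate (n_bins, n_frames) for a one-sided STFT.

    Assumes a common framing convention; not taken from the library.
    """
    n_bins = n_fft // 2 + 1  # one-sided spectrum of a real signal
    if centre:
        # Signal is padded by n_fft // 2 on each side before framing.
        n_frames = 1 + n_samples // hop_size
    elif n_samples < n_fft:
        # Not even one full frame fits.
        n_frames = 0
    else:
        # Only frames that fit entirely inside the signal are kept.
        n_frames = 1 + (n_samples - n_fft) // hop_size
    return n_bins, n_frames


shape_centred = expected_shape(16000, 512, 256, centre=True)
shape_uncentred = expected_shape(16000, 512, 256, centre=False)
print(shape_centred, shape_uncentred)
```

Under these assumptions, the one-second test signal gives 257 bins, with 63 frames when centred and 61 when not.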
Mel Spectrogram Example
| Rust |
Python |
use spectrograms::*;
let stft = StftParams::new(512, 256, WindowType::Hanning, true)?;
let params = SpectrogramParams::new(stft, 16000.0)?;
// Mel filterbank
let mel = MelParams::new(
    80,     // n_mels
    0.0,    // f_min
    8000.0  // f_max
)?;
// dB scaling
let db = LogParams::new(-80.0)?;
// Compute mel spectrogram in dB
let spec = MelDbSpectrogram::compute(
    &samples, &params, &mel, Some(&db)
)?;
// Access data
println!("Mel bands: {}", spec.n_bins());
println!("Frames: {}", spec.n_frames());
println!("Frequency range: {:?}",
    spec.axes().frequency_range());
|
import spectrograms as sg
stft = sg.StftParams(512, 256, sg.WindowType.hanning(), True)
params = sg.SpectrogramParams(stft, 16000)
# Mel filterbank
mel = sg.MelParams(
n_mels=80,
f_min=0.0,
f_max=8000.0
)
# dB scaling
db = sg.LogParams(floor_db=-80.0)
# Compute mel spectrogram in dB
spec = sg.compute_mel_db_spectrogram(
samples, params, mel, db
)
# Access data
print(f"Mel bands: {spec.n_bins}")
print(f"Frames: {spec.n_frames}")
print(f"Frequency range: {spec.frequency_range()}")
|
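To see what `n_mels`, `f_min`, and `f_max` control, the following sketch places band centres evenly on the mel scale and maps them back to Hz. It assumes the HTK-style formula `mel = 2595 * log10(1 + f / 700)`; the library may use a different variant (for example the Slaney scale), and the helper names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style Hz -> mel conversion (an assumption, see lead-in)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centres(n_mels, f_min, f_max):
    """Centre frequencies (Hz) of n_mels bands spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


centres = mel_centres(80, 0.0, 8000.0)
print(f"first: {centres[0]:.1f} Hz, last: {centres[-1]:.1f} Hz")
```

The bands are narrow at low frequencies and widen towards `f_max`, which is the perceptual motivation behind the scale.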
Efficient Batch Processing
Reuse FFT plans for 2-10x speedup when processing multiple signals:
| Rust |
Python |
use spectrograms::*;
let signals = vec![
vec![0.0; 16000],
vec![0.0; 16000],
vec![0.0; 16000],
];
let stft = StftParams::new(512, 256, WindowType::Hanning, true)?;
let params = SpectrogramParams::new(stft, 16000.0)?;
let mel = MelParams::new(80, 0.0, 8000.0)?;
let db = LogParams::new(-80.0)?;
// Create plan once
let planner = SpectrogramPlanner::new();
let mut plan = planner.mel_db_plan(
    &params, &mel, Some(&db)
)?;
// Reuse for all signals (much faster!)
for signal in signals {
    let spec = plan.compute(&signal)?;
    // Process spec...
}
|
import spectrograms as sg
import numpy as np
signals = [
np.random.randn(16000),
np.random.randn(16000),
np.random.randn(16000),
]
stft = sg.StftParams(512, 256, sg.WindowType.hanning(), True)
params = sg.SpectrogramParams(stft, 16000)
mel = sg.MelParams(80, 0.0, 8000.0)
db = sg.LogParams(-80.0)
# Create plan once
planner = sg.SpectrogramPlanner()
plan = planner.mel_db_plan(params, mel, db)
# Reuse for all signals (much faster!)
for signal in signals:
    spec = plan.compute(signal)
    # Process spec...
|
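The idea behind a plan is that setup work depending only on the parameters is done once and then reused for every signal. The toy class below illustrates that pattern with a precomputed window. It is a conceptual sketch only: `FramePlan` is hypothetical and says nothing about how `SpectrogramPlanner` is implemented internally.

```python
import math


class FramePlan:
    """Toy stand-in for a plan: precompute once, reuse for every signal."""

    def __init__(self, n_fft, hop_size):
        self.n_fft = n_fft
        self.hop_size = hop_size
        # Setup work that depends only on the parameters, not on the signal.
        self.window = [
            0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)
        ]

    def frames(self, signal):
        """Return windowed frames without repeating any setup work."""
        out = []
        for start in range(0, len(signal) - self.n_fft + 1, self.hop_size):
            chunk = signal[start:start + self.n_fft]
            out.append([w * x for w, x in zip(self.window, chunk)])
        return out


plan = FramePlan(n_fft=512, hop_size=256)
signals = [[1.0] * 2048 for _ in range(3)]
frame_counts = [len(plan.frames(s)) for s in signals]
print(frame_counts)
```

A real plan would typically also cache FFT twiddle factors, scratch buffers, and filterbank matrices, which is where most of the saving would come from.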
Advanced Features
MFCCs (Mel-Frequency Cepstral Coefficients)
| Rust |
Python |
use spectrograms::*;
let stft = StftParams::new(512, 160, WindowType::Hanning, true)?;
let mfcc_params = MfccParams::new(13)?;
let mfccs = compute_mfcc(
&samples,
&stft,
16000.0,
40, // n_mels
&mfcc_params
)?;
// Shape: (13, n_frames)
println!("MFCCs: {} × {}", mfccs.nrows(), mfccs.ncols());
|
import spectrograms as sg
stft = sg.StftParams(512, 160, sg.WindowType.hanning(), True)
mfcc_params = sg.MfccParams(n_mfcc=13)
mfccs = sg.compute_mfcc(
samples,
stft,
sample_rate=16000,
n_mels=40,
mfcc_params=mfcc_params
)
# Shape: (13, n_frames)
print(f"MFCCs: {mfccs.shape}")
|
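MFCCs are conventionally obtained by taking a discrete cosine transform of each frame of log mel energies and keeping the first few coefficients. The sketch below shows that final step with an unnormalised DCT-II on a made-up 40-band frame; the library's normalisation and any liftering are not specified here, so treat the scaling as an assumption.

```python
import math


def dct2(x, n_out):
    """Unnormalised DCT-II, keeping the first n_out coefficients."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        for k in range(n_out)
    ]


# Illustrative log mel energies for a single frame (40 bands, made-up values).
log_mel_frame = [math.log(1.0 + band) for band in range(40)]
mfcc_frame = dct2(log_mel_frame, 13)

# A flat spectrum has all its energy in coefficient 0.
flat = dct2([2.0] * 40, 13)
print(len(mfcc_frame), round(flat[0], 6))
```

Applying this to every frame yields the `(13, n_frames)` shape noted above.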
Chromagram (Pitch Class Profiles)
| Rust |
Python |
use spectrograms::*;
let stft = StftParams::new(4096, 512, WindowType::Hanning, true)?;
let chroma_params = ChromaParams::music_standard();
let chroma = compute_chromagram(
&samples,
&stft,
22050.0,
&chroma_params
)?;
// Shape: (12, n_frames) - one row per pitch class
println!("Chroma: {} × {}", chroma.nrows(), chroma.ncols());
|
import spectrograms as sg
stft = sg.StftParams(4096, 512, sg.WindowType.hanning(), True)
chroma_params = sg.ChromaParams.music_standard()
chroma = sg.compute_chromagram(
samples,
stft,
sample_rate=22050,
chroma_params=chroma_params
)
# Shape: (12, n_frames)
print(f"Chroma: {chroma.shape}")
|
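Each of the 12 chroma rows collects energy from all octaves of one pitch class. A minimal sketch of the frequency-to-pitch-class mapping follows, assuming equal temperament, A4 = 440 Hz, and row 0 = C. The tuning and row ordering used by `ChromaParams.music_standard()` are not confirmed here.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(f_hz, a4_hz=440.0):
    """Map a frequency to a pitch-class index in 0..11 (0 = C)."""
    midi = round(69 + 12 * math.log2(f_hz / a4_hz))  # nearest MIDI note number
    return midi % 12


for f in (261.63, 440.0, 880.0):
    print(f, PITCH_CLASSES[pitch_class(f)])
```

Because 440 Hz and 880 Hz are an octave apart, both land in the same row.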
Supported Spectrogram Types
Frequency Scales
- Linear (LinearHz): Standard FFT bins, evenly spaced in Hz
- Mel (Mel): Mel-frequency scale, perceptually motivated for speech/audio
- ERB (Erb): Equivalent Rectangular Bandwidth, models auditory perception
- CQT: Constant-Q Transform for music analysis
- Log (LogHz): Logarithmic frequency spacing
Amplitude Scales
| Scale | Formula | Use Case |
| Power | |X|² | Energy analysis, ML features |
| Magnitude | |X| | Spectral analysis, phase vocoder |
| Decibels | 10·log₁₀(power) | Visualization, perceptual analysis |
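The three scales are related by simple pointwise formulas. The sketch below converts power and magnitude values to decibels and clamps the result at a floor, interpreting the floor as an absolute lower bound in dB. That reading of `LogParams` is an assumption, and the function names are hypothetical.

```python
import math


def power_to_db(power, floor_db=-80.0):
    """10 * log10(power), clamped from below at floor_db."""
    if power <= 0.0:
        return floor_db  # log of zero is undefined; fall back to the floor
    return max(10.0 * math.log10(power), floor_db)


def magnitude_to_db(magnitude, floor_db=-80.0):
    """Magnitude is the square root of power, so square it first."""
    return power_to_db(magnitude * magnitude, floor_db)


for p in (100.0, 1.0, 1e-12, 0.0):
    print(p, power_to_db(p))
```

Clamping keeps silent frames from producing negative infinity, which matters for both plotting and ML features.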
Type Aliases (Rust)
// Linear frequency
type LinearPowerSpectrogram = Spectrogram<LinearHz, Power>;
type LinearMagnitudeSpectrogram = Spectrogram<LinearHz, Magnitude>;
type LinearDbSpectrogram = Spectrogram<LinearHz, Decibels>;
// Mel frequency
type MelPowerSpectrogram = Spectrogram<Mel, Power>;
type MelMagnitudeSpectrogram = Spectrogram<Mel, Magnitude>;
type MelDbSpectrogram = Spectrogram<Mel, Decibels>;
// ERB frequency
type ErbPowerSpectrogram = Spectrogram<Erb, Power>;
type ErbMagnitudeSpectrogram = Spectrogram<Erb, Magnitude>;
type ErbDbSpectrogram = Spectrogram<Erb, Decibels>;
Window Functions
Supported window functions with different frequency/time resolution trade-offs:
- rectangular: No windowing (best frequency resolution, high leakage)
- hanning: Good general-purpose window (default)
- hamming: Similar to Hanning, with a lower first sidelobe but slower sidelobe roll-off
- blackman: Low sidelobes, wider main lobe
- bartlett: Triangular window
- kaiser=<beta>: Tunable trade-off (β controls shape, e.g., kaiser=5.0)
- gaussian=<std>: Smooth roll-off (e.g., gaussian=0.4)
| Rust |
Python |
// Parse from string
let window: WindowType = "hanning".parse()?;
let kaiser: WindowType = "kaiser=8.0".parse()?;
// Or use constructors
let hann = WindowType::Hanning;
let gauss = WindowType::Gaussian { std: 0.4 };
|
# Use class methods
window = sg.WindowType.hanning()
kaiser = sg.WindowType.kaiser(beta=8.0)
gauss = sg.WindowType.gaussian(std=0.4)
# Or from string
stft = sg.StftParams(512, 256, "kaiser=8.0", True)
|
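For reference, the cosine-sum windows in this list have simple closed forms. The sketch below uses the textbook symmetric definitions; whether the library uses symmetric or periodic variants is not stated here, so the endpoint values are an assumption.

```python
import math


def cosine_window(n, coeffs):
    """Symmetric cosine-sum window: sum_k (-1)^k * a_k * cos(2*pi*k*i/(n-1))."""
    return [
        sum(
            ((-1) ** k) * a * math.cos(2.0 * math.pi * k * i / (n - 1))
            for k, a in enumerate(coeffs)
        )
        for i in range(n)
    ]


def hanning(n):
    return cosine_window(n, [0.5, 0.5])


def hamming(n):
    return cosine_window(n, [0.54, 0.46])


def blackman(n):
    return cosine_window(n, [0.42, 0.5, 0.08])


for name, fn in (("hanning", hanning), ("hamming", hamming), ("blackman", blackman)):
    w = fn(9)
    print(name, [round(v, 3) for v in w])
```

Hanning and Blackman taper to zero at the edges, while Hamming stops at 0.08, which is the "different coefficients" trade-off mentioned above.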
Default Presets
| Rust |
Python |
// Speech processing preset
// n_fft=512, hop_size=160
let params = SpectrogramParams::speech_default(16000.0)?;
// Music processing preset
// n_fft=2048, hop_size=512
let params = SpectrogramParams::music_default(44100.0)?;
|
# Speech processing preset
params = sg.SpectrogramParams.speech_default(sample_rate=16000)
# Music processing preset
params = sg.SpectrogramParams.music_default(sample_rate=44100)
|
Accessing Results
| Rust |
Python |
let spec = LinearPowerSpectrogram::compute(&samples, &params, None)?;
// Dimensions
let n_bins = spec.n_bins();
let n_frames = spec.n_frames();
// Data (ndarray::Array2<f64>)
let data = spec.data();
// Axes
let freqs = spec.axes().frequencies();
let times = spec.axes().times();
let (f_min, f_max) = spec.axes().frequency_range();
let duration = spec.axes().duration();
// Original parameters
let params = spec.params();
|
spec = sg.compute_linear_power_spectrogram(samples, params)
# Dimensions
n_bins = spec.n_bins
n_frames = spec.n_frames
# Data (numpy array)
data = spec.data # shape: (n_bins, n_frames)
# Axes
freqs = spec.frequencies
times = spec.times
f_min, f_max = spec.frequency_range()
duration = spec.duration()
# Original parameters
params = spec.params
|
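For a linear spectrogram, the axis values follow directly from the STFT parameters. The sketch below assumes bin `k` sits at `k * sample_rate / n_fft` Hz and that frame `m` is stamped at `m * hop_size / sample_rate` seconds. Whether the library reports frame starts or frame centres is not confirmed here, and the helpers are hypothetical.

```python
def frequency_axis(n_fft, sample_rate):
    """Centre frequency in Hz of each one-sided FFT bin."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


def time_axis(n_frames, hop_size, sample_rate):
    """Timestamp in seconds of each frame (assumed convention)."""
    return [m * hop_size / sample_rate for m in range(n_frames)]


freqs = frequency_axis(512, 16000)
times = time_axis(63, 256, 16000)
print(freqs[1], freqs[-1], times[1])
```

With `n_fft=512` at 16 kHz the bins are 31.25 Hz apart and the last bin sits at the Nyquist frequency of 8000 Hz.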
Examples
Comprehensive examples are provided in both languages, under examples/ for Rust and python/examples/ for Python:
| Rust |
Python |
cargo run --example basic_linear
cargo run --example mel_spectrogram
|
python python/examples/basic_linear.py
python python/examples/mel_spectrogram.py
|
Documentation
Feature Flags (Rust)
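The Rust library requires exactly one FFT backend:
- fftw: FFTW backend (fastest; requires the system FFTW library)
- realfft: Pure-Rust RealFFT backend (no system dependencies)
Additional flags:
- python (default): Enables Python bindings
- serde: Enables serialization support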
# Pure Rust, no Python
[dependencies]
spectrograms = { version = "0.1", default-features = false, features = ["realfft"] }
# FFTW backend with Python
[dependencies]
spectrograms = { version = "0.1", default-features = false, features = ["fftw", "python"] }
Performance Tips
- Reuse plans: Use SpectrogramPlanner for 2-10x speedup on batch processing
- Choose power-of-2 FFT sizes: Best performance (512, 1024, 2048, 4096)
- Use FFTW backend: Maximum speed when system dependencies are acceptable
- Python GIL: All compute functions release the GIL for parallelism
- Streaming: Use frame-by-frame processing for real-time applications
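When an analysis window is specified in milliseconds, the power-of-2 tip can be applied by rounding the window length up to the next power of two. The helpers below are hypothetical and only choose a size; how the library handles a window shorter than `n_fft` is not covered here.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def fft_size_for_window(window_ms, sample_rate):
    """Pick a power-of-two FFT size that covers a window given in ms."""
    window_samples = round(window_ms * sample_rate / 1000.0)
    return next_power_of_two(window_samples)


n_fft = fft_size_for_window(25, 16000)  # 25 ms at 16 kHz is 400 samples
print(n_fft)
```

For a 25 ms speech window at 16 kHz this selects 512, which matches the FFT size used in the examples above.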
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Citation
If you use this library in academic work, please cite:
@software{spectrograms2025,
author = {Geraghty, Jack},
title = {Spectrograms: High-Performance Spectrogram Computation},
year = {2025},
url = {https://github.com/jmg049/Spectrograms}
}
Note: This library focuses on spectrogram computation. For complete audio analysis pipelines, combine it with audio I/O libraries like audio_samples and your preferred plotting tools.