| Crates.io | wc-parser |
| lib.rs | wc-parser |
| version | 0.1.2 |
| created_at | 2025-07-11 07:31:53.594326+00 |
| updated_at | 2025-07-11 11:30:28.467184+00 |
| description | A decently fast Rust library for parsing WhatsApp chat exports |
| homepage | https://github.com/zktaiga/wc-parser |
| repository | https://github.com/zktaiga/wc-parser |
| max_upload_size | |
| id | 1747494 |
| size | 59,601 |
A decently fast Rust library for parsing WhatsApp chat exports.
wc-parser is designed to be fast and memory-efficient. Key optimisations include:
Memory-mapped I/O — parse_file uses memmap2 so chat exports are read straight from the operating-system page-cache without first copying them into a String, keeping peak RSS low even for multi-gigabyte logs.
Zero-copy parsing — When parsing from a &str, we split the original slice into &str line slices instead of allocating new strings, only allocating when constructing the final Message structs.
Pre-compiled regular expressions — All regex patterns are built once at start-up via lazy_static!, removing the compile cost from the hot parsing path.
Data-parallel message processing — Heavy-weight work (regex capture extraction, date/time normalisation, etc.) runs in parallel across CPU cores with rayon when debug output is disabled.
Selective attachment parsing — Attachment extraction is completely skipped unless parse_attachments = true, saving an extra regex run per message in the common case.
Configurable debug logging — Expensive debug printing is off by default. When enabled it switches to single-threaded execution to keep log output ordered.
Small-footprint date handling — Simple heuristics determine whether the log is day-first or month-first in a single pass, avoiding per-message branching once parsing begins.
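Two of the optimisations above, zero-copy line splitting and the single-pass date-order heuristic, can be sketched with the standard library alone. This is an illustrative sketch, not wc-parser's actual implementation: `DateOrder`, `split_lines`, `leading_pair` and `detect_order` are hypothetical names, and the rule "a first component above 12 means day-first" is an assumption about how such a heuristic could work.

```rust
/// Date order inferred from the log (hypothetical type, not part of wc-parser).
#[derive(Debug, PartialEq)]
enum DateOrder {
    DayFirst,
    MonthFirst,
    Ambiguous,
}

/// Split the input into borrowed line slices. Every returned `&str` points
/// into `input`, so no line is copied. Blank lines are skipped.
fn split_lines(input: &str) -> Vec<&str> {
    input.lines().filter(|l| !l.trim().is_empty()).collect()
}

/// Read the first two numeric date components of a line such as
/// "25/05/2017, 01:48 - ...". Returns `None` for lines that do not start
/// with a date, for example continuation lines of a multi-line message.
fn leading_pair(line: &str) -> Option<(u32, u32)> {
    let mut parts = line
        .trim_start()
        .splitn(3, |c: char| c == '/' || c == '.' || c == '-');
    let a = parts.next()?.parse::<u32>().ok()?;
    let b = parts.next()?.parse::<u32>().ok()?;
    Some((a, b))
}

/// Scan the lines once and stop at the first date that settles the order:
/// a component greater than 12 cannot be a month.
fn detect_order(lines: &[&str]) -> DateOrder {
    for line in lines {
        if let Some((a, b)) = leading_pair(line) {
            if a > 12 {
                return DateOrder::DayFirst;
            }
            if b > 12 {
                return DateOrder::MonthFirst;
            }
        }
    }
    DateOrder::Ambiguous
}

fn main() {
    let chat = "06/03/2017, 00:45 - Sample User: This is a test message\n\
                25/05/2017, 01:48 - TestBot: Hey I'm a test too!\n";
    let lines = split_lines(chat);
    // The first slice starts at the same address as the original buffer.
    assert_eq!(lines[0].as_ptr(), chat.as_ptr());
    println!("{:?}", detect_order(&lines)); // prints "DayFirst"
}
```

Deciding the order once up front means the per-message date parser can take a fixed path. When every date in the log is ambiguous, a caller would need to supply the order explicitly, which is what the `days_first` option is for.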
Add this to your Cargo.toml:
[dependencies]
wc-parser = "0.1.2"
use wc_parser::parse_string;

fn main() {
    let chat_content = r#"
06/03/2017, 00:45 - Sample User: This is a test message
08/05/2017, 01:48 - TestBot: Hey I'm a test too!
09/04/2017, 01:50 - +410123456789: How are you?
Is everything alright?
"#;

    let messages = parse_string(chat_content, None).unwrap();

    for message in messages {
        println!("Date: {}", message.date);
        if let Some(author) = message.author {
            println!("Author: {}", author);
        } else {
            println!("System message");
        }
        println!("Message: {}", message.message);
        println!("---");
    }
}
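The sample chat above contains a message that continues on a second line ("Is everything alright?"). As a rough illustration of how a parser can fold such continuation lines into the preceding message, here is a standard-library sketch. It is not the crate's code: `RawEntry`, `parse_header` and `group_entries` are hypothetical names, and the header check is far looser than the regular expressions a real parser would use.

```rust
/// Hypothetical stand-in for a parsed entry. `stamp` and `author` borrow
/// from the input; `body` is owned because continuation lines are appended.
#[derive(Debug, PartialEq)]
struct RawEntry<'a> {
    stamp: &'a str,
    author: Option<&'a str>,
    body: String,
}

/// Try to read "<date>, <time> - <author>: <text>" from one line.
/// Lines without an "<author>: " part are treated as system messages.
fn parse_header(line: &str) -> Option<(&str, Option<&str>, &str)> {
    let (stamp, rest) = line.split_once(" - ")?;
    // Loose sanity check only: the stamp must start with a digit and
    // contain the comma that separates date and time.
    if !stamp.starts_with(|c: char| c.is_ascii_digit()) || !stamp.contains(',') {
        return None;
    }
    match rest.split_once(": ") {
        Some((author, text)) => Some((stamp, Some(author), text)),
        None => Some((stamp, None, rest)),
    }
}

/// Group lines into entries, appending any line that is not a header
/// to the body of the previous entry. Blank lines are skipped.
fn group_entries(input: &str) -> Vec<RawEntry<'_>> {
    let mut entries = Vec::new();
    for line in input.lines().filter(|l| !l.trim().is_empty()) {
        match parse_header(line) {
            Some((stamp, author, text)) => entries.push(RawEntry {
                stamp,
                author,
                body: text.to_string(),
            }),
            None => {
                if let Some(last) = entries.last_mut() {
                    last.body.push('\n');
                    last.body.push_str(line);
                }
            }
        }
    }
    entries
}

fn main() {
    let chat = "09/04/2017, 01:50 - +410123456789: How are you?\n\
                Is everything alright?\n";
    for entry in group_entries(chat) {
        println!("{:?}", entry);
    }
}
```

In this sketch a system message whose text happens to contain ": " would be misread as having an author, which is one reason a real parser needs stricter patterns.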
use wc_parser::{parse_string, models::ParseStringOptions};

let options = ParseStringOptions {
    days_first: Some(true),  // Specify date format
    parse_attachments: true, // Parse attachment information
};

let messages = parse_string(chat_content, Some(options)).unwrap();
Each parsed message contains:
// Located in `src/models.rs`
pub struct Message {
    pub date: DateTime<Utc>,            // Date and time of the message
    pub author: Option<String>,         // Author name (None for system messages)
    pub message: String,                // Message content
    pub attachment: Option<Attachment>, // Attachment info (if parse_attachments is enabled)
}
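Because `author` is an `Option<String>`, code that consumes parsed messages has to decide how to treat system messages. The sketch below counts messages per author and groups system messages under a placeholder key. To stay self-contained it uses a simplified local stand-in for `Message` that keeps only the `author` and `message` fields; the `count_by_author` helper and the `"<system>"` label are illustrative assumptions, not part of the crate.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the crate's `Message` (date and attachment
/// fields omitted so the example needs no external crates).
struct Message {
    author: Option<String>,
    message: String,
}

/// Count messages per author. System messages (author == None) are
/// grouped under the placeholder key "<system>".
fn count_by_author(messages: &[Message]) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for m in messages {
        let key = m.author.as_deref().unwrap_or("<system>");
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let messages = vec![
        Message { author: Some("Sample User".to_string()), message: "This is a test message".to_string() },
        Message { author: Some("TestBot".to_string()), message: "Hey I'm a test too!".to_string() },
        Message { author: None, message: "Group created".to_string() },
        Message { author: Some("TestBot".to_string()), message: "Still here".to_string() },
    ];

    for (author, n) in count_by_author(&messages) {
        println!("{author}: {n}");
    }

    let total_chars: usize = messages.iter().map(|m| m.message.chars().count()).sum();
    println!("total characters: {total_chars}");
}
```

With the real crate, the same loop would run over the `Vec<Message>` returned by `parse_string`.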
This library supports various WhatsApp chat export formats including: