| Crates.io | nostr-archive-cursor |
| lib.rs | nostr-archive-cursor |
| version | 0.5.1 |
| created_at | 2025-10-15 19:28:40.413569+00 |
| updated_at | 2025-12-10 12:17:10.908299+00 |
| description | A utility library for iterating over JSON-L archives. |
| homepage | |
| repository | https://github.com/v0l/nostr-archive-cursor.git |
| max_upload_size | |
| id | 1884836 |
| size | 151,695 |
Process JSON-L backups and compute some stats
A memory-efficient streaming processor for Nostr event archives that supports:

- Sequential event streaming with walk()
- Parallel file reading with a configurable degree of parallelism
- Zero-copy parsing: walk_with() uses borrowed data with no string allocations during parsing
- Automatic event deduplication, which can be disabled when it is not needed
- Plain and compressed JSON-L archives (.json, .jsonl, .gz, .zst, .bz2)

use futures::stream::StreamExt;
// Sequential processing (default)
let cursor = NostrCursor::new("./backups".into());
let mut stream = cursor.walk();
while let Some(event) = stream.next().await {
// Process event sequentially
}
// Parallel file reading (4 files at once)
// Note: Events are still consumed sequentially from the stream
let cursor = NostrCursor::new("./backups".into())
.with_parallelism(4);
let mut stream = cursor.walk();
while let Some(event) = stream.next().await {
// Process event
}
// Use all available CPU cores for parallel processing
let cursor = NostrCursor::new("./backups".into())
.with_max_parallelism();
// Disable deduplication if you're certain there are no duplicates
let cursor = NostrCursor::new("./backups".into())
.with_dedupe(false);
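Putting these pieces together, a complete sequential program might look like the following sketch. The tokio runtime and the crate-root export of NostrCursor are assumptions for illustration, not something shown by the snippets above:

use futures::stream::StreamExt;
use nostr_archive_cursor::NostrCursor; // assumed crate-root export

#[tokio::main]
async fn main() {
    // Walk every archive file under ./backups and count the events.
    let cursor = NostrCursor::new("./backups".into());
    let mut stream = cursor.walk();
    let mut total: u64 = 0;
    while let Some(event) = stream.next().await {
        total += 1;
        let _ = event; // process the event here
    }
    println!("Total events: {}", total);
}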
For true parallel event processing, use walk_with(), which invokes a callback from multiple file readers concurrently. Events are parsed with zero-copy deserialization for maximum performance:
use std::sync::{Arc, Mutex};
let cursor = NostrCursor::new("./backups".into())
.with_parallelism(4);
let counter = Arc::new(Mutex::new(0));
let counter_clone = counter.clone();
cursor.walk_with(move |event| {
let counter = counter_clone.clone();
async move {
// This async callback is invoked in parallel by multiple file readers
// Event is borrowed (zero-copy) - no string allocations during parsing
// Use Arc/Mutex for shared state
let mut count = counter.lock().unwrap();
*count += 1;
// Access borrowed fields directly (zero-copy)
println!("Event ID: {}", event.id);
// Convert to owned if you need to store the event
// let owned = event.to_owned();
}
}).await;
println!("Processed {} events", *counter.lock().unwrap());
For maximum performance, use walk_with_chunked(), which processes events in batches. This is significantly faster than processing one event at a time:
use std::sync::{Arc, Mutex};
let cursor = NostrCursor::new("./backups".into())
.with_parallelism(4);
let counter = Arc::new(Mutex::new(0));
let counter_clone = counter.clone();
cursor.walk_with_chunked(move |events| {
let counter = counter_clone.clone();
Box::pin(async move {
// Process batch of borrowed events in parallel
let mut count = counter.lock().unwrap();
*count += events.len();
// All events in the batch borrow from the same buffer
for event in events {
println!("Processing event: {}", event.id);
}
})
}, 1000).await;
println!("Processed {} events", *counter.lock().unwrap());
Performance tips:

- Use .with_max_parallelism() to use all CPU cores
- Disable deduplication with .with_dedupe(false) if it is not needed
- walk_with() and walk_with_chunked() use borrowed strings during parsing - no allocations until you call .to_owned()
- Use walk() for sequential processing, walk_with() for parallel event-by-event processing, and walk_with_chunked() for parallel batch processing (fastest)

Supported file formats:

- .json - Uncompressed JSON-L
- .jsonl - Uncompressed JSON-L
- .gz - Gzip compressed
- .zst - Zstandard compressed
- .bz2 - Bzip2 compressed

The crate also provides JsonFilesDatabase, a nostr_sdk database backend that writes events to daily flat JSON-L files with automatic deduplication and compression.
It offers:

- Daily flat files (events_YYYYMMDD.jsonl)
- A sled index for fast event ID lookups to prevent duplicates
- Works as one of the nostr_sdk database backends

use nostr_archive_cursor::JsonFilesDatabase;
use nostr_sdk::prelude::*;
// Create database instance
let db = JsonFilesDatabase::new("./archive".into())?;
// Use with nostr_sdk client
let client = ClientBuilder::new()
.database(db)
.build();
// Events are automatically saved to daily files
client.add_relay("wss://relay.example.com").await?;
client.connect().await;
// Events received from relays are saved to:
// - ./archive/events_20250112.jsonl (current day)
// - ./archive/events_20250111.jsonl.zst (previous days, compressed)
// - ./archive/index/ (sled database for deduplication)
Managing the archive files and the deduplication index:

// Create new database
let db = JsonFilesDatabase::new(dir)?;
// List all archive files
let files: Vec<ArchiveFile> = db.list_files().await?;
for file in files {
println!("{}: {} bytes, created {}",
file.path.display(),
file.size,
file.timestamp
);
}
// Get specific archive file
let file = db.get_file("/events_20250112.jsonl")?;
// List event IDs in index with time range filter (for sync)
let since = 0; // Unix timestamp
let until = u64::MAX; // Unix timestamp
let ids: Vec<(EventId, Timestamp)> = db.list_ids(since, until);
// Get total event count
let count = db.count_keys();
// Check if index is empty
let is_empty = db.is_index_empty();
// Rebuild the event ID index from archive files
db.rebuild_index().await?;
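For example, a maintenance task might rebuild the index only when it is missing or empty, such as after the archive files were restored from a backup without the index/ folder. A minimal sketch built from the calls above:

use nostr_archive_cursor::JsonFilesDatabase;

let db = JsonFilesDatabase::new("./archive".into()).expect("open archive database");
// Only rebuild when the sled index has no entries.
if db.is_index_empty() {
    db.rebuild_index().await.expect("rebuild event ID index");
}
println!("Indexed events: {}", db.count_keys());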
The archive directory is laid out as follows:

archive/
├── index/ # sled database (event ID → timestamp)
├── events_20250110.jsonl.zst # Compressed past files
├── events_20250111.jsonl.zst
└── events_20250112.jsonl # Current day (uncompressed)
Notes:

- The query methods (event_by_id, query, count) return empty results by default.
- Uses Arc<Mutex<FlatFileWriter>> for concurrent event writes.
- Uses sled for a crash-safe deduplication index.
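Because the backend deduplicates on write and does not answer queries itself, reading the archive back is a job for the cursor over the same directory. A sketch, assuming the cursor only picks up the supported archive extensions listed above and therefore skips the index/ subdirectory:

use futures::stream::StreamExt;
use nostr_archive_cursor::NostrCursor;

// Walk the directory that JsonFilesDatabase writes to. Deduplication is
// disabled because the sled index already prevented duplicate writes.
let cursor = NostrCursor::new("./archive".into())
    .with_dedupe(false);
let mut stream = cursor.walk();
while let Some(event) = stream.next().await {
    // Process archived event
    let _ = event;
}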