| Crates.io | bitcoinleveldb-logreader |
| lib.rs | bitcoinleveldb-logreader |
| version | 0.1.1 |
| created_at | 2025-12-01 17:00:09.818965+00 |
| updated_at | 2025-12-01 17:00:09.818965+00 |
| description | Low-level LevelDB-compatible log reader for Rust, reconstructing fragmented WAL records with optional CRC32C checksums, precise corruption reporting, and offset-based resynchronization for Bitcoin-style append-only logs. |
| homepage | |
| repository | |
| max_upload_size | |
| id | 1960072 |
| size | 156,342 |
A low-level, LevelDB-compatible log reader for Rust, tailored for Bitcoin Core–style environments and other append-only WAL/redo-log use cases.
This crate provides a faithful, allocation-conscious reimplementation of the LevelDB log format reader, including fragmentation handling, checksumming, corruption reporting, and initial-offset based resynchronization.
LogReader is designed to read logical records out of an append-only, block-structured log file that uses the LevelDB log format. In this format, each logical record is represented as one or more physical fragments stored in fixed-size blocks (typically 32 KiB):
Each fragment carries a record type: `Full`, `First`, `Middle`, or `Last`. A logical record is either:

- a single `Full` fragment, or
- a `First` fragment, followed by zero or more `Middle` fragments, then a `Last` fragment.

`LogReader` wraps a `SequentialFile` abstraction to read from the underlying file descriptor/handle, reconstructs fragmented records, and validates them (optionally with checksums). It also tracks physical offsets and an initial logical offset for resuming from arbitrary positions.
Key capabilities:

- reassembly of `First`/`Middle`/`Last` fragments into full logical records
- corruption reporting through a pluggable `LogReaderReporter`
- `initial_offset` to start from an arbitrary physical position
- a `Slice` abstraction for minimizing allocations and copies

The behavior is closely aligned with LevelDB’s original C++ `log::Reader`, which is particularly relevant when interoperating with Bitcoin Core data directories or other systems using the same log encoding.
### LogReaderReporter

```rust
/// Interface for reporting errors.
pub trait LogReaderReporter {
    fn corruption(&mut self, bytes: usize, status: &Status);
}
```
Implement this trait to hook into corruption reporting. `bytes` is the number of bytes dropped from the log, and `status` is a `corruption(...)` `Status` created by the reader. Typical implementations log the event, update metrics, or abort on corruption.
### LogReader

```rust
#[derive(Builder, Setters, Getters, MutGetters)]
pub struct LogReader {
    file: Box<dyn SequentialFile>,
    reporter: Box<dyn LogReaderReporter>,
    checksum: bool,
    backing_store: *const u8,
    buffer: Slice,
    eof: bool,
    last_record_offset: u64,
    end_of_buffer_offset: u64,
    initial_offset: u64,
    resyncing: bool,
}
```
The primary entry point is the constructor and read_record:
```rust
impl LogReader {
    pub fn new(
        file: Box<dyn SequentialFile>,
        reporter: Box<dyn LogReaderReporter>,
        checksum: bool,
        initial_offset: u64,
    ) -> Self { /* ... */ }

    /// Read the next logical record into `record`.
    /// Returns `true` on success, `false` on EOF.
    pub fn read_record(&mut self, record: &mut Slice, scratch: &mut Vec<u8>) -> bool { /* ... */ }
}
```
- `SequentialFile`: abstraction of a forward-only file. You must provide a concrete implementation that supports `read(block_size, result_ptr, scratch_ptr)` and `skip(n)` semantics. This is often backed by POSIX or other OS-specific APIs.
- `Slice`: thin, pointer-based view over a region of memory (data pointer + length). Both incoming file data and exposed record payloads are represented as `Slice`s.
- `LogReader` owns a fixed-size backing buffer (`LOG_BLOCK_SIZE`) allocated once in `new()`; `read_into_buffer_from_file` refills `buffer` by reading into this backing store.
- `end_of_buffer_offset` tracks the physical offset just past the last byte read from the file.
- `last_record_offset` tracks the physical offset of the last successfully delivered logical record.
- `initial_offset` is the target physical offset to start delivering records from.

Below is a conceptual usage sketch. Types like `SequentialFile`, `Slice`, `Status`, and enum types such as `LogRecordType` and `ExtendedRecordTypes` are provided by this crate’s ecosystem (e.g. a Bitcoin LevelDB compatibility layer) and are not reproduced here.
```rust
use bitcoinleveldb_logreader::{LogReader, LogReaderReporter};
use bitcoinleveldb_env::{PosixSequentialFile, SequentialFile}; // example; use your environment layer
use bitcoinleveldb_types::{Slice, Status};

struct MyReporter;

impl LogReaderReporter for MyReporter {
    fn corruption(&mut self, bytes: usize, status: &Status) {
        eprintln!("dropped {} bytes due to corruption: {:?}", bytes, status);
        // choose your own policy: log, metrics, panic, etc.
    }
}

fn read_all_records(path: &str) -> Result<(), Status> {
    // Build a SequentialFile (implementation-specific)
    let file: Box<dyn SequentialFile> = Box::new(PosixSequentialFile::open(path)?);
    let reporter: Box<dyn LogReaderReporter> = Box::new(MyReporter);

    let checksum = true;       // enable CRC32C verification
    let initial_offset = 0u64; // start from the beginning of the file

    let mut reader = LogReader::new(file, reporter, checksum, initial_offset);

    let mut record = Slice::default();
    let mut scratch = Vec::new();

    while reader.read_record(&mut record, &mut scratch) {
        // `record` is valid only until the next reader mutation or scratch change
        let bytes = unsafe { std::slice::from_raw_parts(*record.data(), *record.size()) };
        handle_record(bytes);
    }
    Ok(())
}

fn handle_record(bytes: &[u8]) {
    // Application-specific record decoding
    let _ = bytes;
}
```
For fast resume or partial replay, specify initial_offset:
```rust
let last_known_offset: u64 = load_checkpoint();
let mut reader = LogReader::new(file, reporter, true, last_known_offset);
```
LogReader will:
- call `skip_to_initial_block()` to jump to the first candidate block, then
- resynchronize, skipping fragments until a `Full` or `First` fragment is observed at or after `initial_offset`.

This mirrors LevelDB’s robust recovery semantics in the presence of torn writes or partial truncations.
From a correctness and data-integrity perspective, the key behaviors are:
- Checksum verification (when `checksum == true`):
  - a CRC32C mismatch is reported as `"checksum mismatch"`
  - an impossible fragment length is reported as `"bad record length"`
  - an unrecognized type byte is reported as `"unknown record type"`
- Records before `initial_offset`:
  - `report_drop` only emits corruption when the physical `current_offset` is ≥ `initial_offset`
  - failures during the skip to the target block are treated specially and reported regardless, to avoid silent data loss

Your `LogReaderReporter` implementation must not re-enter the `LogReader`, and should be prepared for arbitrary `bytes` values across repeated reports.

Important invariants for a safe integration:
- `LogReader` owns a heap-allocated `[u8; LOG_BLOCK_SIZE]` backing buffer.
- It keeps a raw `*const u8` into it (`backing_store`) to pass to `SequentialFile::read`.
- On `Drop`, it reconstructs the `Box<[u8; LOG_BLOCK_SIZE]>` and frees it.
- `read_record` exposes a `Slice` pointing directly into this backing buffer, or into the provided `scratch` buffer when assembling fragmented records.

The `Slice` returned through `record` is valid only until:

- the next `read_record` call, or
- any mutation of `scratch`.

From a performance standpoint, this model keeps heap allocations predictable and reduces copying, which is vital when scanning large Bitcoin block or mempool logs.
`LogReader` is not designed to be used from multiple threads concurrently without external synchronization: `read_record` takes `&mut self`, and the struct holds raw pointers into its own backing buffer.

Typical patterns:

- one `SequentialFile` and `LogReader` instance per thread, operating on disjoint files or segments.

If you must share access, wrap `LogReader` in a `Mutex`, or design a higher-level ingestion pipeline that uses channels to publish decoded records to multiple worker threads.
LevelDB uses CRC32C (Castagnoli polynomial) with a masking scheme to protect against certain adversarial patterns and coincidences.
In this crate, the relevant operations are:
- `decode_fixed32(header_ptr)` reads the stored, masked CRC32C from the header.
- `crc32c_unmask(encoded)` recovers the actual CRC32C from the masked value.
- `crc32c_value(header_ptr.add(6), 1 + length)` computes CRC32C over the concatenation of the record type byte and the payload.

The record is valid if `actual_crc == expected_crc`. This is a practical compromise: strong enough against random bit flips and most disk-level corruption, extremely fast, and matching the on-disk format used by LevelDB and Bitcoin Core.
Given:

- `LOG_BLOCK_SIZE = B`
- `LOG_HEADER_SIZE = H`
- `initial_offset = O`

`skip_to_initial_block` computes:

- `offset_in_block = O mod B`
- `block_start_location = O - offset_in_block`
- if `offset_in_block > B - 6`, the remainder of the block can only hold trailer padding, so we skip to `block_start_location + B`

This guarantees that the scan begins at a block that can contain the first complete logical record at or after `O`, respecting the requirement that headers must fit inside a single block (no cross-block headers).
This crate is appropriate when you:

- need to read LevelDB-format log files directly, e.g. from Bitcoin Core–style data directories
- want fragment reassembly, checksum verification, and explicit corruption reporting
- need offset-based resynchronization for fast resume or partial replay

It is not a high-level database API; instead, it is a specialized building block for storage engines, replication log readers, forensic tools, and Bitcoin-related infrastructure.
You are encouraged to audit the implementation, especially around the unsafe pointer handling and `Slice` lifetimes, to ensure that it matches your safety and reliability requirements.