epub-parser

Crates.io	epub-parser
lib.rs	epub-parser
version	0.2.0
created_at	2026-01-23 14:01:47.550138+00
updated_at	2026-01-23 16:04:57.952829+00
description	A Rust library for extracting metadata, table of contents, text, cover, and images from EPUB files.
repository	https://github.com/zhangwfjh/epub-parser
id	2064585
size	47,609

Shaun Zhang (zhangwfjh)

epub-parser

A Rust library for extracting metadata, table of contents, text, cover, and images from EPUB files.

Features

✅ Parse EPUB container and locate OPF file
✅ Extract Dublin Core metadata (title, author, publisher, language, identifier, date, rights)
✅ Parse NCX table of contents with hierarchical structure
✅ Extract text from HTML/XHTML content files
✅ Extract cover image from EPUB
✅ Extract all images from EPUB
✅ Follow reading order from OPF spine
✅ Clean text extraction (strips HTML, handles line breaks)
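The "clean text extraction" feature can be pictured with a small standalone sketch. This is not the crate's implementation (which, per the dependency list, relies on quick-xml); `strip_html` is a hypothetical helper written only for illustration. It assumes a simple fragment, treats `<br>` and `</p>` as line breaks, and ignores entities, comments, and scripts.

```rust
/// Strip tags from a small HTML fragment and collapse whitespace.
/// Illustrative only: hypothetical helper, not part of epub-parser.
fn strip_html(html: &str) -> String {
    let mut out = String::new();
    let mut tag = String::new();
    let mut in_tag = false;

    for ch in html.chars() {
        match ch {
            '<' => {
                in_tag = true;
                tag.clear();
            }
            '>' if in_tag => {
                in_tag = false;
                // Treat <br>, <br/>, and </p> as line breaks.
                let name = tag.trim().trim_end_matches('/').trim().to_ascii_lowercase();
                if name == "br" || name == "/p" {
                    out.push('\n');
                }
            }
            // Characters inside a tag are collected only to read the tag name.
            _ if in_tag => tag.push(ch),
            _ => out.push(ch),
        }
    }

    // Collapse runs of whitespace within each line and drop empty lines.
    out.lines()
        .map(|line| line.split_whitespace().collect::<Vec<_>>().join(" "))
        .filter(|line| !line.is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let html = "<p>Hello,   <b>world</b>!</p><p>Second<br/>line</p>";
    println!("{}", strip_html(html));
}
```

A character-level scan like this is enough to show the idea, but real EPUB content needs a proper XML/HTML parser to handle entities and malformed markup.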

Dependencies

zip - for extracting EPUB (which is a ZIP archive)
quick-xml - for parsing XML (OPF, NCX) and HTML content

Usage

use epub_parser::Epub;
use std::path::Path;

// The `?` operator below assumes an enclosing function that returns a Result.
// Parse from file path
let epub = Epub::parse(Path::new("book.epub"))?;

// Or parse from in-memory buffer
let buffer = std::fs::read("book.epub")?;
let epub_from_buffer = Epub::parse_from_buffer(&buffer)?;

// Access metadata
println!("Title: {:?}", epub.metadata.title);
println!("Author: {:?}", epub.metadata.author);

// Access cover image
if let Some(ref href) = epub.cover.href {
    println!("Cover: {}", href);
    if let Some(ref content) = epub.cover.content {
        println!("Cover size: {} bytes", content.len());
        // Save cover image (the .jpg extension is an assumption; match it to the cover's actual format)
        std::fs::write("cover.jpg", content)?;
    }
}

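// Access images
for image in &epub.images {
    println!("Image: {} ({} bytes)", image.href,
        image.content.as_ref().map(|c| c.len()).unwrap_or(0));
    if let Some(ref content) = image.content {
        // hrefs may contain subdirectories, so create the parent directory first
        let out_path = Path::new("images").join(&image.href);
        if let Some(parent) = out_path.parent() {
            std::fs::create_dir_all(parent)?;
        }
        std::fs::write(&out_path, content)?;
    }
}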

// Access table of contents
for entry in &epub.toc {
    println!("- {} ({})", entry.label, entry.href);
}

// Access page content
for page in &epub.pages {
    println!("Page {}: {} characters", page.index, page.content.len());
}
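The table-of-contents loop above prints a flat list, while the feature list mentions a hierarchical NCX structure. A minimal sketch of walking such a hierarchy follows. The `TocEntry` struct here is a hypothetical stand-in: only `label` and `href` appear in the usage above, and the `children` field is an assumption made for this example, so check the crate documentation for the actual type.

```rust
/// Hypothetical stand-in for a TOC entry. Only `label` and `href` appear in
/// the README usage; `children` is an assumption for this sketch.
struct TocEntry {
    label: String,
    href: String,
    children: Vec<TocEntry>,
}

/// Render entries as an indented outline, two spaces per nesting level.
fn render_toc(entries: &[TocEntry], depth: usize, out: &mut String) {
    for entry in entries {
        out.push_str(&"  ".repeat(depth));
        out.push_str(&format!("- {} ({})\n", entry.label, entry.href));
        // Recurse into nested entries one level deeper.
        render_toc(&entry.children, depth + 1, out);
    }
}

fn main() {
    let toc = vec![TocEntry {
        label: "Part I".to_string(),
        href: "part1.xhtml".to_string(),
        children: vec![TocEntry {
            label: "Chapter 1".to_string(),
            href: "ch1.xhtml".to_string(),
            children: Vec::new(),
        }],
    }];
    let mut out = String::new();
    render_toc(&toc, 0, &mut out);
    print!("{}", out);
}
```

Writing into a `String` rather than printing directly keeps the traversal easy to reuse, for example when building a navigation pane.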

Commit count: 4
