aozora2text

Crates.io: aozora2text
lib.rs: aozora2text
version: 0.7.0
created_at: 2026-01-03 09:50:43.967215+00
updated_at: 2026-01-03 16:52:35.896539+00
description: Convert Aozora Bunko format to plain text
repository: https://github.com/takahashim/aozora2
id: 2019850
size: 31,122
Masayoshi Takahashi (takahashim)

A Rust tool to convert Aozora Bunko format text to plain text.

Note: This package is a backward-compatible wrapper providing the same functionality as aozora2 strip. For new projects, consider using aozora2 instead.

Features

  • Remove (implicit) ruby annotations 《》
  • Remove explicit ruby annotations |...《》
  • Remove annotation commands [#...]
  • Convert gaiji (external characters) ※[#...] to Unicode
  • Convert accent notation 〔...〕 to accented characters
  • Remove header (title/author) and footer (source info)
  • Auto-detect UTF-8 / Shift_JIS encoding
  • Support ZIP files (Aozora Bunko distribution format)
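To make the first two features concrete, here is a minimal sketch of ruby removal. It is not the crate's implementation: `strip_ruby` is a hypothetical helper written for illustration, and it assumes that every 《...》 span is a ruby reading and that every vertical bar (full-width ｜ or ASCII |) is a ruby base marker.

```rust
/// Hypothetical helper (not part of aozora2text): drops ruby readings
/// enclosed in 《...》 and the explicit base-text marker (｜ or |).
fn strip_ruby(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    let mut in_ruby = false;
    for ch in line.chars() {
        match ch {
            // Opening bracket: start skipping the reading.
            '《' => in_ruby = true,
            // Closing bracket: stop skipping.
            '》' if in_ruby => in_ruby = false,
            // Anything inside the brackets is the reading; drop it.
            _ if in_ruby => {}
            // Assumption: a bar only ever marks the start of a ruby base.
            '｜' | '|' => {}
            // Everything else is body text.
            _ => out.push(ch),
        }
    }
    out
}
```

Calling `strip_ruby("吾輩《わがはい》は猫《ねこ》である")` yields `吾輩は猫である`. A real converter would need to be more careful: this sketch discards the rest of the line after an unclosed 《 and removes bars that are not ruby markers.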

Installation

cargo install aozora2text

Usage

Command Line

# Convert a file
aozora2text input.txt -o output.txt

# Use stdin/stdout
cat input.txt | aozora2text > output.txt

# ZIP file (Aozora Bunko download format)
aozora2text --zip wagahaiwa_nekodearu.zip -o output.txt

Library

// High-level API (with body extraction)
let input = "Title\nAuthor\n\n吾輩《わがはい》は猫である\n底本:青空文庫";
let plain = aozora2text::convert(input.as_bytes());
assert_eq!(plain, "吾輩は猫である\n");

// Low-level API (single line)
let line = "吾輩《わがはい》は猫《ねこ》である";
let plain = aozora2text::convert_line(line);
assert_eq!(plain, "吾輩は猫である");

Conversion Examples

Input Output
漢字《かんじ》 漢字
|東京《とうきょう》 東京
猫である[#「である」に傍点] 猫である
※[#「丸印」、U+25CB] ○
〔cafe'〕 café
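The gaiji row in the table carries its Unicode code point inline (`U+25CB`), so that case can be sketched as a small parse step. This is an illustration under assumptions, not the crate's code: `gaiji_to_char` is a hypothetical helper that handles only annotations containing an explicit `U+XXXX` code point, and it takes the text between `[#` and `]` as input.

```rust
/// Hypothetical helper (not part of aozora2text): extracts the first
/// `U+XXXX` code point from a gaiji annotation body and returns the
/// corresponding character, or None if no valid code point is present.
fn gaiji_to_char(annotation: &str) -> Option<char> {
    // Locate the "U+" prefix; it is two ASCII bytes, so +2 is a valid index.
    let start = annotation.find("U+")? + 2;
    // Collect the hexadecimal digits that follow the prefix.
    let hex: String = annotation[start..]
        .chars()
        .take_while(|c| c.is_ascii_hexdigit())
        .collect();
    // Parse as base 16; an empty or oversized digit run gives None.
    let code = u32::from_str_radix(&hex, 16).ok()?;
    // Reject values that are not valid Unicode scalar values.
    char::from_u32(code)
}
```

For the annotation body `「丸印」、U+25CB` this returns the character at U+25CB. Annotations that do not carry a code point return `None` here, and converting them would need a separate lookup that this sketch does not attempt.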

License

MIT
