gitbook2text

Crates.io	gitbook2text
lib.rs	gitbook2text
version	0.3.1
created_at	2025-11-10 14:04:04.55213+00
updated_at	2025-11-12 09:50:51.229552+00
description	A CLI tool to download GitBook pages and convert them to markdown and text
homepage	https://github.com/Maki-Grz/gitbook2text
repository	https://github.com/Maki-Grz/gitbook2text
id	1925572
size	107,520

Maximilien Grzeczka (Maki-Grz)

documentation

https://docs.rs/gitbook2text

README

gitbook2text

A CLI tool and a Rust library for crawling GitBook sites, downloading their pages, and converting them to Markdown and plain text.

✨ What's New in v0.3.0

🕷️ Automatic Crawling: Automatically discovers all pages of a GitBook
✅ GitBook Verification: Checks that a site is a GitBook before crawling it
🚀 All-in-One Mode: Crawl and download in a single command
📋 Improved CLI: Clear subcommands built with clap

🚀 Installation

As a CLI Tool

cargo install gitbook2text

As a Dependency

Add this to your Cargo.toml:

[dependencies]
gitbook2text = "0.3"

📖 Usage

CLI

Full Mode (Recommended)

Crawls and downloads all pages in a single command:

gitbook2text all https://docs.example.com

Crawl Only Mode

Writes every discovered link to links.txt:

gitbook2text crawl https://docs.example.com

# With a custom output file
gitbook2text crawl https://docs.example.com -o my-links.txt

Download Only Mode

Downloads pages from an existing links file:

gitbook2text download

# With a custom file
gitbook2text download -i my-links.txt
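
If you generate or edit a links file by hand before running the download step, it can help to validate it first. The sketch below is not part of gitbook2text: `parse_links` is a hypothetical helper, and it assumes the file holds one URL per line, which this README does not state. It skips blank lines and `#` comments and drops duplicates while preserving order.

```rust
use std::collections::HashSet;

/// Parse the contents of a links file into a list of unique URLs.
/// ASSUMPTION: one URL per line; the format actually written by
/// `gitbook2text crawl` may differ.
fn parse_links(contents: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut links = Vec::new();
    for line in contents.lines() {
        let url = line.trim();
        // Skip blank lines and comment lines.
        if url.is_empty() || url.starts_with('#') {
            continue;
        }
        // `insert` returns false for a URL already seen, so only the
        // first occurrence is kept and the original order is preserved.
        if seen.insert(url.to_string()) {
            links.push(url.to_string());
        }
    }
    links
}
```

Calling `parse_links(&std::fs::read_to_string("my-links.txt")?)` would give the de-duplicated list, which could then be written back out before running `gitbook2text download -i my-links.txt`.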

Legacy Mode (Backward Compatible)

When run without a subcommand, the tool downloads the pages listed in links.txt:

gitbook2text

Structure of Generated Files

Files are saved in:

data/md/ - Original markdown files
data/txt/ - Cleaned text files
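
After a run, a quick sanity check is to confirm that both directories hold the same number of files. The following is a minimal sketch using only the standard library. `count_files` and `outputs_match` are hypothetical helpers, and the `.md` and `.txt` extensions and the flat directory layout are assumptions, since the README only names the two directories.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Count regular files directly inside `dir` that have the given extension.
fn count_files(dir: &Path, extension: &str) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() && path.extension().and_then(|e| e.to_str()) == Some(extension) {
            count += 1;
        }
    }
    Ok(count)
}

/// Return true when `<data_root>/md` and `<data_root>/txt` contain the same
/// number of files.
/// ASSUMPTION: outputs end in `.md` and `.txt` and are not nested in
/// subdirectories.
fn outputs_match(data_root: &Path) -> io::Result<bool> {
    let md = count_files(&data_root.join("md"), "md")?;
    let txt = count_files(&data_root.join("txt"), "txt")?;
    Ok(md == txt)
}
```

For example, `outputs_match(Path::new("data"))` returns an error if either directory is missing, which is itself a useful signal that the download step did not run.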

Library

Crawling a GitBook

use gitbook2text::{is_gitbook, extract_gitbook_links, crawl_and_save};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "https://docs.example.com";

    // Check if it's a GitBook
    if is_gitbook(url).await? {
        println!("It's a GitBook!");

        // Extract all links
        let links = extract_gitbook_links(url).await?;
        println!("Found {} pages", links.len());

        // Or directly save to a file
        crawl_and_save(url, "links.txt").await?;
    }

    Ok(())
}

Download and Convert

use gitbook2text::{download_page, markdown_to_text, txt_sanitize};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "https://docs.example.com/page.md";

    // Download the page
    let content = download_page(url).await?;

    // Convert to text
    let text = markdown_to_text(&content);

    // Clean the text
    let cleaned = txt_sanitize(&text);

    println!("{}", cleaned);
    Ok(())
}

🔧 Features

✅ Smart crawling: Automatically discovers all pages of a documentation site
✅ GitBook verification: Detects GitBook sites via their specific markers
✅ Concurrent downloading: Processes multiple pages simultaneously
✅ Markdown to text conversion: Clean content extraction
✅ Advanced cleaning: Removes special GitBook tags
✅ Code block support: Preserves titles and content
✅ Normalization: Uniform spaces and characters
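
To make the cleaning and normalization steps concrete, here is a rough sketch of what such a pass could look like. It is not the crate's implementation (use `txt_sanitize` for that). Both function names are hypothetical, and the sketch assumes the "special GitBook tags" are template tags of the form `{% hint style="info" %}` and `{% endhint %}`, which GitBook Markdown exports commonly contain.

```rust
/// Remove `{% ... %}` template tags, keeping the text around them.
/// An unclosed `{%` is left untouched.
fn strip_gitbook_tags(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("{%") {
        match rest[start..].find("%}") {
            Some(end) => {
                // Keep the text before the tag, then skip past the closing "%}".
                out.push_str(&rest[..start]);
                rest = &rest[start + end + 2..];
            }
            // No closing delimiter: stop and keep the remainder as-is.
            None => break,
        }
    }
    out.push_str(rest);
    out
}

/// Collapse runs of whitespace inside each line to a single space,
/// trim each line, and drop lines that end up empty.
fn normalize_whitespace(input: &str) -> String {
    input
        .lines()
        .map(|line| line.split_whitespace().collect::<Vec<_>>().join(" "))
        .filter(|line| !line.is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Chaining the two, as in `normalize_whitespace(&strip_gitbook_tags(markdown))`, keeps the hint text while discarding the tag markup. Dropping blank lines is a design choice of this sketch and would flatten paragraph breaks, so adjust it if paragraph structure matters for your use case.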

🎯 Use cases

📚 Archive a complete documentation site
🔍 Index content for a search engine
🤖 Prepare data for model training
📊 Analyze the structure of documentation
💾 Create documentation backups

📋 Practical Examples

Archiving Complete Documentation

# All in one
gitbook2text all https://docs.mydomain.com

# Or step by step
gitbook2text crawl https://docs.mydomain.com
gitbook2text download

Use with an automated workflow

#!/bin/bash
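# backup-docs.sh

# Stop on the first error so a failed mkdir or cd never runs the download in the wrong directory.
set -euo pipefail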

GITBOOK_URL="https://docs.example.com"
BACKUP_DIR="backups/$(date +%Y-%m-%d)"

mkdir -p "$BACKUP_DIR"
cd "$BACKUP_DIR"

gitbook2text all "$GITBOOK_URL"

echo "Backup completed in $BACKUP_DIR"

📚 API Documentation

For the full API documentation, see https://docs.rs/gitbook2text.

🤝 Contribute

Contributions are welcome! Feel free to open an issue or a pull request.

📝 Changelog

See CHANGELOG.md for the version history.

📄 License

This project is dual-licensed under either MIT or Apache-2.0, at your option.

MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)

🔗 Links
