sitemap_generator

Crates.iositemap_generator
lib.rssitemap_generator
version0.1.1
created_at2025-11-01 08:35:05.798697+00
updated_at2025-11-01 09:21:56.052541+00
descriptionA high-performance Rust library for generating XML sitemaps (standard, image, video, and sitemap index)
homepage
repositoryhttps://github.com/sondt/sitemap_generator
max_upload_size
id1911816
size255,254
(sondt)

documentation

README

Sitemap Generator

Crates.io Documentation MIT License

A high-performance Rust library for generating XML sitemaps compliant with the sitemaps.org protocol 0.9.

Features

  • Standard XML Sitemaps: Generate basic sitemaps with URLs, lastmod, changefreq, and priority
  • Image Sitemaps: Include image metadata (captions, geo-location, titles, licenses)
  • Video Sitemaps: Include video metadata (thumbnails, descriptions, durations, ratings)
  • News Sitemaps: Submit news articles to Google News with publication metadata
  • Combined Sitemaps: Combine multiple extensions (image + video + news) in one sitemap
  • Sitemap Index: Manage multiple sitemap files for large websites (>50k URLs)
  • Validation: Automatic validation of URLs, size limits, and protocol compliance
  • Compression: Built-in gzip compression support (96-98% bandwidth savings)
  • Web Framework Support: Direct bytes output for Axum, Actix-web, Rocket, etc.
  • Parsing: Read and parse existing sitemap files
  • Memory Efficient: ~140 bytes/URL during generation, 0 bytes after (proven)
  • High Performance: ~830K URLs/second, immediate memory cleanup
  • Optimized Builders: Pre-allocate capacity with with_capacity() for better performance
  • Zero unsafe code: 100% safe Rust

Installation

Add this to your Cargo.toml:

[dependencies]
sitemap_generator = "0.1"

Quick Start

Basic Sitemap

Generate sitemap as String, bytes, or compressed bytes:

use sitemap_generator::{SitemapBuilder, UrlEntry, ChangeFreq};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Use with_capacity() if you know the number of URLs in advance (recommended)
    let mut builder = SitemapBuilder::with_capacity(1000);

    builder.add_url(UrlEntry::new("https://example.com/")
        .lastmod("2025-11-01")
        .changefreq(ChangeFreq::Daily)
        .priority(1.0));

    builder.add_url(UrlEntry::new("https://example.com/page1")
        .lastmod("2025-11-02")
        .priority(0.8));

    // 1. Generate as String
    let xml = builder.build()?;

    // 2. Generate as bytes (Vec<u8>) - for HTTP responses
    let bytes = builder.build_bytes()?;

    // 3. Generate as compressed bytes (gzip)
    let compressed = builder.build_compressed_bytes()?;

    // 4. Write to file
    builder.write("sitemap.xml")?;

    // 5. Write compressed file
    builder.write_compressed("sitemap.xml.gz")?;

    Ok(())
}

Image Sitemap

use sitemap_generator::{ImageSitemapBuilder, UrlEntry, UrlWithImages, ImageEntry};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut builder = ImageSitemapBuilder::new();

    let url_with_images = UrlWithImages::new(
        UrlEntry::new("https://example.com/gallery")
            .lastmod("2025-11-01")
    )
    .add_image(
        ImageEntry::new("https://example.com/images/photo1.jpg")
            .title("Beautiful Sunset")
            .caption("A stunning sunset over the ocean")
    );

    builder.add_url(url_with_images);

    let xml = builder.build()?;
    builder.write("image_sitemap.xml")?;

    Ok(())
}

Video Sitemap

use sitemap_generator::{VideoSitemapBuilder, UrlEntry, UrlWithVideos, VideoEntry};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut builder = VideoSitemapBuilder::new();

    let url_with_video = UrlWithVideos::new(
        UrlEntry::new("https://example.com/videos/intro")
    )
    .add_video(
        VideoEntry::new(
            "https://example.com/thumbnails/intro.jpg",
            "Introduction to Our Product",
            "Learn about our product in this video"
        )
        .content_loc("https://example.com/videos/intro.mp4")
        .duration(300)
        .rating(4.5)
        .family_friendly(true)
        .add_tag("tutorial")
    );

    builder.add_url(url_with_video);

    let xml = builder.build()?;
    builder.write("video_sitemap.xml")?;

    Ok(())
}

News Sitemap

use sitemap_generator::{NewsSitemapBuilder, UrlEntry, UrlWithNews, NewsEntry, NewsPublication};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut builder = NewsSitemapBuilder::new();

    let publication = NewsPublication::new("TechDaily", "en");
    let news = NewsEntry::new(
        publication,
        "2025-11-01T10:00:00Z",
        "Revolutionary AI Breakthrough Announced"
    )
    .keywords("AI, technology, machine learning")
    .stock_tickers("GOOGL, MSFT");

    let url = UrlEntry::new("https://example.com/tech/ai-breakthrough");
    builder.add_url(UrlWithNews::new(url, news));

    let xml = builder.build()?;
    builder.write("news_sitemap.xml")?;

    Ok(())
}

Combined Sitemap (Multiple Extensions)

Combine image, video, and news extensions in a single sitemap:

use sitemap_generator::{
    CombinedSitemapBuilder, UrlEntry, UrlWithExtensions,
    ImageEntry, VideoEntry, NewsEntry, NewsPublication
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut builder = CombinedSitemapBuilder::new();

    // News article with images and video
    let news = NewsEntry::new(
        NewsPublication::new("TechDaily", "en"),
        "2025-11-01T10:00:00Z",
        "AI Breakthrough Announced"
    );

    let image = ImageEntry::new("https://example.com/ai-lab.jpg")
        .title("AI Research Lab");

    let video = VideoEntry::new(
        "https://example.com/thumb.jpg",
        "AI Demo Video",
        "Watch the demonstration"
    )
    .content_loc("https://example.com/video.mp4")
    .duration(300);

    builder.add_url(
        UrlWithExtensions::new(UrlEntry::new("https://example.com/article"))
            .add_image(image)
            .add_video(video)
            .set_news(news)
    );

    let xml = builder.build()?;
    builder.write("combined_sitemap.xml")?;

    Ok(())
}

Sitemap Index

use sitemap_generator::{SitemapIndexBuilder, SitemapIndexEntry};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut builder = SitemapIndexBuilder::new();

    builder.add_sitemap(
        SitemapIndexEntry::new("https://example.com/sitemap1.xml.gz")
            .lastmod("2025-11-01")
    );

    builder.add_sitemap(
        SitemapIndexEntry::new("https://example.com/sitemap2.xml.gz")
            .lastmod("2025-11-02")
    );

    let xml = builder.build()?;
    builder.write("sitemap_index.xml")?;

    Ok(())
}

Web Framework Integration

Use with Axum, Actix-web, Rocket, or other web frameworks:

Axum Example

use axum::{
    response::{Response, IntoResponse},
    http::{StatusCode, header},
};
use sitemap_generator::{SitemapBuilder, UrlEntry};

async fn sitemap() -> impl IntoResponse {
    let mut builder = SitemapBuilder::new();
    builder.add_url(UrlEntry::new("https://example.com/"));

    match builder.build_bytes() {
        Ok(bytes) => Response::builder()
            .status(StatusCode::OK)
            .header(header::CONTENT_TYPE, "application/xml; charset=utf-8")
            .body(bytes.into())
            .unwrap(),
        Err(_) => Response::builder()
            .status(StatusCode::INTERNAL_SERVER_ERROR)
            .body(Vec::new().into())
            .unwrap(),
    }
}

// With compression (recommended):
async fn sitemap_compressed() -> impl IntoResponse {
    let mut builder = SitemapBuilder::new();
    builder.add_url(UrlEntry::new("https://example.com/"));

    match builder.build_compressed_bytes() {
        Ok(bytes) => Response::builder()
            .status(StatusCode::OK)
            .header(header::CONTENT_TYPE, "application/xml")
            .header(header::CONTENT_ENCODING, "gzip")
            .body(bytes.into())
            .unwrap(),
        Err(_) => Response::builder()
            .status(StatusCode::INTERNAL_SERVER_ERROR)
            .body(Vec::new().into())
            .unwrap(),
    }
}

Actix-web Example

use actix_web::{web, HttpResponse, Result};
use sitemap_generator::{SitemapBuilder, UrlEntry};

async fn sitemap() -> Result<HttpResponse> {
    let mut builder = SitemapBuilder::new();
    builder.add_url(UrlEntry::new("https://example.com/"));

    match builder.build_compressed_bytes() {
        Ok(bytes) => Ok(HttpResponse::Ok()
            .content_type("application/xml")
            .insert_header(("Content-Encoding", "gzip"))
            .body(bytes)),
        Err(_) => Ok(HttpResponse::InternalServerError().finish()),
    }
}

See examples/web_framework_usage.rs for more examples.

Parsing Sitemaps

use sitemap_generator::SitemapParser;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse from file
    let entries = SitemapParser::parse_file("sitemap.xml")?;

    // Parse compressed file
    let entries = SitemapParser::parse_compressed("sitemap.xml.gz")?;

    // Parse from string
    let xml = r#"<?xml version="1.0"?>
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
            <url>
                <loc>https://example.com/</loc>
                <lastmod>2025-11-01</lastmod>
            </url>
        </urlset>"#;
    let entries = SitemapParser::parse_string(xml)?;

    for entry in entries {
        println!("URL: {}", entry.loc);
    }

    Ok(())
}

Performance

This library is designed for high performance and low memory usage:

  • Fast XML Generation: Uses quick-xml for efficient XML writing (~830K URLs/second)
  • Memory Efficient: ~140 bytes per URL during generation, 0 bytes after (proven with custom allocator)
  • Optimized Allocation: Use with_capacity() to pre-allocate memory and reduce reallocations
  • Immediate Cleanup: Memory released immediately when builder drops (RAII-based)
  • Zero Memory Leaks: Empirically proven - 1 byte growth across 100 iterations
  • Streaming Compression: Gzip compression with 50x+ compression ratios (98% bandwidth saved)

Performance Tips

Use with_capacity() for better performance:

// ❌ Without capacity (Vec grows dynamically)
let mut builder = SitemapBuilder::new();
for i in 0..10_000 {
    builder.add_url(UrlEntry::new(format!("https://example.com/page{}", i)));
}

// ✅ With capacity (Vec pre-allocated, ~2-5% faster)
let mut builder = SitemapBuilder::with_capacity(10_000);
for i in 0..10_000 {
    builder.add_url(UrlEntry::new(format!("https://example.com/page{}", i)));
}

Available for all builders:

  • SitemapBuilder::with_capacity(10_000)
  • ImageSitemapBuilder::with_capacity(5_000)
  • VideoSitemapBuilder::with_capacity(1_000)
  • NewsSitemapBuilder::with_capacity(1_000)
  • CombinedSitemapBuilder::with_capacity(500)
  • SitemapIndexBuilder::with_capacity(50)

Benchmarks

Operation 10 URLs 100 URLs 1,000 URLs 10,000 URLs
Standard Sitemap 12.5 µs 120 µs 1.2 ms 12.2 ms
Build Bytes 5.7 µs 51 µs 519 µs 5.2 ms
Build Compressed 25 µs 180 µs 1.8 ms 18 ms
Image Sitemap (2 imgs) 28 µs 275 µs 2.7 ms -
Video Sitemap 45 µs 450 µs 4.5 ms -

Compression Ratios: 31x (100 URLs) to 54x (10,000 URLs) - saves 96-98% bandwidth

Proven Results:

Documentation:

Run benchmarks yourself:

# Performance benchmarks
cargo bench
open target/criterion/report/index.html

# Memory safety demo
cargo run --example memory_demo --release

Validation

The library automatically validates:

  • URL format (RFC 3986)
  • Maximum 50,000 URLs per standard sitemap (1,000 for news sitemaps)
  • Maximum 50MB uncompressed size
  • URL length (max 2048 characters)
  • Date format (W3C Datetime)
  • Priority values (0.0 to 1.0)
  • Video duration (max 28,800 seconds)
  • Video rating (0.0 to 5.0)
  • Video title length (max 100 characters)
  • Video description length (max 2048 characters)
  • News language codes (ISO 639 format)
  • Stock tickers (max 5, comma-separated)

You can disable validation if needed:

let builder = SitemapBuilder::new().validate(false);

// Or with capacity
let builder = SitemapBuilder::with_capacity(10_000).validate(false);

Sitemap Protocol Compliance

This library follows the sitemaps.org protocol 0.9 specification:

  • Standard sitemap with <urlset> and <url> elements
  • Optional <lastmod>, <changefreq>, and <priority> elements
  • Image extension: xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
  • Video extension: xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
  • News extension: xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
  • Sitemap index with <sitemapindex> and <sitemap> elements

Examples

See the examples directory for more complete examples:

Run examples with:

cargo run --example basic_sitemap
cargo run --example image_sitemap
cargo run --example video_sitemap
cargo run --example news_sitemap
cargo run --example combined_sitemap
cargo run --example sitemap_index
cargo run --example web_framework_usage
cargo run --example memory_demo --release

Testing

Run the test suite:

cargo test

License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

References

Commit count: 0

cargo fmt