rss-miner

Version: 0.1.2
Created: 2025-12-26 18:18:11 UTC
Updated: 2026-01-14 15:09:57 UTC
Description: CLI tool that finds RSS/Atom feeds from URLs and generates OPML.
Homepage: https://github.com/RustedBytes/rss-miner
Repository: https://github.com/RustedBytes/rss-miner
Crate ID: 2006044
Size: 122,232
Owner: dev-team (github:iron:dev-team)

rss-miner

A CLI tool that finds RSS/Atom feeds for a list of URLs and generates a valid OPML file.

Features

  • Parallel Processing: Uses Rayon to process multiple URLs concurrently
  • RSS Feed Validation: Validates RSS/Atom feeds before including them
  • OPML Generation: Creates a valid OPML file compatible with feed readers
  • Auto-Discovery: Finds RSS feeds in HTML link tags and common feed paths
  • Error Handling: Robust error handling with detailed feedback
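A rough illustration of the parallel-processing idea using only the standard library instead of Rayon: each URL goes to a worker on its own scoped thread (`std::thread::scope`, Rust 1.63+), and results come back in input order. The name `process_in_parallel` and its signature are hypothetical and not part of rss-miner.

```rust
use std::thread;

/// Runs `f` on every URL, one scoped thread per URL, and returns the
/// results in the same order as `urls`. Hypothetical helper: a std-only
/// stand-in for Rayon's `par_iter()`, without a bounded thread pool.
pub fn process_in_parallel<T, F>(urls: &[&str], f: F) -> Vec<T>
where
    T: Send,
    F: Fn(&str) -> T + Sync,
{
    // Share `f` by reference; `F: Sync` makes `&F` safe to send to threads.
    let f = &f;
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = urls.iter().map(|&url| s.spawn(move || f(url))).collect();
        // Join in spawn order so output index i matches input index i.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Rayon reaches the same result with a work-stealing pool sized to the CPU count. That scales better than one OS thread per URL when the input file is long, which is a reasonable motivation for using it here.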

Installation

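Build from source in a clone of the repository:

cargo build --release

The compiled binary is written to ./target/release/rss-miner.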

Usage

rss-miner --input <INPUT_FILE> [--output <OUTPUT_FILE>]

Arguments

  • -i, --input <FILE>: Input file containing URLs (one per line, required)
  • -o, --output <FILE>: Output OPML file path (default: feeds.opml)
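To make the required/optional split and the `feeds.opml` default concrete, here is a std-only sketch of the same argument rules. `Options` and `parse_args` are hypothetical names. rss-miner itself declares these flags with clap, which also generates `--help` output and error messages.

```rust
/// Parsed command-line options (hypothetical; the real tool uses clap).
#[derive(Debug, PartialEq)]
pub struct Options {
    pub input: String,
    pub output: String,
}

/// Parses `-i/--input` (required) and `-o/--output` (default `feeds.opml`).
/// `args` excludes the program name, e.g. `std::env::args().skip(1)`.
pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Options, String> {
    let mut input = None;
    let mut output = String::from("feeds.opml");
    let mut iter = args.into_iter();
    while let Some(flag) = iter.next() {
        match flag.as_str() {
            // Each flag consumes the following argument as its value.
            "-i" | "--input" => {
                input = Some(iter.next().ok_or("missing value for --input")?);
            }
            "-o" | "--output" => {
                output = iter.next().ok_or("missing value for --output")?;
            }
            other => return Err(format!("unexpected argument: {other}")),
        }
    }
    // --input has no default, so its absence is an error.
    let input = input.ok_or("--input <FILE> is required")?;
    Ok(Options { input, output })
}
```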

Example

Create a file urls.txt with one URL per line:

https://github.blog
https://stackoverflow.blog
https://www.rust-lang.org/

Run the command:

cargo run -- --input urls.txt --output feeds.opml

Or use the compiled binary:

./target/release/rss-miner --input urls.txt --output feeds.opml

Input File Format

  • One URL per line
  • Lines starting with # are treated as comments and ignored
  • Empty lines are ignored

Example:

# Tech blogs
https://github.blog
https://stackoverflow.blog

# Programming languages
https://www.rust-lang.org/
https://go.dev/
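A minimal sketch of these input rules, assuming the file has already been read into a string (for example with `std::fs::read_to_string`). The name `parse_urls` is hypothetical. Trimming whitespace before the `#` check, so that indented comments are also skipped, is an assumption rather than documented behavior.

```rust
/// Extracts URLs from the input file's contents: one URL per line,
/// skipping blank lines and lines whose first non-space character is `#`.
pub fn parse_urls(contents: &str) -> Vec<String> {
    contents
        .lines()
        // `lines()` already strips "\n" and "\r\n"; trim stray spaces too.
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}
```

Applied to the example file above, it returns the four URLs in file order and drops both comment lines and the blank line.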

How It Works

  1. Reads URLs: Parses the input file to extract URLs
  2. Parallel Processing: Uses Rayon to process multiple URLs simultaneously
  3. Feed Discovery: For each URL:
    • Fetches the HTML page
    • Looks for RSS/Atom feed links in the HTML
    • Checks common RSS feed paths (/feed, /rss, /feed.xml, etc.)
  4. Validation: Validates each discovered feed by:
    • Attempting to fetch the feed
    • Parsing it as RSS or Atom format
  5. OPML Generation: Creates a valid OPML file with all discovered and validated feeds
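Steps 3 and 4 can be sketched in simplified form. `candidate_feed_urls` builds fallback URLs from the three paths this README names; the tool's full list, elided above as "etc.", is not reproduced. `detect_feed_kind` looks at the root element to tell RSS (`<rss>`) from Atom (`<feed>`). Both names are hypothetical. Probing at the site root (scheme plus host) is an assumption. The root-element check is only a heuristic stand-in for the real validation, which parses the body with the `rss` and `atom_syndication` crates.

```rust
/// Fallback paths named in this README; the tool's full list is longer.
const COMMON_FEED_PATHS: [&str; 3] = ["/feed", "/rss", "/feed.xml"];

/// Feed formats recognized by the heuristic below.
#[derive(Debug, PartialEq)]
pub enum FeedKind {
    Rss,
    Atom,
}

/// Appends each common path to the site root (scheme + host) of `base`.
/// Assumption: `base` is an absolute http(s) URL and paths are probed at
/// the root; a real implementation would use a URL parser instead.
pub fn candidate_feed_urls(base: &str) -> Vec<String> {
    // Index just past "://", or 0 if the scheme separator is missing.
    let host_start = base.find("://").map_or(0, |i| i + 3);
    // The root ends at the first '/' after the host, or at the end of the string.
    let root_end = base[host_start..]
        .find('/')
        .map_or(base.len(), |i| host_start + i);
    let root = &base[..root_end];
    COMMON_FEED_PATHS
        .iter()
        .map(|path| format!("{root}{path}"))
        .collect()
}

/// True if `rest` opens with `<name` followed by whitespace, `>` or `/`.
fn starts_with_element(rest: &str, name: &str) -> bool {
    rest.strip_prefix('<')
        .and_then(|r| r.strip_prefix(name))
        .map_or(false, |r| r.starts_with(|c: char| c.is_whitespace() || c == '>' || c == '/'))
}

/// Heuristic only: skips the XML declaration, processing instructions and
/// comments, then inspects the root element name. Not a validator.
pub fn detect_feed_kind(body: &str) -> Option<FeedKind> {
    let mut rest = body.trim_start();
    while rest.starts_with("<?") || rest.starts_with("<!--") {
        // Jump past the closing "?>" or "-->"; bail out if it is missing.
        let end = if rest.starts_with("<?") {
            rest.find("?>")? + 2
        } else {
            rest.find("-->")? + 3
        };
        rest = rest[end..].trim_start();
    }
    if starts_with_element(rest, "rss") {
        Some(FeedKind::Rss)
    } else if starts_with_element(rest, "feed") {
        Some(FeedKind::Atom)
    } else {
        None
    }
}
```

For example, `candidate_feed_urls("https://go.dev/blog")` yields `https://go.dev/feed`, `https://go.dev/rss` and `https://go.dev/feed.xml`.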

Dependencies

  • clap: Command-line argument parsing
  • rayon: Parallel processing
  • reqwest: HTTP client
  • scraper: HTML parsing
  • opml: OPML file generation
  • rss: RSS feed parsing and validation
  • atom_syndication: Atom feed parsing and validation
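The opml crate handles serialization in rss-miner. To show roughly what the generated file looks like, here is a std-only sketch that writes a minimal OPML 2.0 document with one `<outline type="rss">` per feed and escapes XML special characters in attribute values. `build_opml`, `xml_escape` and the `(title, url)` input pairs are hypothetical. The exact attributes rss-miner writes may differ.

```rust
/// Escapes the five XML special characters for use in text and attributes.
fn xml_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Renders a minimal OPML 2.0 document with one `<outline>` per feed.
/// `feeds` holds hypothetical (display title, feed URL) pairs.
pub fn build_opml(title: &str, feeds: &[(&str, &str)]) -> String {
    let mut xml = String::from("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
    xml.push_str("<opml version=\"2.0\">\n");
    xml.push_str(&format!("  <head>\n    <title>{}</title>\n  </head>\n", xml_escape(title)));
    xml.push_str("  <body>\n");
    for (text, url) in feeds {
        // OPML 2.0 requires `text` on every outline and `xmlUrl` when type="rss".
        xml.push_str(&format!(
            "    <outline type=\"rss\" text=\"{}\" xmlUrl=\"{}\"/>\n",
            xml_escape(text),
            xml_escape(url)
        ));
    }
    xml.push_str("  </body>\n</opml>\n");
    xml
}
```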

License

This project is licensed under the MIT License - see the LICENSE file for details.
