scrape_blogger

Crates.io	scrape_blogger
lib.rs	scrape_blogger
version	0.1.3
created_at	2024-11-16 16:12:48.698852+00
updated_at	2024-11-21 22:56:54.014596+00
description	A CLI to scrape content from a Blogger Site
homepage
repository	https://github.com/harr1424/scrape_blogger
max_upload_size
id	1450457
size	85,572

John Harrington (harr1424)

documentation

README

scrape_blogger

Usage: scrape_blogger [OPTIONS]

Options:
  -t, --threads <THREADS>  Sets the number of threads to use when scraping all post links [default: 4]
  -r, --recent-only        Scrapes only recent posts from the blog homepage without clicking 'Older Posts'
  -h, --help               Print help
  -V, --version            Print version
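
As a rough illustration of how the two scraping options above fit together, here is a minimal standard-library sketch of parsing them. It is not the crate's actual argument handling, which is not shown in this README. The `Settings` struct and `parse_args` function are hypothetical names introduced for the example, and `-h` and `-V` are left out for brevity.

```rust
/// Hypothetical settings struct mirroring the two documented scraping flags.
#[derive(Debug, PartialEq)]
struct Settings {
    /// Number of worker threads (the help text documents a default of 4).
    threads: usize,
    /// When true, only the homepage posts are scraped.
    recent_only: bool,
}

/// Parse the arguments that follow the program name.
/// Assumption: any unrecognized argument is reported as an error.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Settings, String> {
    let mut settings = Settings { threads: 4, recent_only: false };
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        match arg.as_str() {
            "-t" | "--threads" => {
                // The flag takes a value, so consume the next argument.
                let value = iter.next().ok_or("missing value for --threads")?;
                settings.threads = value
                    .parse::<usize>()
                    .map_err(|_| format!("invalid thread count: {}", value))?;
            }
            "-r" | "--recent-only" => settings.recent_only = true,
            other => return Err(format!("unexpected argument: {}", other)),
        }
    }
    Ok(settings)
}
```

Under these assumptions, passing `std::env::args().skip(1)` to `parse_args` for an invocation such as `scrape_blogger --threads 8 -r` would yield eight threads with recent-only mode enabled.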

Recursively crawls and scrapes a specific Blogger site to archive its post content. The project is hardcoded for that one site and may not generalize to all Blogger sites, but the source code can be modified to work with any English-language Blogger site whose homepage links to older posts.
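
The crawl described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's implementation: `ListingPage`, `collect_post_links`, and `scrape_all` are hypothetical names, fetching and HTML parsing are passed in as closures so the example needs no network access, and the "Older Posts" chain is followed with a loop rather than recursion.

```rust
use std::collections::HashSet;
use std::sync::Mutex;
use std::thread;

/// Hypothetical parsed listing page: post links plus an optional "Older Posts" link.
#[derive(Clone, Debug)]
struct ListingPage {
    post_links: Vec<String>,
    older_posts: Option<String>,
}

/// Walk listing pages from `start`, following "Older Posts" until no link remains.
/// With `recent_only` set, only the first page (the homepage) is visited.
fn collect_post_links<F>(start: &str, recent_only: bool, fetch_listing: F) -> Vec<String>
where
    F: Fn(&str) -> Option<ListingPage>,
{
    let mut seen_pages = HashSet::new();
    let mut links: Vec<String> = Vec::new();
    let mut next = Some(start.to_string());
    while let Some(url) = next {
        // Stop if a listing page repeats, so a link cycle cannot loop forever.
        if !seen_pages.insert(url.clone()) {
            break;
        }
        let page = match fetch_listing(&url) {
            Some(page) => page,
            None => break,
        };
        for link in page.post_links {
            if !links.contains(&link) {
                links.push(link);
            }
        }
        next = if recent_only { None } else { page.older_posts };
    }
    links
}

/// Scrape every post using `threads` workers that pull links from a shared queue.
/// Results come back in no particular order.
fn scrape_all<S>(links: Vec<String>, threads: usize, scrape_post: S) -> Vec<(String, String)>
where
    S: Fn(&str) -> String + Sync,
{
    let queue = Mutex::new(links);
    let results = Mutex::new(Vec::new());
    thread::scope(|scope| {
        for _ in 0..threads.max(1) {
            scope.spawn(|| loop {
                // The lock is released at the end of this statement, before scraping.
                let next = queue.lock().unwrap().pop();
                let link = match next {
                    Some(link) => link,
                    None => break,
                };
                let body = scrape_post(&link);
                results.lock().unwrap().push((link, body));
            });
        }
    });
    results.into_inner().unwrap()
}
```

In this sketch, `--recent-only` would map to `recent_only = true` and `--threads` to the `threads` argument of `scrape_all`. A real scraper would supply closures that perform HTTP requests and extract links and post bodies from the returned HTML.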

Commit count: 26

