| Crates.io | scrape_blogger |
| --- | --- |
| lib.rs | scrape_blogger |
| version | 0.1.3 |
| created_at | 2024-11-16 16:12:48.698852+00 |
| updated_at | 2024-11-21 22:56:54.014596+00 |
| description | A CLI to scrape content from a Blogger Site |
| homepage | |
| repository | https://github.com/harr1424/scrape_blogger |
| max_upload_size | |
| id | 1450457 |
| size | 85,572 |
Usage: scrape_blogger [OPTIONS]

Options:
  -t, --threads <THREADS>  Sets the number of threads to use when scraping all post links [default: 4]
  -r, --recent-only        Scrapes only recent posts from the blog homepage without clicking 'Older Posts'
  -h, --help               Print help
  -V, --version            Print version
Recursively crawl and scrape a specific Blogger site to archive its post content. This project may not generalize to all Blogger sites: it is hardcoded to work with one specific site, but the source code can be modified to work with any English-language Blogger site whose homepage links to older posts.
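The crawl described above, starting at the homepage and following each "Older Posts" link until none remains, with `--recent-only` stopping after the homepage, can be sketched as below. This is a hypothetical std-only sketch, not the crate's implementation. `ListingPage`, `collect_post_links`, and the `fetch_page` callback are assumed names, and the HTTP request and HTML parsing the real tool must perform are abstracted behind `fetch_page`. The "recursion" is written as a loop.

```rust
use std::collections::HashSet;

/// Hypothetical, already-parsed view of one Blogger listing page.
struct ListingPage {
    /// Links to individual posts found on this page.
    post_links: Vec<String>,
    /// URL behind the "Older Posts" link, if the page has one.
    older_posts: Option<String>,
}

/// Hypothetical sketch: collect post links starting at `homepage`, following
/// "Older Posts" links until none remain. With `recent_only`, stop after the
/// homepage. `fetch_page` stands in for the real HTTP request + HTML parsing.
fn collect_post_links<F>(homepage: &str, recent_only: bool, mut fetch_page: F) -> Vec<String>
where
    F: FnMut(&str) -> Option<ListingPage>,
{
    let mut links = Vec::new();
    let mut seen_pages = HashSet::new();
    let mut next = Some(homepage.to_string());

    while let Some(url) = next.take() {
        // Guard against a pagination cycle revisiting the same page.
        if !seen_pages.insert(url.clone()) {
            break;
        }
        // Stop if the page cannot be fetched or parsed.
        let page = match fetch_page(&url) {
            Some(page) => page,
            None => break,
        };
        links.extend(page.post_links);
        if recent_only {
            break;
        }
        next = page.older_posts;
    }
    links
}
```

Keeping page fetching behind a callback lets the pagination logic be tested without network access, and the visited-page set prevents an infinite loop if a site's "Older Posts" links ever form a cycle.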