| Crates.io | omnivore-cli |
| lib.rs | omnivore-cli |
| version | 0.2.0 |
| created_at | 2025-08-09 20:15:23.160468+00 |
| updated_at | 2025-08-11 03:21:34.486247+00 |
| description | Universal web scraper and code extractor CLI - crawl websites, analyze repositories, build knowledge graphs |
| homepage | https://ov.pranavkarra.me |
| repository | https://github.com/Pranav-Karra-3301/omnivore |
| max_upload_size | |
| id | 1788211 |
| size | 286,406 |
🕷️ Omnivore - The Universal Web Scraper & Code Extractor
A powerful command-line tool for web scraping, code analysis, and knowledge extraction. Omnivore can crawl websites, extract code from Git repositories, and build comprehensive knowledge graphs from the data it collects.
📚 Full Documentation: ov.pranavkarra.me/docs
🌐 Website: ov.pranavkarra.me
cargo install omnivore-cli
git clone https://github.com/Pranav-Karra-3301/omnivore.git
cd omnivore
cargo install --path omnivore-cli
# Basic crawl with 5 workers and depth of 3
omnivore crawl https://example.com --workers 5 --depth 3
# Crawl with AI extraction
omnivore crawl https://example.com --ai-extract --ai-model gpt-4o-mini
# JavaScript-rendered content with browser mode
omnivore crawl https://example.com --browser --wait 2000
# Analyze a GitHub repository
omnivore git https://github.com/user/repo --output repo-analysis.txt
# Analyze local directory (even non-git directories)
omnivore git ./my-project --output project-code.txt
# Filter by file types
omnivore git ./my-project --only rs,toml,md --output rust-project.txt
# Exclude patterns
omnivore git ./my-project --exclude "test,*.log,tmp/*" --output filtered-code.txt
# Interactive setup wizard
omnivore setup
# Set OpenAI API key for AI features
omnivore config set openai_api_key YOUR_API_KEY
# View current configuration
omnivore config show
omnivore crawlCrawl websites with advanced options:
omnivore gitExtract and analyze code repositories:
omnivore configManage configuration:
# Use GPT-4 for intelligent content extraction
omnivore crawl https://docs.example.com \
--ai-extract \
--ai-model gpt-4o \
--ai-prompt "Extract API endpoints and their descriptions"
# Crawl JavaScript-heavy sites
omnivore crawl https://app.example.com \
--browser \
--wait 3000 \
--screenshot \
--interactive
# Generate a knowledge graph from crawled data
omnivore crawl https://wiki.example.com --output wiki.json
omnivore graph wiki.json --output knowledge-graph.db
# Omnivore automatically detects project type and applies smart filters
omnivore git https://github.com/vercel/next.js --output nextjs.txt
# Override smart defaults
omnivore git ./my-project --no-smart-filter --include "*.js,*.ts"
Create a .omnivore.toml file in your project or home directory:
[crawl]
default_workers = 10
default_depth = 3
respect_robots_txt = true
user_agent = "Omnivore/1.0"
[git]
smart_filter = true
max_file_size = 10485760 # 10MB
exclude_binary = true
[ai]
default_model = "gpt-4o-mini"
max_tokens = 2000
omnivore crawl https://docs.rust-lang.org \
--depth 3 \
--include-pattern "/std/*" \
--ai-extract \
--ai-prompt "Extract function signatures and descriptions" \
--output rust-std-api.json
omnivore git ./my-typescript-project \
--only ts,tsx,json \
--exclude "node_modules,dist,*.test.ts" \
--output project-analysis.txt
# Step 1: Crawl the website
omnivore crawl https://en.wikipedia.org/wiki/Artificial_intelligence \
--depth 2 \
--max-pages 100 \
--output ai-wiki.json
# Step 2: Generate knowledge graph
omnivore graph ai-wiki.json \
--output ai-knowledge.db \
--extract-entities \
--extract-relationships
OMNIVORE_CONFIG_DIR: Custom configuration directoryOPENAI_API_KEY: OpenAI API key for AI featuresOMNIVORE_USER_AGENT: Custom user agent for crawlingOMNIVORE_LOG_LEVEL: Logging level (debug, info, warn, error)For comprehensive documentation, examples, and guides, visit:
Contributions are welcome! Please check out our contributing guidelines.
This project is dual-licensed under either:
at your option.