| Crates.io | domain_status |
| lib.rs | domain_status |
| version | 0.1.9 |
| created_at | 2025-12-07 05:48:18.910978+00 |
| updated_at | 2025-12-19 15:46:45.17197+00 |
| description | Concurrent URL status checker that captures comprehensive metadata in SQLite. |
| homepage | https://github.com/alexwoolford/domain_status |
| repository | https://github.com/alexwoolford/domain_status |
| max_upload_size | |
| id | 1971217 |
| size | 2,079,262 |
domain_status is a fast, concurrent website scanner for bulk analysis of URLs and domains.
Give it a list of URLs → it fetches HTTP status, TLS certificates, DNS records, WHOIS data, GeoIP information, and technology fingerprints in one pass → stores everything in SQLite for analysis.
Why domain_status? Unlike single-purpose tools (curl for status, whois for domain info, Wappalyzer for tech detection), domain_status consolidates all checks in one tool. Built with async Rust (Tokio) for high-performance concurrent processing, it efficiently handles hundreds or thousands of URLs while maintaining reliability through adaptive rate limiting and comprehensive error handling.
Install and run in 3 commands:
# 1. Install (requires Rust 1.85+)
cargo install domain_status
# 2. Create URLs file and run scan
echo -e "https://example.com\nhttps://rust-lang.org" > urls.txt && domain_status scan urls.txt
# 3. View results
sqlite3 domain_status.db "SELECT domain, status, title FROM url_status;"
Optional: Enable GeoIP lookup
If you want GeoIP data (country, city, etc.), create a .env file:
# Copy the example and add your MaxMind license key
cp .env.example .env
# Edit .env and add: MAXMIND_LICENSE_KEY=your_key_here
Get a free MaxMind license key from: https://www.maxmind.com/en/accounts/current/license-key
Example output:
domain | status | title
------------------|--------|--------------------------
example.com | 200 | Example Domain
rust-lang.org | 200 | Rust Programming Language
That's it! The tool processes URLs concurrently (30 by default), stores all data in SQLite, and provides progress updates.
Alternative: Don't have Rust? Download a pre-built binary from the Releases page - see Installation for details.
Option 1: Install via Cargo (Recommended for Rust users)
Requires Rust 1.85 or newer:
cargo install domain_status
This compiles from source and installs the binary to ~/.cargo/bin/domain_status (or %USERPROFILE%\.cargo\bin\domain_status.exe on Windows). The binary is added to your PATH automatically.
Benefits:
- Easy updates: re-run cargo install --force domain_status to upgrade to the latest version

Note: This crate requires Rust 1.85 or newer (for edition 2024 support in dependencies). If installation fails, update your Rust toolchain: rustup update stable.
Option 2: Download Pre-built Binary
Download the latest release from the Releases page:
# Linux (x86_64)
wget https://github.com/alexwoolford/domain_status/releases/latest/download/domain_status-linux-x86_64.tar.gz
tar xzf domain_status-linux-x86_64.tar.gz
chmod +x domain_status
./domain_status scan urls.txt
# macOS (Intel)
wget https://github.com/alexwoolford/domain_status/releases/latest/download/domain_status-macos-x86_64.tar.gz
tar xzf domain_status-macos-x86_64.tar.gz
chmod +x domain_status
# macOS (Apple Silicon)
wget https://github.com/alexwoolford/domain_status/releases/latest/download/domain_status-macos-aarch64.tar.gz
tar xzf domain_status-macos-aarch64.tar.gz
chmod +x domain_status
# macOS: Handle Gatekeeper warning (unsigned binary)
# Option 1: Right-click the binary, select "Open", then click "Open" in the dialog
# Option 2: Run this command to remove the quarantine attribute:
xattr -d com.apple.quarantine domain_status 2>/dev/null || true
./domain_status scan urls.txt
# Windows
# Download domain_status-windows-x86_64.exe.zip and extract
Option 3: Build from Source
Requires Rust 1.85 or newer:
# Clone the repository
git clone https://github.com/alexwoolford/domain_status.git
cd domain_status
# Build release binary
cargo build --release
This creates an executable in ./target/release/domain_status (or domain_status.exe on Windows).
Note: SQLite is bundled in the binary - no system SQLite installation required. The tool is completely self-contained.
Domain Portfolio Management: Check status of multiple domains, track redirects, verify SSL certificates, monitor domain expiration dates (with WHOIS enabled).
Security Audits: Identify missing security headers (CSP, HSTS, etc.), detect expired certificates, inventory technology stacks to identify potential vulnerabilities.
Competitive Analysis: Track technology stacks across competitors, identify analytics tools and tracking IDs, gather structured data (Open Graph, JSON-LD) for comparison.
Monitoring: Integrate with Prometheus for ongoing status checks via the status server endpoint, track changes over time by querying run history.
Research: Bulk analysis of web technologies, DNS configurations, geographic distribution of infrastructure, technology adoption patterns.
Unlike single-purpose tools (curl, nmap, whois), domain_status consolidates many checks in one sweep, ensuring consistency and saving time.
The tool uses a subcommand-based interface:
- domain_status scan <file> - Scan URLs and store results in a SQLite database
  - Use - as the filename to read URLs from stdin: echo "https://example.com" | domain_status scan -
- domain_status export - Export data from the SQLite database to various formats (CSV, JSONL, Parquet)

Usage:
domain_status scan <file> [OPTIONS]
Common Options:
- --log-level <LEVEL>: Log level: error, warn, info, debug, or trace (default: info)
- --log-file <PATH>: Log file path (default: domain_status.log). All logs are written to this file with timestamps.
- --db-path <PATH>: SQLite database file path (default: ./domain_status.db)
- --max-concurrency <N>: Maximum concurrent requests (default: 30)
- --timeout-seconds <N>: HTTP client timeout in seconds (default: 10). Note: the per-URL processing timeout is 35 seconds.
- --rate-limit-rps <N>: Initial requests per second (adaptive rate limiting always enabled, default: 15)
- --status-port <PORT>: Start the HTTP status server on the specified port (optional, disabled by default)
- --fail-on <POLICY>: Exit code policy for CI integration: never (default), any-failure, pct>X, or errors-only. See Exit Code Control for details.

Advanced Options:
- --user-agent <STRING>: HTTP User-Agent header value (default: Chrome user agent)
- --fingerprints <URL|PATH>: Technology fingerprint ruleset source (URL or local path). Default: HTTP Archive Wappalyzer fork. Rules are cached locally for 7 days.
- --geoip <PATH|URL>: GeoIP database path (MaxMind GeoLite2 .mmdb file) or download URL. If not provided, the database is auto-downloaded when the MAXMIND_LICENSE_KEY environment variable is set.
- --enable-whois: Enable WHOIS/RDAP lookup for domain registration information. WHOIS data is cached for 7 days. Default: disabled.

Example:
domain_status scan urls.txt \
--db-path ./results.db \
--max-concurrency 100 \
--timeout-seconds 15 \
--log-level debug \
--rate-limit-rps 20 \
--status-port 8080
Exit Code Control (--fail-on):
The --fail-on option controls when the scan command exits with a non-zero code, making it ideal for CI/CD pipelines:
- never (default): Always return exit code 0, even if some URLs failed. Useful for monitoring scenarios where you want to log failures but not trigger alerts.
- any-failure: Exit with code 2 if any URL failed. Strict mode for CI pipelines where any failure should be treated as a build failure.
- pct>X: Exit with code 2 if the failure percentage exceeds X (e.g., pct>10 means exit if more than 10% failed). Use with --fail-on-pct-threshold to set the exact percentage. Useful for large scans where some failures are expected.
- errors-only: Exit only on critical errors (timeouts, DNS failures, etc.). Currently behaves like any-failure (future enhancement).

Exit Codes:
- 0: Success (or failures ignored by policy)
- 1: Configuration error or scan initialization failure
- 2: Failures exceeded threshold (based on --fail-on policy)
- 3: Partial success (some URLs processed, but scan incomplete)

Examples:
# CI mode: fail if any URL fails
domain_status scan urls.txt --fail-on any-failure
# Allow up to 10% failures before failing
domain_status scan urls.txt --fail-on 'pct>10' --fail-on-pct-threshold 10
# Monitoring mode: always succeed (default)
domain_status scan urls.txt --fail-on never
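As a sketch of how the documented exit codes can drive a CI shell step (the 10% threshold here is purely illustrative):

```bash
# Treat "failure threshold exceeded" (exit code 2) as a hard CI failure,
# and surface any other non-zero code (configuration error, partial scan) as-is.
domain_status scan urls.txt --fail-on 'pct>10' --fail-on-pct-threshold 10
status=$?
case "$status" in
  0) echo "scan passed" ;;
  2) echo "too many URLs failed" >&2; exit 1 ;;
  *) echo "scan error (exit code $status)" >&2; exit "$status" ;;
esac
```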
Usage:
domain_status export [OPTIONS]
Options:
- --db-path <PATH>: SQLite database file path (default: ./domain_status.db)
- --format <FORMAT>: Export format: csv, jsonl, or parquet (default: csv). JSONL is line-delimited JSON, convenient for streaming, piping to jq, or loading into databases.
- --output <PATH>: Output file path
  - Defaults to domain_status_export.{csv,jsonl,parquet} in the current directory
  - Use - to write to stdout (for piping to other commands)
- --run-id <ID>: Filter by run ID
- --domain <DOMAIN>: Filter by domain (matches initial or final domain)
- --status <CODE>: Filter by HTTP status code
- --since <TIMESTAMP>: Filter by timestamp (milliseconds since epoch)

Examples:
# Export all data to CSV (defaults to domain_status_export.csv)
domain_status export --format csv
# Export to a custom file
domain_status export --format csv --output results.csv
# Export to stdout (pipe to another command)
domain_status export --format jsonl --output - 2>/dev/null | jq '.final_domain'
# Export only successful URLs (status 200)
domain_status export --format csv --status 200 --output successful.csv
# Pipe JSONL to jq for filtering (log messages go to stderr automatically)
domain_status export --format jsonl --output - 2>/dev/null | jq 'select(.status == 200) | .final_domain'
# Export and filter with jq (e.g., get domains with specific technologies)
domain_status export --format jsonl --output - 2>/dev/null | jq 'select(.technologies[].name == "WordPress") | .final_domain'
Environment variables can be set in a .env file (in the current directory or next to the executable) or exported in your shell.
Configuration Precedence (highest to lowest):
1. CLI arguments (e.g., --db-path, --geoip) - always take precedence
2. Environment variables (.env file or shell) - used when CLI arguments are not provided

Available Environment Variables:
- MAXMIND_LICENSE_KEY: MaxMind license key for automatic GeoIP database downloads. Get a free key from MaxMind. If not set, GeoIP lookup is disabled and the application continues normally.
- DOMAIN_STATUS_DB_PATH: Override the default database path. Note: the CLI argument --db-path takes precedence over this variable.
- GITHUB_TOKEN: (Optional) GitHub personal access token for fingerprint ruleset downloads. Increases the GitHub API rate limit from 60 to 5000 requests/hour. Only needed if using GitHub-hosted fingerprint rulesets.
- RUST_LOG: (Optional) Advanced logging control. Overrides the --log-level CLI argument if set. Format: domain_status=debug,reqwest=info. See the env_logger documentation for details.

Example .env file:
# Copy .env.example to .env and customize
MAXMIND_LICENSE_KEY=your_license_key_here
DOMAIN_STATUS_DB_PATH=./my_database.db
GITHUB_TOKEN=your_github_token_here
Input File:
- URLs may be listed with or without an http:// or https:// prefix; if no scheme is given, https:// is automatically prepended
- Only http:// and https:// URLs are accepted; other schemes are rejected
- Lines starting with # are treated as comments and ignored
- Use - as the filename to read URLs from stdin:
echo -e "https://example.com\nhttps://rust-lang.org" | domain_status scan -
cat urls.txt | domain_status scan -
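Because stdin is accepted, any shell pipeline can feed the scanner. For example, a quick trial run over the first few entries of a larger list (all_urls.txt and sample.db are just illustrative names):

```bash
# Scan only the first 20 URLs from a bigger list into a throwaway database
head -n 20 all_urls.txt | domain_status scan - --db-path ./sample.db
```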
Example input file:
# My domain list
# Production domains
https://example.com
https://www.example.com
# Staging domains
https://staging.example.com
After a scan completes, all data is stored in the SQLite database. Use domain_status export to export data in CSV/JSON format, or query the database directly.
The database uses a UNIQUE (final_domain, timestamp) constraint to ensure idempotency. This means:
- If your input contains https://example.com/ and https://example.com/page, both will be processed, but they will resolve to the same final_domain after following redirects. The database stores the final domain after redirects, so only one record per final domain per timestamp is kept.

Best practice: Include each domain only once per input file. If you need to check multiple paths on the same domain, list them as separate URLs (e.g., https://example.com/ and https://example.com/about), but be aware that redirects may cause them to resolve to the same final domain.
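To see where this matters in practice, a quick query over the domain and final_domain columns (used in the query examples below) shows which inputs were redirected to a different final domain:

```bash
# List inputs whose final domain (after redirects) differs from the input domain
sqlite3 domain_status.db \
  "SELECT domain, final_domain FROM url_status WHERE domain <> final_domain ORDER BY domain;"
```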
The tool shows a clean progress bar during scanning, with detailed logs written to a file:
📝 Logs: domain_status.log
⠋ [00:00:45] [████████████████████░░░░░░░░░░░░░░░░░░░░] 52/100 (52%) ✓48 ✗4
After completion:
✅ Processed 100 URLs (92 succeeded, 8 failed) in 55.9s - see database for details
Results saved in ./domain_status.db
💡 Tip: Use `domain_status export --format csv` to export data, or query the database directly.
Log file format (with timestamps):
[2025-01-07 23:33:59.123] INFO domain_status::run - Total URLs in file: 100
[2025-01-07 23:33:59.456] INFO domain_status::fingerprint::ruleset - Merged 7223 technologies from 2 source(s)
[2025-01-07 23:34:01.789] WARN domain_status::dns::resolution - Failed to perform reverse DNS lookup...
Performance Analysis:
Detailed timing metrics are automatically logged to the log file (domain_status.log by default) at the end of each scan. This includes a breakdown of time spent in each operation:
Example log output:
=== Timing Metrics Summary (88 URLs) ===
Average times per URL:
HTTP Request: 1287 ms (40.9%)
DNS Forward: 845 ms (26.8%)
TLS Handshake: 1035 ms (32.9%)
HTML Parsing: 36 ms (1.1%)
Tech Detection: 1788 ms (56.8%)
Total: 3148 ms
Note: Performance varies significantly based on rate limiting, network conditions, target server behavior, and error handling. Expect roughly 0.5-2 URLs/sec with default settings. Higher rates may trigger bot detection.
All results are stored in the SQLite database. You can query the database while the scan is running (WAL mode allows concurrent reads). Here are some useful queries:
Basic status overview:
SELECT domain, status, status_description, response_time
FROM url_status
ORDER BY domain;
Find all failed URLs:
SELECT domain, status, status_description
FROM url_status
WHERE status >= 400 OR status = 0
ORDER BY status;
Find all sites using a specific technology:
SELECT DISTINCT us.domain, us.status
FROM url_status us
JOIN url_technologies ut ON us.id = ut.url_status_id
WHERE ut.technology_name = 'WordPress'
ORDER BY us.domain;
Find sites with missing security headers:
SELECT DISTINCT us.domain
FROM url_status us
JOIN url_security_warnings usw ON us.id = usw.url_status_id
WHERE usw.warning_code LIKE '%missing%'
ORDER BY us.domain;
Find all redirects:
SELECT
us.domain,
us.final_domain,
us.status,
COUNT(urc.id) as redirect_count
FROM url_status us
LEFT JOIN url_redirect_chain urc ON us.id = urc.url_status_id
GROUP BY us.id, us.domain, us.final_domain, us.status
HAVING redirect_count > 0
ORDER BY redirect_count DESC;
Compare runs by version:
SELECT version, COUNT(*) as runs,
SUM(total_urls) as total_urls,
AVG(elapsed_seconds) as avg_time
FROM runs
WHERE end_time IS NOT NULL
GROUP BY version
ORDER BY version DESC;
Get all URLs from a specific run:
SELECT domain, status, title, response_time
FROM url_status
WHERE run_id = 'run_1765150444953'
ORDER BY domain;
The database uses a star schema design pattern with:
- url_status (main URL data)
- runs (run-level metadata including version)
- url_geoip, url_whois
- url_failures, with related tables for error context (redirect chains, request/response headers)

Key Features:
- UNIQUE (final_domain, timestamp) ensures idempotency

For complete database schema documentation including entity-relationship diagrams, table descriptions, indexes, constraints, and query examples, see DATABASE.md.
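If you don't have DATABASE.md handy, the schema can also be inspected directly with sqlite3's built-in dot-commands:

```bash
# List all tables, then show the full definition of the main table
sqlite3 domain_status.db ".tables"
sqlite3 domain_status.db ".schema url_status"
```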
All scan results are persisted in the database, so you can query past runs even after closing the terminal. The runs table stores summary statistics for each scan:
-- View all completed runs (most recent first)
SELECT
run_id,
version,
datetime(start_time/1000, 'unixepoch') as start_time,
datetime(end_time/1000, 'unixepoch') as end_time,
elapsed_seconds,
total_urls,
successful_urls,
failed_urls,
ROUND(100.0 * successful_urls / total_urls, 1) as success_rate
FROM runs
WHERE end_time IS NOT NULL
ORDER BY start_time DESC
LIMIT 10;
Example output:
run_id | version | start_time | end_time | elapsed_seconds | total_urls | successful_urls | failed_urls | success_rate
--------------------|---------|---------------------|---------------------|-----------------|------------|-----------------|------------|--------------
run_1765150444953 | 0.1.4 | 2025-01-07 23:33:59 | 2025-01-07 23:34:52 | 52.1 | 100 | 89 | 11 | 89.0
Using the library API:
use domain_status::storage::query_run_history;
use sqlx::SqlitePool;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pool = SqlitePool::connect("sqlite:./domain_status.db").await?;
    let runs = query_run_history(&pool, Some(10)).await?;
    for run in runs {
        println!("Run {}: {} URLs ({} succeeded, {} failed) in {:.1}s",
            run.run_id, run.total_urls, run.successful_urls,
            run.failed_urls, run.elapsed_seconds.unwrap_or(0.0));
    }
    Ok(())
}
For long-running jobs, you can monitor progress via an optional HTTP status server:
# Start with status server on port 8080
domain_status scan urls.txt --status-port 8080
curl http://127.0.0.1:8080/status | jq
curl http://127.0.0.1:8080/metrics
The status server provides:
- **Real-time progress**: Total URLs, completed, failed, percentage complete, processing rate
- **Error breakdown**: Detailed counts by error type
- **Warning/info metrics**: Track missing metadata, redirects, bot detection events
- **Prometheus compatibility**: Metrics endpoint ready for Prometheus scraping
### Status Endpoint (`/status`)
Returns detailed JSON status with real-time progress information:
```bash
curl http://127.0.0.1:8080/status | jq
```
Response Format:
{
"total_urls": 100,
"completed_urls": 85,
"failed_urls": 2,
"pending_urls": 13,
"percentage_complete": 87.0,
"elapsed_seconds": 55.88,
"rate_per_second": 1.52,
"errors": { "total": 17, "timeout": 0, "connection_error": 0, "http_error": 3, "dns_error": 14, "tls_error": 0, "parse_error": 0, "other_error": 0 },
"warnings": { "total": 104, "missing_meta_keywords": 77, "missing_meta_description": 25, "missing_title": 2 },
"info": { "total": 64, "http_redirect": 55, "https_redirect": 0, "bot_detection_403": 3, "multiple_redirects": 6 }
}
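For a long scan it can be handy to poll the endpoint periodically; a minimal sketch, assuming watch and jq are installed:

```bash
# Refresh a compact progress summary every 10 seconds
watch -n 10 "curl -s http://127.0.0.1:8080/status | jq '{completed_urls, failed_urls, percentage_complete, rate_per_second}'"
```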
### Metrics Endpoint (`/metrics`)
Returns Prometheus-compatible metrics in text format:
curl http://127.0.0.1:8080/metrics
Metrics:
- domain_status_total_urls (gauge): Total URLs to process
- domain_status_completed_urls (gauge): Successfully processed URLs
- domain_status_failed_urls (gauge): Failed URLs
- domain_status_percentage_complete (gauge): Completion percentage (0-100)
- domain_status_rate_per_second (gauge): Processing rate (URLs/sec)
- domain_status_errors_total (counter): Total error count
- domain_status_warnings_total (counter): Total warning count
- domain_status_info_total (counter): Total info event count

Prometheus Integration:
scrape_configs:
- job_name: 'domain_status'
static_configs:
- targets: ['localhost:8080']
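To verify scraping will work before wiring up Prometheus, you can spot-check one of the gauges listed above by name:

```bash
# Confirm the endpoint is serving metrics and check the completion gauge
curl -s http://127.0.0.1:8080/metrics | grep domain_status_percentage_complete
```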
To enable GeoIP, set the MAXMIND_LICENSE_KEY environment variable and the tool will automatically download the MaxMind GeoLite2 databases on first run:
export MAXMIND_LICENSE_KEY=your_license_key_here
domain_status scan urls.txt
The databases are cached in .geoip_cache/ and reused for subsequent runs. Alternatively, download the .mmdb files yourself and use --geoip to point to them. GeoIP data is stored in the url_geoip table with fields for country, region, city, coordinates, and ASN.
If GeoIP fails or no key is provided, the tool safely skips GeoIP lookup with a warning and continues normally.
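Once GeoIP data is populated, a simple aggregation over the url_geoip table gives a geographic breakdown. The column name country below is an assumption; check .schema url_geoip if your build differs:

```bash
# Hypothetical country breakdown (verify the column name with: sqlite3 domain_status.db ".schema url_geoip")
sqlite3 domain_status.db \
  "SELECT country, COUNT(*) AS domains FROM url_geoip GROUP BY country ORDER BY domains DESC;"
```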
The --enable-whois flag performs WHOIS/RDAP queries to fetch domain registration information. This significantly slows down processing (adds approximately 1 second per domain) due to rate limits imposed by registrars.
Rate Limiting: WHOIS queries are rate-limited to 0.5 queries/second (1 query per 2 seconds) to respect registrar limits. This is separate from HTTP rate limiting.
Caching: WHOIS data is cached in .whois_cache/ by domain name for 7 days to avoid redundant queries.
Limitations: Not all TLDs provide public WHOIS via port 43, and some registrars limit the data returned. RDAP fallback helps but is not universal. If a WHOIS server blocks you, you may see warnings in the logs.
Enable this flag only when you need registrar/expiration information. For faster scans, leave it disabled (default).
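A typical invocation when registration data is needed looks like the following; expect roughly one extra second per domain, with results cached in .whois_cache/ for 7 days:

```bash
# WHOIS/RDAP lookups are opt-in and rate-limited to ~0.5 queries/second
domain_status scan urls.txt --enable-whois
```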
Technology detection uses pattern matching against:
- Markers in the initial HTML response (e.g., __NEXT_DATA__ for Next.js)

Important: The tool does NOT execute JavaScript or fetch external scripts. It only analyzes the initial HTML response, matching WappalyzerGo's behavior.
The default fingerprint ruleset comes from the HTTP Archive Wappalyzer fork and is cached locally for 7 days. You can update to the latest by pointing --fingerprints to a new JSON file (e.g., the official Wappalyzer technologies.json). The tool prints the fingerprints source and version (commit hash) in the runs table.
If you maintain your own fingerprint file (e.g., for internal technologies), you can use that too.
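For example, to pin detection to a specific ruleset instead of the auto-refreshed default (the local file and URL below are placeholders):

```bash
# Use a local fingerprint file...
domain_status scan urls.txt --fingerprints ./technologies.json
# ...or fetch a ruleset from a URL of your choosing
domain_status scan urls.txt --fingerprints https://example.com/custom-fingerprints.json
```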
Concurrency: The default is 30 concurrent requests. If you have good bandwidth and target sites can handle it, you can increase --max-concurrency. Monitor the /metrics endpoint's rate to see actual throughput. Conversely, if you encounter many timeouts or want to be gentle on servers, lower concurrency.
Rate Limiting: The default is 15 RPS with adaptive adjustment, so the effective request rate changes automatically over the course of a scan.
Memory: Each concurrent task consumes memory for HTML and data. With default settings, memory usage is moderate. If scanning extremely large pages, consider that response bodies are capped at 2MB and HTML text extraction is limited to 50KB.
The tool automatically retries failed HTTP requests up to 2 additional times (3 total attempts) with exponential backoff (initial: 500ms, max: 15s). If a domain consistently fails (e.g., DNS not resolved, or all attempts timed out), it will be marked in the url_failures table with details. The errors section of the status output counts these. You don't need to re-run for transient errors; they are retried on the fly.
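To review what ultimately failed after retries, you can count rows in the url_failures table; the related tables described above hold the detailed error context:

```bash
# How many URLs ended up in the failure table after all retries were exhausted
sqlite3 domain_status.db "SELECT COUNT(*) FROM url_failures;"
# See DATABASE.md for the exact columns and related error-context tables
```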
Scan is very slow or stuck:
- Check the log file (domain_status.log by default) or use --log-level debug for more detail.

I see 'bot_detection_403' in info metrics:
- Some target sites are detecting automated requests; set --user-agent to mimic a different browser or reduce the rate with --rate-limit-rps.

Database is locked error:
- The database uses WAL mode, so concurrent reads during a scan are fine; a locked error usually means another process is writing to the same file at the same time.
WHOIS data seems incomplete for some TLDs:
- As noted under WHOIS Lookup above, not all TLDs provide public WHOIS via port 43 and some registrars limit the data returned; the RDAP fallback helps but is not universal.

GeoIP shows "unknown" or is empty:
- Make sure MAXMIND_LICENSE_KEY is set for automatic downloads, or download the .mmdb files manually and point to them with --geoip.

Compilation fails (for users building from source):
- Make sure your toolchain is Rust 1.85 or newer (rustup update stable), then refresh dependencies with cargo update.

Technology detection seems wrong:
- Detection only analyzes the initial HTML response and never executes JavaScript, so technologies injected at runtime may be missed; you can also point --fingerprints at a newer or custom ruleset.

How do I update the tool?
- If you installed via cargo install, run cargo install --force domain_status to get the latest version.

You can also use domain_status as a Rust library in your own projects. Add it to your Cargo.toml:
[dependencies]
domain_status = "^0.1"
tokio = { version = "1", features = ["full"] }
Then use it in your code:
use domain_status::{Config, run_scan};
use std::path::PathBuf;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let config = Config {
file: PathBuf::from("urls.txt"),
max_concurrency: 50,
rate_limit_rps: 20,
enable_whois: false,
..Default::default()
};
let report = run_scan(config).await?;
println!("Processed {} URLs: {} succeeded, {} failed",
report.total_urls, report.successful, report.failed);
println!("Results saved in {}", report.db_path.display());
// Export to CSV using the library API
use domain_status::export::export_csv;
export_csv(&report.db_path, Some(&PathBuf::from("results.csv")), None, None, None, None)
.await?;
println!("Exported results to results.csv");
Ok(())
}
See the API documentation for details on Config options and usage.
Note: The library requires a Tokio runtime. Use #[tokio::main] in your application or ensure you're calling library functions within an async context.
Dependencies:
- reqwest with rustls TLS backend
- hickory-resolver (async DNS with system config fallback)
- psl crate for accurate domain parsing (handles multi-part TLDs correctly)
- scraper (CSS selector-based extraction)
- tokio-rustls and x509-parser for certificate analysis
- whois-service crate for domain registration lookups
- maxminddb for geographic and network information
- sqlx with SQLite (WAL mode enabled)

Security:
- cargo-audit runs in CI to detect known vulnerabilities in dependencies (uses RustSec advisory database)
- gitleaks scans commits and code for accidentally committed secrets, API keys, tokens, and credentials
- -D warnings enforces strict linting rules and catches security issues

domain_status follows a pipeline architecture:
Input File → URL Validation → Concurrent Processing → Data Extraction → Direct Database Writes → SQLite Database
Core Components:
Concurrency Model:
- Cooperative cancellation and shutdown via a CancellationToken

Performance Characteristics:
Preventing Credential Leaks:
Pre-commit hooks (recommended): Install pre-commit hooks to catch secrets before they're committed:
# Install pre-commit (if not already installed)
brew install pre-commit # macOS
# or: pip install pre-commit
# Install hooks
pre-commit install
This will automatically scan for secrets before every commit.
CI scanning: Gitleaks runs in CI to catch secrets in pull requests and scan git history.
GitHub Secret Scanning: GitHub automatically scans public repositories for known secret patterns (enabled by default).
Best practices:
- Keep credentials in .env files (already in .gitignore) and never commit them

See AGENTS.md for development guidelines and conventions.
See TESTING.md for detailed testing information.
MIT License - see LICENSE file for details.