| Crates.io | rexturl |
| lib.rs | rexturl |
| version | 0.4.1 |
| created_at | 2022-09-16 13:09:55.671937+00 |
| updated_at | 2025-08-21 05:47:23.104121+00 |
| description | A simple tool to split urls in their protocol, host, port, path and query parts. |
| homepage | https://github.com/vschwaberow/rexturl.git |
| repository | https://github.com/vschwaberow/rexturl.git |
| max_upload_size | |
| id | 667341 |
| size | 366,557 |
A command-line tool for parsing and manipulating URLs with predictable output formats.
- --format {plain,tsv,csv,json,jsonl,custom,sql}
- --fields domain,path,url
- --template '{scheme}://{domain}{path}'

Install with Cargo:
cargo install rexturl
Or clone the repository and build from source:
git clone https://github.com/vschwaberow/rexturl.git
cd rexturl
cargo build --release
Extract domain from URL:
rexturl --urls "https://www.example.com/path" --fields domain
# Output: example.com
TSV format with headers:
echo "https://blog.example.co.uk/posts" | rexturl --fields subdomain,domain,path --format tsv --header
# Output:
# subdomain domain path
# blog example.co.uk /posts
JSON output for APIs:
curl -s api.com/urls | rexturl --fields domain --format json
# Output: {"urls":[{"domain":"api.com"}]}
rexturl [OPTIONS]
--urls <URLS> - Specify URLs as command-line arguments

| Option | Values | Description |
|---|---|---|
| --format | plain, tsv, csv, json, jsonl, custom, sql | Output format (default: plain) |
| --fields | domain,path,url | Comma-separated fields to extract |
| --urls | URL strings | Input URLs to process |
| --header | - | Include header row for tabular formats |
| --sort | - | Sort output by first field |
| --unique | - | Remove duplicate entries |

| Field | Description | Example |
|---|---|---|
| url | Original URL string | https://www.example.com/path |
| scheme | Protocol | https |
| username | Username portion | user |
| host/hostname | Full hostname | www.example.com |
| subdomain | Subdomain only | www |
| domain | Registrable domain | example.com |
| port | Port number | 8080 |
| path | URL path | /path |
| query | Query parameters | q=search |
| fragment | Fragment identifier | section |

| Option | Values | Description |
|---|---|---|
| --pretty | - | Pretty-print JSON output |
| --strict | - | Exit code 2 if any URL fails to parse |
| --no-newline | - | Suppress trailing newline |
| --null-empty | Custom string | Value for missing fields (default: \N) |
| --color | auto, never, always | Colored output for plain format |

| Option | Values | Description |
|---|---|---|
| --template | Template string | Custom format template (e.g., '{scheme}://{domain}{path}') |
| --escape | none, shell, csv, json, sql | Escaping mode for custom format |

| Option | Values | Description |
|---|---|---|
| --sql-table | Table name | SQL table name (default: urls) |
| --sql-create-table | - | Include CREATE TABLE statement |
| --sql-dialect | postgres, mysql, sqlite, generic | SQL dialect for type mapping |
These flags automatically add fields - use --fields for explicit control:

| Flag | Equivalent | Description |
|---|---|---|
| --domain | --fields domain | Extract domain |
| --host | --fields subdomain | Extract subdomain |
| --scheme | --fields scheme | Extract scheme |
| --path | --fields path | Extract path |

| Option | Use Instead | Description |
|---|---|---|
| --json | --format json | JSON output (deprecated) |
| --all | --fields with specific names | All fields (deprecated) |
| --custom | --format and --fields | Custom format (deprecated) |
1. Extract domains for analysis:
cat urls.txt | rexturl --fields domain --sort --unique
# Clean list of unique domains
2. Create a spreadsheet-ready CSV:
rexturl --urls "https://api.example.com/v1/users" --fields subdomain,domain,path --format csv --header
# subdomain,domain,path
# api,example.com,/v1/users
3. JSON for APIs and scripts:
curl -s api.com/endpoints | rexturl --fields domain,path --format json
# {"urls":[{"domain":"api.com","path":"/endpoints"}]}
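
The unique-domain listing from example 1 can be extended into a frequency count with standard tools. The following is a minimal sketch, assuming the plain output of --fields domain is one domain per line (as the first example in this README suggests) and that --sort and --unique are left off so duplicates survive; top_domains is a hypothetical helper name, not part of rexturl.

```shell
# Count how often each domain occurs and list the most frequent first.
# Reads one domain per line on stdin and prints "domain<TAB>count".
# top_domains is a hypothetical helper, not something rexturl provides.
top_domains() {
  sort | uniq -c | sort -k1,1nr -k2 | awk '{ print $2 "\t" $1 }'
}

# Possible usage (assumes urls.txt holds one URL per line):
#   rexturl --fields domain < urls.txt | top_domains
```

Keeping the counting outside rexturl means the same pipeline works on any one-value-per-line field, such as path or subdomain.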
Plain (default):
rexturl --urls "https://blog.example.com/posts" --fields subdomain,domain,path
# blog example.com /posts
TSV with header:
echo "https://api.example.com/v1" | rexturl --fields subdomain,domain,path --format tsv --header
# subdomain domain path
# api example.com /v1
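
Because the header row names each column, downstream scripts can select columns by name rather than by position. A minimal sketch with awk follows; it assumes only tab-separated rows whose first line is the header, as --format tsv --header is shown to produce, and tsv_column is a hypothetical helper.

```shell
# Print one named column from TSV input that starts with a header row.
# tsv_column is a hypothetical helper; it prints nothing if the column
# name does not appear in the header.
tsv_column() {
  awk -F'\t' -v want="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == want) col = i; next }
    col     { print $col }
  '
}

# Possible usage:
#   rexturl --fields subdomain,domain,path --format tsv --header < urls.txt | tsv_column domain
```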
CSV for spreadsheets:
rexturl --fields url,domain --format csv --header < urls.txt
# url,domain
# https://www.example.com,example.com
JSON for APIs:
echo "https://api.example.com" | rexturl --fields domain,path --format json --pretty
# {
# "urls": [
# {
# "domain": "example.com",
# "path": "/"
# }
# ]
# }
JSONL for streaming:
cat large-urls.txt | rexturl --fields domain --format jsonl | head -3
# {"domain":"example.com"}
# {"domain":"api.com"}
# {"domain":"blog.net"}
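
Since each JSONL line is a complete object, it can be handled one line at a time. The sketch below pulls a single string field out with sed; it assumes the compact layout shown above (no spaces around the colon) and values without escaped quotes, so a real JSON parser is the safer choice for untrusted input. jsonl_field is a hypothetical helper.

```shell
# Extract the string value of one field from compact JSONL on stdin.
# jsonl_field is a hypothetical helper; it relies on the "name":"value"
# layout shown above and skips lines that do not contain the field.
jsonl_field() {
  sed -n 's/.*"'"$1"'":"\([^"]*\)".*/\1/p'
}

# Possible usage:
#   rexturl --fields domain --format jsonl < large-urls.txt | jsonl_field domain
```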
Custom format with templates:
rexturl --urls "https://api.example.com/v1/users" --format custom --template "{scheme}://{domain}{path}"
# https://example.com/v1/users
SQL INSERT statements:
rexturl --urls "https://www.example.com/path" --format sql --fields domain,path
# INSERT INTO urls (domain, path) VALUES ('example.com', '/path');
Multi-part TLD handling:
rexturl --urls "https://blog.example.co.uk/posts" --fields subdomain,domain,path --format tsv
# blog example.co.uk /posts
Handle missing values:
echo "https://example.com" | rexturl --fields domain,port --format tsv --null-empty "N/A"
# example.com N/A
Error handling with strict mode:
rexturl --urls "not-a-url" --strict --fields domain
# Error: Failed to parse URL: not-a-url
# Exit code: 2
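
In a script, the strict-mode exit status can be used to decide whether to continue. The wrapper below is a sketch: only exit code 2 under --strict comes from this README, the meaning of any other non-zero status is not documented here, and report_parse_status is a hypothetical name.

```shell
# Run a command, describe its exit status, and pass that status on.
# report_parse_status is a hypothetical wrapper, not part of rexturl.
report_parse_status() {
  status=0
  "$@" || status=$?
  case $status in
    0) echo "all URLs parsed" ;;
    2) echo "at least one URL failed to parse" >&2 ;;
    *) echo "unexpected exit status: $status" >&2 ;;
  esac
  return "$status"
}

# Possible usage:
#   report_parse_status rexturl --urls "not-a-url" --strict --fields domain
```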
Legacy syntax (still works):
rexturl --urls "https://www.example.com" --domain --path
# example.com /
rexturl includes intelligent handling for domains and subdomains:
- Recognizes multi-part TLDs such as co.uk, org.uk, com.au, etc.
- The --domain flag extracts the registrable domain name
- When --host is used alone, it extracts the subdomain portion

Supported multi-part TLDs include:
co.uk, org.uk, ac.uk, gov.uk, me.uk, net.uk, sch.uk, com.au, net.au, org.au, edu.au, gov.au, co.nz, net.nz, org.nz, govt.nz, co.za, org.za, com.br, net.br, org.br, co.jp, com.mx, com.ar, com.sg, com.my, co.id, com.hk, co.th, in.th
Examples:
# Using custom format for specific extraction
echo "https://blog.example.co.uk/posts" | rexturl --format custom --template "Subdomain: {subdomain}, Domain: {domain}"
# Output: Subdomain: blog, Domain: example.co.uk
# Extract all components (tab-separated format)
rexturl --urls "https://user@blog.example.co.uk:8080/posts?q=test#frag" --fields scheme,username,hostname,port,path,query,fragment,domain --format tsv
# Output: https user blog.example.co.uk 8080 /posts q=test frag example.co.uk
# Extract components with URLs flag
rexturl --urls "https://blog.example.co.uk/posts" --fields domain
# Output: example.co.uk
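
To illustrate why a multi-part TLD list is needed, the sketch below splits a hostname into subdomain and registrable domain: it keeps the last three labels when the last two match a known two-label suffix, and the last two labels otherwise. This is an illustration of the idea only, not rexturl's implementation; split_host is a hypothetical name and the suffix list is a small subset of the one above.

```shell
# Split a hostname into "subdomain<TAB>domain" using a short list of
# two-label suffixes. split_host is a hypothetical illustration.
split_host() {
  printf '%s\n' "$1" | awk -F. -v multi="co.uk org.uk com.au co.nz co.jp" '
    BEGIN { n = split(multi, m, " "); for (i = 1; i <= n; i++) two[m[i]] = 1 }
    {
      keep = 2                                  # labels in the registrable domain
      if (NF >= 3 && (($(NF-1) "." $NF) in two)) keep = 3
      if (NF < keep) { print "\t" $0; next }    # single label: no subdomain
      domain = $(NF - keep + 1)
      for (i = NF - keep + 2; i <= NF; i++) domain = domain "." $i
      subd = ""
      for (i = 1; i <= NF - keep; i++) subd = (subd == "" ? $i : subd "." $i)
      print subd "\t" domain
    }'
}
```

Without the suffix list, blog.example.co.uk would be split as subdomain blog.example and domain co.uk, which is the case the multi-part handling avoids.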
Use --format custom --template for flexible output formatting:
Basic fields:
- {field} - Insert field value or empty string if missing
- {field:default} - Insert field value or default if missing
- {field?text} - Insert text only if field has a value
- {field!text} - Insert text only if field is missing

Available fields:
- {scheme} - URL scheme (http, https, etc.)
- {username} - Username portion of the URL
- {host} - Full hostname
- {hostname} - Alias for host
- {subdomain} - Subdomain portion (e.g., "www" in www.example.com)
- {domain} - Domain name (e.g., "example.com")
- {port} - Port number
- {path} - URL path
- {query} - Query string (without the leading ?)
- {fragment} - Fragment identifier (without the leading #)

Escaping modes:
- --escape none - No escaping (default)
- --escape shell - Shell-safe quoting
- --escape csv - CSV-compatible escaping
- --escape json - JSON string escaping
- --escape sql - SQL value escaping

# Basic template
rexturl --urls "https://example.com/api" --format custom --template "Host: {host}, Path: {path}"
# Output: Host: example.com, Path: /api
# With defaults
rexturl --urls "https://example.com" --format custom --template "{scheme}://{domain}:{port:80}"
# Output: https://example.com:80
# Conditional text
rexturl --urls "https://example.com/path?q=test" --format custom --template "{domain}{query?&found}"
# Output: example.com&found
# Shell escaping
rexturl --urls "https://example.com/path with spaces" --format custom --template "{url}" --escape shell
# Output: 'https://example.com/path with spaces'
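
For readers who know shell parameter expansion, the template operators have close analogues: {field:default} behaves like ${field:-default}, {field?text} like ${field:+text}, and {field!text} like an explicit emptiness test. The sketch below is an analogy under that reading of the syntax, not a description of how rexturl evaluates templates; render_example is a hypothetical function that hard-codes one template shape.

```shell
# Shell analogue of the template "{scheme}://{domain}:{port:80}{query?&found}"
# followed by a "{port! (default port)}"-style suffix.
# render_example is a hypothetical illustration, not part of rexturl.
render_example() {
  scheme=$1 domain=$2 port=${3:-} query=${4:-}
  out="${scheme}://${domain}"
  out="${out}:${port:-80}"          # {port:80}: value, or the default if missing
  out="${out}${query:+&found}"      # {query?&found}: text only if query has a value
  if [ -z "$port" ]; then           # {port!text}: text only if port is missing
    out="${out} (default port)"
  fi
  printf '%s\n' "$out"
}
```

Called as render_example https example.com "" "q=test", this prints https://example.com:80&found (default port).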
Generate SQL INSERT statements from URL data:
# Basic SQL output
rexturl --urls "https://www.example.com/path" --format sql --fields domain,path
# INSERT INTO urls (domain, path) VALUES ('example.com', '/path');
# With CREATE TABLE
rexturl --urls "https://example.com" --format sql --fields domain --sql-create-table
# CREATE TABLE IF NOT EXISTS urls (
# id SERIAL PRIMARY KEY,
# domain VARCHAR(253),
# created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
# );
# INSERT INTO urls (domain) VALUES ('example.com');
# Custom table and dialect
rexturl --urls "https://example.com:3306" --format sql --fields domain,port --sql-table my_urls --sql-dialect mysql
# INSERT INTO my_urls (domain, port) VALUES ('example.com', '3306');
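
Values placed into SQL statements need their quotes escaped. The sketch below shows the standard SQL convention of doubling single quotes inside a string literal; whether rexturl's SQL output or --escape sql applies exactly this rule is an assumption, and sql_quote is a hypothetical helper.

```shell
# Wrap a value in single quotes, doubling any embedded single quote
# (standard SQL string-literal escaping). sql_quote is hypothetical.
sql_quote() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Possible usage:
#   printf 'INSERT INTO urls (path) VALUES (%s);\n' "$(sql_quote "/o'brien")"
```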
- UrlRecord struct for all formats

cargo bench
# fast_url_parsing time: [823.79 ns 827.53 ns 831.87 ns]
# fast_url_component_access time: [69.100 ns 69.309 ns 69.527 ns]
example.co.uk

For a detailed list of changes and version history, see CHANGELOG.md.
Contributions are welcome! Please feel free to submit a Pull Request.
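1. Create a feature branch (git checkout -b feature/amazing-feature)
2. Run the tests (cargo test)
3. Format and lint the code (cargo fmt && cargo clippy)
4. Commit your changes (git commit -m 'Add some amazing feature')
5. Push to the branch (git push origin feature/amazing-feature)

This project is licensed under the MIT License - see the LICENSE file for details.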