| Crates.io | url-preview |
| lib.rs | url-preview |
| version | 0.6.0 |
| created_at | 2025-02-13 14:31:13.675434+00 |
| updated_at | 2025-07-23 11:10:37.028148+00 |
| description | High-performance URL preview generator for messaging and social media applications |
| homepage | |
| repository | https://github.com/ZhangHanDong/url-preview |
| max_upload_size | |
| id | 1554271 |
| size | 727,672 |
A high-performance Rust library for generating rich URL previews with specialized support for Twitter/X and GitHub. This library offers efficient caching, concurrent processing, comprehensive metadata extraction, detailed error reporting, robust security features, browser-based rendering, and LLM-powered data extraction.
Add this to your Cargo.toml:
[dependencies]
url-preview = "0.6.0"
# Optional features
url-preview = { version = "0.6.0", features = ["full"] }
# Or select specific features
url-preview = { version = "0.6.0", features = ["cache", "logging", "github", "twitter", "browser", "llm"] }
default: Basic functionality with default reqwest featurescache: Enable caching support with DashMaplogging: Enable structured logging with tracinggithub: Enable GitHub-specific preview enhancementstwitter: Enable Twitter/X oEmbed integrationbrowser: Enable browser-based rendering with playwright-mcp (requires Node.js)llm: Enable LLM-based data extractionfull: Enable all featuresHere's a simple example to get started:
use url_preview::{PreviewService, Preview, PreviewError};
#[tokio::main]
async fn main() -> Result<(), PreviewError> {
// Create a preview service with default settings
let preview_service = PreviewService::new();
// Generate a preview
let preview = preview_service
.generate_preview("https://www.rust-lang.org")
.await?;
println!("Title: {:?}", preview.title);
println!("Description: {:?}", preview.description);
println!("Image: {:?}", preview.image_url);
Ok(())
}
The library now provides detailed error types for better error handling:
use url_preview::{PreviewService, PreviewError};
#[tokio::main]
async fn main() {
let service = PreviewService::new();
match service.generate_preview("https://example.com/404").await {
Ok(preview) => println!("Got preview: {:?}", preview.title),
Err(PreviewError::NotFound(msg)) => {
println!("Resource not found: {}", msg);
}
Err(PreviewError::DnsError(msg)) => {
println!("DNS resolution failed: {}", msg);
}
Err(PreviewError::TimeoutError(msg)) => {
println!("Request timed out: {}", msg);
}
Err(PreviewError::ServerError { status, message }) => {
println!("Server error ({}): {}", status, message);
}
Err(PreviewError::ClientError { status, message }) => {
println!("Client error ({}): {}", status, message);
}
Err(e) => println!("Other error: {}", e),
}
}
Process multiple URLs efficiently:
use url_preview::PreviewService;
use futures::future::join_all;
#[tokio::main]
async fn main() {
let service = PreviewService::new();
let urls = vec![
"https://www.rust-lang.org",
"https://github.com/rust-lang/rust",
"https://news.ycombinator.com",
];
// Concurrent processing with proper error handling
let results = join_all(
urls.iter().map(|url| service.generate_preview(url))
).await;
for (url, result) in urls.iter().zip(results.iter()) {
match result {
Ok(preview) => println!("{}: {:?}", url, preview.title),
Err(e) => println!("{}: Error - {}", url, e),
}
}
}
Configure the service with specific requirements:
use url_preview::{PreviewService, PreviewServiceConfig, Fetcher, FetcherConfig};
use std::time::Duration;
let config = PreviewServiceConfig::new(1000) // Cache capacity
.with_max_concurrent_requests(20)
.with_default_fetcher(
Fetcher::new_with_config(FetcherConfig {
timeout: Duration::from_secs(30),
user_agent: "my-app/1.0".into(),
..Default::default()
})
);
let service = PreviewService::new_with_config(config);
The library includes comprehensive security features that are enabled by default:
use url_preview::{
PreviewService, PreviewServiceConfig, Fetcher, FetcherConfig,
UrlValidationConfig, ContentLimits, CacheStrategy
};
use std::time::Duration;
// Configure URL validation
let mut url_validation = UrlValidationConfig::default();
url_validation.block_private_ips = true; // Block private IPs (default)
url_validation.block_localhost = true; // Block localhost (default)
// Add domain whitelist (only these domains will be allowed)
url_validation.allowed_domains.insert("trusted-site.com".to_string());
url_validation.allowed_domains.insert("api.example.com".to_string());
// Or use blacklist (these domains will be blocked)
// url_validation.blocked_domains.insert("malicious.com".to_string());
// Configure content limits
let mut content_limits = ContentLimits::default();
content_limits.max_content_size = 5 * 1024 * 1024; // 5MB
content_limits.max_download_time = 20; // 20 seconds
content_limits.allowed_content_types.insert("text/html".to_string());
content_limits.allowed_content_types.insert("application/json".to_string());
// Create fetcher with security config
let fetcher_config = FetcherConfig {
url_validation,
content_limits,
timeout: Duration::from_secs(15),
user_agent: "my-secure-app/1.0".to_string(),
..Default::default()
};
let service = PreviewService::new_with_config(
PreviewServiceConfig::new(1000)
.with_cache_strategy(CacheStrategy::UseCache)
.with_default_fetcher(Fetcher::with_config(fetcher_config))
);
// Usage with security errors
match service.generate_preview("http://localhost:8080").await {
Err(PreviewError::LocalhostBlocked) => {
println!("Localhost access blocked for security");
}
Err(PreviewError::PrivateIpBlocked(ip)) => {
println!("Private IP {} blocked", ip);
}
Err(PreviewError::DomainBlocked(domain)) => {
println!("Domain {} is blacklisted", domain);
}
_ => {}
}
#[cfg(feature = "twitter")]
{
let preview = service
.generate_preview("https://x.com/username/status/123456789")
.await?;
// Twitter previews include embedded content
println!("Tweet content: {:?}", preview.description);
}
#[cfg(feature = "github")]
{
// Basic preview
let preview = service
.generate_preview("https://github.com/rust-lang/rust")
.await?;
// Detailed repository information
if let Ok(details) = service
.get_github_detailed_info("https://github.com/rust-lang/rust")
.await
{
println!("Stars: {}", details.stars_count);
println!("Forks: {}", details.forks_count);
println!("Language: {}", details.language);
println!("Open Issues: {}", details.open_issues);
}
}
Control caching behavior for different use cases:
use url_preview::{PreviewService, CacheStrategy};
// Service with caching enabled (default)
let cached_service = PreviewService::new();
// Service with caching disabled
let no_cache_service = PreviewService::no_cache();
// Manual cache operations if needed
#[cfg(feature = "cache")]
{
let preview = cached_service.generate_preview(url).await?;
// Check if URL is in cache
let cached = cached_service.default_generator.cache.get(url).await;
}
Configure comprehensive logging:
#[cfg(feature = "logging")]
{
use url_preview::{setup_logging, LogConfig};
use std::path::PathBuf;
let log_config = LogConfig {
log_dir: PathBuf::from("logs"),
log_level: "info".into(),
console_output: true,
file_output: true,
};
setup_logging(log_config);
}
Control the number of concurrent requests to prevent resource exhaustion:
use url_preview::{PreviewServiceConfig, MAX_CONCURRENT_REQUESTS};
let config = PreviewServiceConfig::new(1000)
.with_max_concurrent_requests(50); // Default is MAX_CONCURRENT_REQUESTS (500)
let service = PreviewService::new_with_config(config);
The library automatically retries on server errors and timeouts:
// The fetcher will automatically retry up to 3 times for:
// - Server errors (5xx)
// - Timeouts
// - Connection errors
// But NOT for client errors (4xx) or DNS failures
Generate previews for JavaScript-heavy sites and SPAs:
#[cfg(feature = "browser")]
{
use url_preview::{BrowserUsagePolicy, McpConfig, PreviewServiceConfig};
// Configure browser integration
let mcp_config = McpConfig {
enabled: true,
server_command: vec![
"npx".to_string(),
"-y".to_string(),
"@modelcontextprotocol/server-playwright".to_string(),
],
..Default::default()
};
let config = PreviewServiceConfig::new(1000)
.with_mcp_config(mcp_config)
.with_browser_usage_policy(BrowserUsagePolicy::Auto);
let service = PreviewService::new_with_config(config);
// Browser will automatically be used for SPAs
let preview = service.generate_preview("https://twitter.com/rustlang").await?;
}
Extract structured data from web pages using Large Language Models:
#[cfg(feature = "llm")]
{
use url_preview::{LLMExtractor, LLMConfig, Fetcher};
use serde::{Deserialize, Serialize};
use schemars::JsonSchema;
#[derive(Debug, Serialize, Deserialize, JsonSchema)]
struct ArticleInfo {
title: String,
summary: String,
author: Option<String>,
published_date: Option<String>,
main_topics: Vec<String>,
reading_time_minutes: Option<u32>,
}
// Auto-detect the best available provider
let provider = LLMConfig::auto_detect_provider().await?;
let extractor = LLMExtractor::new(provider);
let fetcher = Fetcher::new();
// Extract structured article data
let result = extractor.extract::<ArticleInfo>(
"https://blog.rust-lang.org/2024/01/01/some-article.html",
&fetcher
).await?;
println!("Title: {}", result.data.title);
println!("Topics: {:?}", result.data.main_topics);
if let Some(usage) = result.usage {
println!("Tokens used: {}", usage.total_tokens);
}
}
use url_preview::{LLMConfig, OpenAIProvider, AnthropicProvider, MockProvider};
// OpenAI (requires OPENAI_API_KEY)
let provider = LLMConfig::openai_from_env()?;
// Anthropic (requires ANTHROPIC_API_KEY)
let provider = LLMConfig::anthropic_from_env()?;
// claude-code-api (no API key needed)
let provider = LLMConfig::claude_code_from_env()?;
// Local models via Ollama (requires LOCAL_LLM_ENDPOINT)
let provider = LLMConfig::local_from_env()?;
// Mock provider for testing
let provider = MockProvider::new();
use url_preview::{LLMExtractorConfig, ContentFormat};
let config = LLMExtractorConfig {
format: ContentFormat::Text, // HTML, Markdown, or Text
clean_html: true, // Remove scripts, styles, nav elements
max_content_length: 10_000, // Limit content size
model_params: Default::default(),
};
let extractor = LLMExtractor::with_config(provider, config);
Check the examples/ directory for comprehensive examples:
Basic Usage:
url_preview.rs - Basic usage and caching demonstrationsecurity_validation.rs - Security features demonstrationcontent_limits.rs - Content restriction examplessecure_preview_cli.rs - CLI with full security optionsPlatform-specific:
github_preview.rs - GitHub-specific featurestwitter_preview.rs - Twitter/X integrationAdvanced Features:
browser_preview.rs - Browser-based rendering for SPAsbrowser_llm_extraction.rs - Combined browser + LLM extractionLLM-powered Extraction:
comprehensive_llm_test.rs - Complete LLM functionality demonstrationtest_anthropic_direct.rs - Direct Anthropic API usagetest_openai_extraction.rs - OpenAI API integrationclaude_api_working.rs - claude-code-api integrationllm_extraction.rs - Basic LLM data extractionPerformance:
batch_concurrent.rs - Batch processing examplespreview_cli.rs - Interactive CLI exampleRun examples with:
cargo run --example url_preview
cargo run --example security_validation
cargo run --example content_limits
cargo run --example secure_preview_cli -- https://example.com
cargo run --example github_preview --features github
cargo run --example twitter_preview --features twitter
The library includes comprehensive benchmarks:
cargo bench
Run the test suite:
cargo test
cargo test --all-features # Test with all features enabled
For detailed documentation, see:
Contributions are welcome! Please feel free to submit issues and pull requests.
cargo buildcargo testcargo benchcargo fmtcargo clippy -- -D warningsThis project is licensed under the MIT License - see the LICENSE file for details.
Google provides the Rich Results Analysis Tool to help you validate your website's tags.
Use this tool to ensure your website follows the conventions for optimal preview generation.