snm-brightdata-client

version: 0.4.0
description: A compact Bright Data wrapper client implemented in Rust with Actix Web
repository: https://github.com/snmmaurya/snm-brightdata-client
author: Sandeep Maurya (snmmaurya)

README

🌐 SNM BrightData Client

A powerful Rust crate providing MCP-compatible integration with BrightData's web scraping and data extraction services. Built with Actix Web for high-performance web scraping, search, data extraction, and screenshot capabilities.


✨ Features

  • 🔍 Web Search: Search across Google, Bing, Yandex, and DuckDuckGo
  • 🌐 Website Scraping: Extract content in markdown, raw HTML, or structured formats
  • 📊 Data Extraction: Intelligent data extraction from any webpage
  • 📸 Screenshots: Capture website screenshots using BrightData Browser
  • 🤖 MCP Compatible: Full Model Context Protocol support for AI integrations
  • Multiple Interfaces: Library, CLI, and HTTP server
  • 🔒 Authentication: Secure token-based authentication
  • 📈 Rate Limiting: Built-in rate limiting and error handling
  • 🚀 High Performance: Built with Actix Web for production workloads

🚀 Quick Start

Installation

Add to your Cargo.toml:

[dependencies]
snm-brightdata-client = "0.4.0"

Environment Setup

# BrightData Configuration
export BRIGHTDATA_API_TOKEN="your_api_token"
export BRIGHTDATA_BASE_URL="https://api.brightdata.com"
export WEB_UNLOCKER_ZONE="your_zone_name"
export BROWSER_ZONE="your_browser_zone"

# Proxy Credentials (optional)
export BRIGHTDATA_PROXY_USERNAME="your_username"
export BRIGHTDATA_PROXY_PASSWORD="your_password"

# Server Configuration
export MCP_AUTH_TOKEN="your_secure_token"
export PORT="8080"
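
If you want to fail fast when a required variable is missing, a minimal pre-flight check (plain std::env, not part of this crate's API) might look like this:

use std::env;

// Illustrative helper, not part of snm-brightdata-client:
// fail fast if a required BrightData variable is missing.
fn check_required_env() -> Result<(), String> {
    for var in ["BRIGHTDATA_API_TOKEN", "WEB_UNLOCKER_ZONE", "BROWSER_ZONE"] {
        env::var(var).map_err(|_| format!("missing required env var: {var}"))?;
    }
    Ok(())
}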

📖 Usage

As a Library

use snm_brightdata_client::{BrightDataClient, BrightDataConfig};
use snm_brightdata_client::tool::{ToolResolver, Tool};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize client
    let config = BrightDataConfig::from_env()?;
    let client = BrightDataClient::new(config);

    // Use tools directly
    let resolver = ToolResolver::default();
    let search_tool = resolver.resolve("search_web").unwrap();
    
    let result = search_tool.execute(json!({
        "query": "Rust programming language",
        "engine": "google"
    })).await?;

    println!("Search results: {:#?}", result);
    Ok(())
}
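
The same resolver pattern extends to the other tools. As a sketch, scraping a page looks like this (the parameters mirror the scrape_website schema documented under Available Tools below):

use snm_brightdata_client::tool::{ToolResolver, Tool};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Resolve the scraping tool by name, just like search_web above.
    let resolver = ToolResolver::default();
    let scrape_tool = resolver.resolve("scrape_website").unwrap();

    let page = scrape_tool.execute(json!({
        "url": "https://example.com",
        "format": "markdown"
    })).await?;

    println!("Scraped content: {:#?}", page);
    Ok(())
}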

CLI Usage

# Search the web
snm_cli search "Bitcoin price today" --engine google

# Scrape a website
snm_cli scrape https://example.com --format markdown

# Extract data
snm_cli extract https://example.com --format json

# Take screenshot
snm_cli screenshot https://example.com --width 1920 --height 1080

HTTP Server

# Start the server
cargo run --bin snm_server

# Health check
curl http://localhost:8080/health

# List available tools
curl http://localhost:8080/tools

# Use tools via API
curl -X POST http://localhost:8080/invoke \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "tool": "search_web",
    "parameters": {
      "query": "Rust web scraping",
      "engine": "google"
    }
  }'
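
The same /invoke call can also be made programmatically. A minimal sketch, assuming you add reqwest (with its json feature) and tokio to your own project; neither is part of this crate:

use serde_json::{json, Value};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();

    // POST the same payload the curl example sends, with the bearer token.
    let resp: Value = client
        .post("http://localhost:8080/invoke")
        .bearer_auth("YOUR_TOKEN")
        .json(&json!({
            "tool": "search_web",
            "parameters": { "query": "Rust web scraping", "engine": "google" }
        }))
        .send()
        .await?
        .json()
        .await?;

    println!("{resp:#}");
    Ok(())
}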

🛠️ Available Tools

🔍 Search Web (search_web)

Search across multiple search engines with BrightData's unblocking capabilities.

{
  "tool": "search_web",
  "parameters": {
    "query": "your search query",
    "engine": "google"  // google, bing, yandex, duckduckgo
  }
}

🌐 Scrape Website (scrape_website)

Extract content from any website, bypassing anti-bot protections.

{
  "tool": "scrape_website",
  "parameters": {
    "url": "https://example.com",
    "format": "markdown"  // raw, markdown
  }
}

📊 Extract Data (extract_data)

Intelligent data extraction from webpages.

{
  "tool": "extract_data",
  "parameters": {
    "url": "https://example.com"
  }
}

📸 Take Screenshot (take_screenshot)

Capture high-quality screenshots of websites.

{
  "tool": "take_screenshot",
  "parameters": {
    "url": "https://example.com"
  }
}

🤖 MCP Integration

This crate is fully compatible with the Model Context Protocol (MCP), making it easy to integrate with AI systems like Claude.

MCP Server Configuration

{
  "type": "url",
  "url": "https://your-server.com/sse",
  "name": "brightdata-mcp",
  "authorization_token": "your_token",
  "tool_configuration": {
    "enabled": true,
    "allowed_tools": ["search_web", "scrape_website", "extract_data", "take_screenshot"]
  }
}

Example with Claude

curl https://api.anthropic.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: mcp-client-2025-04-04" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 2000,
    "messages": [
      {
        "role": "user",
        "content": "Search for the latest news about Rust programming language"
      }
    ],
    "mcp_servers": [
      {
        "type": "url",
        "url": "https://your-server.com/sse",
        "name": "brightdata-mcp",
        "authorization_token": "your_token"
      }
    ]
  }'

🏗️ API Reference

HTTP Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/tools` | GET | List available tools |
| `/invoke` | POST | Direct tool invocation |
| `/sse` | POST | Server-Sent Events streaming |
| `/mcp` | POST | MCP JSON-RPC protocol |
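
Requests to /mcp follow MCP's JSON-RPC framing. Assuming the server implements the protocol's standard methods, a tools/list request body looks like:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list"
}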

Response Format

All tools return MCP-compatible responses:

{
  "content": [
    {
      "type": "text",
      "text": "Response content here"
    }
  ],
  "is_error": false,
  "raw_value": {
    // Original response data
  }
}
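
If you consume this response from typed Rust code, a deserialization target might look like the sketch below. The field names mirror the JSON above; the crate may expose its own response types instead.

use serde::Deserialize;
use serde_json::Value;

// Illustrative shape only; check the crate's types before relying on this.
#[derive(Debug, Deserialize)]
struct ToolResponse {
    content: Vec<ContentBlock>,
    is_error: bool,
    // Original response data, kept as untyped JSON.
    raw_value: Value,
}

#[derive(Debug, Deserialize)]
struct ContentBlock {
    #[serde(rename = "type")]
    kind: String,
    text: String,
}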

⚙️ Configuration

BrightData Setup

  1. Sign up for a BrightData account
  2. Create zones for Web Unlocker and Browser
  3. Get API credentials from your dashboard
  4. Set environment variables as shown above

Zone Configuration

  • Web Unlocker Zone: For web scraping and search
  • Browser Zone: For screenshots and JavaScript rendering

🔧 Development

Building

# Build library
cargo build

# Build with all features
cargo build --all-features

# Run tests
cargo test

# Run with debug logging
RUST_LOG=debug cargo run --bin snm_server

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📊 Performance

  • Concurrent Requests: Supports high-concurrency workloads
  • Rate Limiting: Built-in limit of 10 requests/minute per tool (configurable)
  • Timeout Handling: Configurable timeouts for different operations
  • Error Recovery: Automatic retry mechanisms with backoff (sketched below)
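
The crate's exact recovery logic is internal; the sketch below just illustrates the general retry-with-exponential-backoff pattern the last item describes (names here are hypothetical, not this crate's API):

use std::time::Duration;
use tokio::time::sleep;

// Retry an async operation, doubling the delay after each failure.
async fn with_backoff<T, E, F, Fut>(mut op: F, max_attempts: u32) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
{
    let mut delay = Duration::from_millis(500);
    let mut attempt = 0;
    loop {
        match op().await {
            Ok(value) => return Ok(value),
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                sleep(delay).await;
                delay *= 2; // exponential backoff
            }
        }
    }
}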

🛡️ Security

  • Token Authentication: Secure API access
  • Rate Limiting: Prevents abuse
  • Input Validation: Comprehensive parameter validation
  • CORS Support: Configurable cross-origin requests

📝 Examples

Check out the examples/ directory for:

  • Basic usage examples
  • Integration patterns
  • Advanced configurations
  • Error handling strategies

🤝 Integration Examples

With Anthropic Claude

Use as an MCP server to enhance Claude with web scraping capabilities.

With Custom Applications

Integrate into your Rust applications for:

  • E-commerce price monitoring
  • Content aggregation
  • Market research
  • Competitive analysis

📋 Requirements

  • Rust: 1.70 or later
  • BrightData Account: With API access
  • Network Access: HTTPS outbound connections

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • BrightData for providing robust web scraping infrastructure
  • Actix Web for high-performance HTTP server framework
  • Anthropic for MCP protocol specification

📞 Support

🚀 Roadmap

  • Additional search engines
  • Enhanced data extraction templates
  • WebSocket support for real-time scraping
  • GraphQL API interface
  • Kubernetes deployment examples
  • Advanced proxy rotation
  • Machine learning integration for content classification

Made with ❤️ by SNM Maurya
