| Crates.io | mcp-server-fetch |
| lib.rs | mcp-server-fetch |
| version | 0.1.0 |
| created_at | 2025-09-22 21:38:48.25881+00 |
| updated_at | 2025-09-22 21:38:48.25881+00 |
| description | A powerful Model Context Protocol (MCP) server for web content fetching and HTTP operations |
| homepage | https://github.com/sabry-awad97/rust-mcp-servers |
| repository | https://github.com/sabry-awad97/rust-mcp-servers |
| max_upload_size | |
| id | 1850672 |
| size | 106,721 |
A powerful Model Context Protocol (MCP) server that provides secure web content fetching with robots.txt compliance, HTML-to-markdown conversion, content truncation, and comprehensive HTTP operations.
cargo install mcp-server-fetch
# Start the MCP server (communicates via stdio)
mcp-server-fetch
# Use custom user agent
mcp-server-fetch --user-agent "MyApp/1.0"
# Ignore robots.txt restrictions
mcp-server-fetch --ignore-robots-txt
# Use HTTP proxy
mcp-server-fetch --proxy-url "http://proxy.example.com:8080"
# Enable debug logging
LOG_LEVEL=debug mcp-server-fetch
# Install and run the MCP Inspector to test the server
npx @modelcontextprotocol/inspector mcp-server-fetch
Add to your Claude Desktop MCP configuration:
{
"mcpServers": {
"fetch": {
"command": "mcp-server-fetch",
"args": ["--user-agent", "Claude-Desktop/1.0"]
}
}
}
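The args array accepts any of the command-line options listed below, so a setup behind a corporate proxy might look like the following (an illustrative variant, not a required configuration):
{
  "mcpServers": {
    "fetch": {
      "command": "mcp-server-fetch",
      "args": ["--user-agent", "Claude-Desktop/1.0", "--proxy-url", "http://proxy.example.com:8080"]
    }
  }
}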
fetch
Fetches a URL from the internet and optionally extracts its contents as markdown. This tool provides internet access capabilities with intelligent content processing.
Parameters:
url (string): The URL to fetch
max_length (optional number): Maximum number of characters to return (default: 5000, max: 1,000,000)
start_index (optional number): Starting character index for content extraction (default: 0)
raw (optional boolean): Return raw HTML content without markdown conversion (default: false)
Example Request:
{
"url": "https://example.com/article",
"max_length": 10000,
"start_index": 0,
"raw": false
}
Example Response:
{
"content": [
{
"type": "text",
"text": "Contents of https://example.com/article:\n\n# Article Title\n\nThis is the converted markdown content..."
}
]
}
Content Truncation:
When content exceeds the max_length limit, the response includes continuation instructions:
Content truncated. Call the fetch tool with a start_index of 5000 to get more content.
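For reference, the examples in this document show only the tool arguments. Over stdio the server speaks standard MCP JSON-RPC, so a complete tools/call request wraps those arguments in the usual envelope; the id and framing below are generic MCP conventions, not specific to this server. Continuing from the truncation message above:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "fetch",
    "arguments": {
      "url": "https://example.com/article",
      "max_length": 5000,
      "start_index": 5000
    }
  }
}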
Robots.txt Compliance:
The server automatically checks robots.txt before fetching autonomously; use the --ignore-robots-txt flag to bypass these restrictions.
fetch
Manual URL fetching prompt that retrieves and processes web content for immediate use in conversations.
Parameters:
url (string): The URL to fetch
Example Usage:
Use the fetch prompt with URL: https://news.example.com/latest
Response:
Returns a prompt message containing the fetched and processed content, ready for use in the conversation context.
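Programmatic clients invoke the prompt through a standard MCP prompts/get request; the envelope below is a generic sketch of that call (prompt argument values are strings per the MCP spec):
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "prompts/get",
  "params": {
    "name": "fetch",
    "arguments": {
      "url": "https://news.example.com/latest"
    }
  }
}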
mcp-server-fetch [OPTIONS]
Options:
--user-agent <USER_AGENT> Custom User-Agent string to use for requests
--ignore-robots-txt Ignore robots.txt restrictions
--proxy-url <PROXY_URL> Proxy URL to use for requests (e.g., http://proxy:8080)
-h, --help Print help information
-V, --version Print version information
LOG_LEVEL: Set logging level (trace, debug, info, warn, error)
The server uses different user agents depending on the context:
Autonomous requests: ModelContextProtocol/1.0 (Autonomous; +https://github.com/modelcontextprotocol/servers)
User-specified requests: ModelContextProtocol/1.0 (User-Specified; +https://github.com/modelcontextprotocol/servers)
The server also implements several security measures, including the robots.txt compliance described above.
Once configured, you can ask Claude:
"Fetch the latest news from https://news.example.com"
"Get the content from this documentation page: https://docs.example.com/api"
"Retrieve the raw HTML from https://example.com without markdown conversion"
"Fetch the first 2000 characters from this long article: https://blog.example.com/long-post"
# Test the server interactively
npx @modelcontextprotocol/inspector mcp-server-fetch
# Try these operations:
# 1. Use fetch tool with different URLs
# 2. Test content truncation with max_length
# 3. Try raw HTML mode
# 4. Test robots.txt compliance
# 5. Use the fetch prompt for immediate content
Fetching Large Content in Chunks:
{
"url": "https://example.com/large-document",
"max_length": 5000,
"start_index": 0
}
Follow up with:
{
"url": "https://example.com/large-document",
"max_length": 5000,
"start_index": 5000
}
Raw HTML Extraction:
{
"url": "https://example.com/complex-page",
"raw": true,
"max_length": 10000
}
Custom Configuration:
# Production setup with custom user agent and proxy
mcp-server-fetch \
--user-agent "MyCompany-AI/2.0 (+https://mycompany.com/bot)" \
--proxy-url "http://corporate-proxy:8080"
The server provides detailed error messages for common issues.
Example Error Response:
{
"error": {
"code": -32602,
"message": "Robots.txt disallows autonomous fetching of this URL",
"data": {
"url": "https://example.com/restricted",
"robots_txt_url": "https://example.com/robots.txt",
"user_agent": "ModelContextProtocol/1.0 (Autonomous)"
}
}
}
Run the comprehensive test suite:
# Run all tests
cargo test
# Run specific test categories
cargo test fetch_service
cargo test validation
cargo test server
# Run with coverage
cargo tarpaulin --out html
# Test with real URLs (requires internet)
cargo test --features integration-tests
# Test robots.txt compliance
cargo test robots_txt_tests
# Test content processing
cargo test content_processing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Build and test locally with cargo build and cargo test. This project follows SOLID principles and Domain-Driven Design.
Run cargo fmt and cargo clippy before submitting.
This project is licensed under the MIT License - see the LICENSE file for details.