| Field | Value |
|-------|-------|
| Crates.io | fetch-mcp-rs |
| lib.rs | fetch-mcp-rs |
| version | 0.1.1 |
| created_at | 2025-11-29 08:12:36.103797+00 |
| updated_at | 2025-11-29 08:12:36.103797+00 |
| description | Advanced Rust MCP server for web content fetching with 11+ tools |
| homepage | https://github.com/ssoj13/fetch-mcp-rs |
| repository | https://github.com/ssoj13/fetch-mcp-rs |
| id | 1956466 |
| size | 302,460 |
Advanced Rust MCP server for web content fetching with 13+ specialized tools. Convert HTML to Markdown, extract metadata, parse feeds, search Reddit/Wikipedia, and more.
```bash
git clone https://github.com/ssoj13/fetch-mcp-rs
cd fetch-mcp-rs
cargo build --release
```

The compiled binary will be at `target/release/fetch-mcp-rs` (or `target/release/fetch-mcp-rs.exe` on Windows).
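Alternatively, since the crate is published on crates.io, `cargo install` should also work (assuming the published package ships the binary target, which the release build above suggests):

```bash
# Install the fetch-mcp-rs binary into ~/.cargo/bin
cargo install fetch-mcp-rs
```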
```text
fetch-mcp-rs [OPTIONS]

Options:
      --user-agent <USER_AGENT>  User agent string for HTTP requests
      --ignore-robots-txt        Ignore robots.txt restrictions (use with caution)
      --proxy-url <PROXY_URL>    HTTP proxy URL (e.g., http://proxy:8080)
      --log-file <LOG_FILE>      Log file path (optional, for debugging)
      --port <PORT>              Enable HTTP stream mode on the specified port
  -h, --help                     Print help
```
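For example, to serve over HTTP instead of stdio while capturing debug logs (the port and log path below are illustrative):

```bash
# --port switches from stdio to HTTP stream mode; --log-file enables file logging
fetch-mcp-rs --port 8080 --log-file /tmp/fetch.log
```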
Add to your MCP settings:

```json
{
  "mcpServers": {
    "fetch": {
      "command": "/path/to/fetch-mcp-rs",
      "args": []
    }
  }
}
```
With custom options:

```json
{
  "mcpServers": {
    "fetch": {
      "command": "/path/to/fetch-mcp-rs",
      "args": [
        "--user-agent", "MyBot/1.0",
        "--proxy-url", "http://proxy:8080",
        "--log-file", "/tmp/fetch.log"
      ]
    }
  }
}
```
Fetch URL content and convert HTML to Markdown using the Readability algorithm.
Parameters:

- `url` (string, required) - URL to fetch
- `raw` (boolean, optional) - Return raw HTML instead of Markdown

Example:

```json
{
  "url": "https://example.com/article",
  "raw": false
}
```

Output:

```json
{
  "content": "# Article Title\n\nContent here...",
  "url": "https://example.com/article"
}
```
Extract Open Graph, Schema.org, and HTML metadata from a URL.
Parameters:

- `url` (string, required) - URL to fetch metadata from

Example:

```json
{
  "url": "https://example.com"
}
```

Output:

```json
{
  "title": "Example Domain",
  "description": "Example description",
  "og_image": "https://example.com/image.jpg",
  "og_title": "Example Title",
  "author": "John Doe",
  "published_date": "2024-01-01",
  "language": "en",
  "keywords": ["example", "demo"],
  "twitter_card": "summary_large_image"
}
```
Parse RSS/Atom feeds and extract entries.
Parameters:

- `url` (string, required) - Feed URL
- `max_entries` (number, optional) - Maximum entries to return (default: 10)

Example:

```json
{
  "url": "https://example.com/feed.xml",
  "max_entries": 5
}
```

Output:

```json
{
  "title": "Blog Feed",
  "description": "Latest posts",
  "link": "https://example.com",
  "entries": [
    {
      "title": "Post Title",
      "link": "https://example.com/post",
      "published": "2024-01-01T12:00:00Z",
      "summary": "Post summary...",
      "author": "Author Name"
    }
  ]
}
```
Extract specific HTML elements using CSS selectors.
Parameters:

- `url` (string, required) - URL to fetch
- `selector` (string, required) - CSS selector (e.g., "div.content", "a[href]")
- `attribute` (string, optional) - Extract a specific attribute instead of text

Example:

```json
{
  "url": "https://example.com",
  "selector": "a.link",
  "attribute": "href"
}
```

Output:

```json
[
  {
    "text": "Link text",
    "html": "<a class=\"link\" href=\"/page\">Link text</a>",
    "attributes": {
      "href": "/page",
      "class": "link"
    }
  }
]
```
Extract HTML tables to structured JSON.
Parameters:

- `url` (string, required) - URL to fetch
- `table_index` (number, optional) - Extract a specific table by index (0-based)

Example:

```json
{
  "url": "https://example.com/data",
  "table_index": 0
}
```

Output:

```json
[
  {
    "headers": ["Name", "Age", "City"],
    "rows": [
      ["John", "30", "NYC"],
      ["Jane", "25", "LA"]
    ]
  }
]
```
Parse sitemap.xml and extract URLs.
Parameters:

- `url` (string, required) - Sitemap URL

Example:

```json
{
  "url": "https://example.com/sitemap.xml"
}
```

Output:

```json
{
  "urls": [
    {
      "loc": "https://example.com/page1",
      "lastmod": "2024-01-01",
      "changefreq": "weekly",
      "priority": 0.8
    }
  ],
  "sitemaps": [
    {
      "loc": "https://example.com/sitemap2.xml",
      "lastmod": "2024-01-01"
    }
  ]
}
```
Extract all links from a page with filtering options.
Parameters:

- `url` (string, required) - URL to fetch
- `internal_only` (boolean, optional) - Only internal links (same domain)
- `external_only` (boolean, optional) - Only external links (different domain)

Example:

```json
{
  "url": "https://example.com",
  "internal_only": true
}
```

Output:

```json
{
  "base_url": "https://example.com",
  "links": [
    {
      "href": "https://example.com/page",
      "text": "Page Title",
      "title": "Link title",
      "rel": "nofollow",
      "is_internal": true
    }
  ]
}
```
Fetch multiple URLs in parallel with rate limiting.
Parameters:

- `urls` (array of strings, required) - URLs to fetch
- `max_concurrent` (number, optional) - Max concurrent requests (default: 5)
- `timeout` (number, optional) - Timeout per request in seconds (default: 30)

Example:

```json
{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ],
  "max_concurrent": 3,
  "timeout": 10
}
```

Output:

```json
[
  {
    "url": "https://example.com/page1",
    "status": 200,
    "success": true,
    "content_length": 1024,
    "error": null
  }
]
```
Search for text within a page with context extraction.
Parameters:

- `url` (string, required) - URL to search in
- `query` (string, required) - Search query
- `context_chars` (number, optional) - Characters of context around each match (default: 100)
- `max_results` (number, optional) - Maximum results to return (default: 10)
- `case_sensitive` (boolean, optional) - Case-sensitive search (default: false)

Example:

```json
{
  "url": "https://example.com",
  "query": "search term",
  "context_chars": 50,
  "max_results": 5
}
```

Output:

```json
{
  "query": "search term",
  "total_matches": 3,
  "results": [
    {
      "match": "search term",
      "context": "...text before search term text after...",
      "position": 1234
    }
  ]
}
```
Search Reddit posts with advanced filtering.
Parameters:

- `query` (string, optional) - Search query (omit for subreddit browsing)
- `subreddit` (string, optional) - Specific subreddit (e.g., "rust")
- `sort` (string, optional) - Sort by: "hot", "new", "top", "rising" (default: "hot")
- `time` (string, optional) - Time filter for "top": "hour", "day", "week", "month", "year", "all"
- `limit` (number, optional) - Number of posts (default: 10, max: 100)
- `include_comments` (boolean, optional) - Fetch top comments (default: false)
- `comment_limit` (number, optional) - Max comments per post (default: 5)

Example:

```json
{
  "query": "rust programming",
  "subreddit": "rust",
  "sort": "top",
  "time": "week",
  "limit": 5,
  "include_comments": true
}
```

Output:

```json
[
  {
    "title": "Post Title",
    "author": "username",
    "subreddit": "rust",
    "score": 123,
    "url": "https://example.com",
    "permalink": "https://reddit.com/r/rust/comments/...",
    "selftext": "Post content...",
    "created_utc": 1234567890,
    "num_comments": 45,
    "comments": [
      {
        "author": "commenter",
        "body": "Comment text...",
        "score": 10
      }
    ]
  }
]
```
Search and fetch Wikipedia articles.
Parameters:

- `action` (string, required) - Action: "search", "summary", "full", "random"
- `query` (string, optional) - Search query (required for "search" and "summary")
- `limit` (number, optional) - Search results limit (default: 10)
- `language` (string, optional) - Wikipedia language code (default: "en")

Examples:

Search:

```json
{
  "action": "search",
  "query": "Rust programming",
  "limit": 5,
  "language": "en"
}
```

Summary:

```json
{
  "action": "summary",
  "query": "Rust (programming language)"
}
```

Full article:

```json
{
  "action": "full",
  "query": "Rust (programming language)"
}
```

Random article:

```json
{
  "action": "random",
  "language": "en"
}
```

Output (summary/full):

```json
{
  "title": "Rust (programming language)",
  "extract": "Rust is a multi-paradigm...",
  "url": "https://en.wikipedia.org/wiki/Rust_(programming_language)",
  "content": "Full article content..."
}
```

The `content` field is only present for the "full" action.
Extract text from PDF files.
Parameters:

- `url` (string, required) - PDF URL
- `max_pages` (number, optional) - Maximum pages to extract (default: all)

Requires: `pdf` feature enabled (default)
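An illustrative request, following the same shape as the other tools (the URL is a placeholder):

```json
{
  "url": "https://example.com/document.pdf",
  "max_pages": 10
}
```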
Get image metadata without downloading the full file.
Parameters:

- `url` (string, required) - Image URL

Requires: `images` feature enabled (default)
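An illustrative request (the URL is a placeholder):

```json
{
  "url": "https://example.com/photo.jpg"
}
```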
```toml
default = ["pdf", "images"]
```
```bash
# No PDF support
cargo build --no-default-features

# Only PDF, no images
cargo build --no-default-features --features pdf

# Full features
cargo build --features full
```
```bash
cargo test
cargo build --release
RUST_LOG=debug cargo run
```
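For a quick manual smoke test, you can pipe a single MCP `initialize` request into the binary; this assumes the default stdio transport with newline-delimited JSON-RPC (the request below is the generic MCP handshake, not anything specific to this crate):

```bash
# Send the standard MCP initialize handshake over stdio and print the server's reply
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.0.0"}}}' \
  | ./target/release/fetch-mcp-rs
```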
Core:

- `rmcp` 0.9.1 - MCP SDK with new API
- `reqwest` 0.12 - HTTP client
- `tokio` 1.48 - Async runtime

HTML/Content:

- `readability` 0.3 - Content extraction
- `scraper` 0.24 - HTML parsing
- `html2text` 0.16 - HTML to text conversion

Feeds & Data:

- `feed-rs` 2.3 - Feed parsing
- `webpage` 2.0 - Metadata extraction
- `quick-xml` 0.38 - XML parsing

Optional:

- `lopdf` 0.38 - PDF text extraction
- `image` 0.25 - Image processing

Performance:

- `cached` 0.56 - In-memory caching
- `governor` 0.10 - Rate limiting

MIT License - see LICENSE file for details.
Contributions welcome! Please ensure `cargo test` passes.