grepapp_haystack

version: 1.5.1
created_at: 2026-01-20 10:53:31.217431+00
updated_at: 2026-01-20 10:53:31.217431+00
description: Grep.app integration for searching code across GitHub repositories
repository: https://github.com/terraphim/terraphim-ai
id: 2056325
size: 80,536
author: Dr Alexander Mikhalev (AlexMikhalev)

Grep.app Haystack

Grep.app integration for Terraphim AI, enabling code search across millions of GitHub repositories.

Overview

This crate provides a haystack provider for grep.app, a code search engine by Vercel that indexes millions of public GitHub repositories. It lets you search that index for code patterns, functions, and implementations directly from Terraphim AI.

Features

  • Fast Code Search: Search across the public GitHub repositories indexed by grep.app
  • Language Filtering: Filter results by programming language (Rust, Python, JavaScript, etc.)
  • Repository Filtering: Narrow searches to specific repositories (e.g., "tokio-rs/tokio")
  • Path Filtering: Search within specific directories
  • Rate Limiting: Automatic handling of API rate limits
  • Error Handling: Graceful degradation on failures

Installation

Add to your Cargo.toml (these are path dependencies, so the sibling crate directories must exist relative to your crate):

[dependencies]
grepapp_haystack = { path = "../haystack_grepapp" }
haystack_core = { path = "../haystack_core" }
terraphim_types = { path = "../terraphim_types" }

Usage

Basic Search

use grepapp_haystack::GrepAppHaystack;
use haystack_core::HaystackProvider;
use terraphim_types::SearchQuery;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let haystack = GrepAppHaystack::new()?;

    let query = SearchQuery {
        search_term: "async fn tokio::spawn".into(),
        ..Default::default()
    };

    let documents = haystack.search(&query).await?;

    for doc in documents {
        println!("{} - {}", doc.title, doc.url);
    }

    Ok(())
}

Search with Filters

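use grepapp_haystack::GrepAppHaystack;
use haystack_core::HaystackProvider; // trait that provides `search`, as in Basic Search
use terraphim_types::SearchQuery;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create a haystack whose filters apply to every search it runs
    let haystack = GrepAppHaystack::with_filters(
        Some("Rust".to_string()),           // Language filter
        Some("tokio-rs/tokio".to_string()), // Repository filter
        Some("tokio/src/".to_string()),     // Path filter
    )?;

    let query = SearchQuery {
        search_term: "JoinHandle".into(),
        ..Default::default()
    };

    let documents = haystack.search(&query).await?;

    // Print each match so the results are actually used
    for doc in documents {
        println!("{} - {}", doc.title, doc.url);
    }

    Ok(())
}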

Using the Low-Level Client

use grepapp_haystack::{GrepAppClient, SearchParams};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = GrepAppClient::new()?;

    let params = SearchParams {
        query: "tokio spawn".to_string(),
        language: Some("Rust".to_string()),
        repo: None,
        path: None,
    };

    let hits = client.search(&params).await?;

    for hit in hits {
        println!("Repo: {}", hit.source.repo.raw);
        println!("File: {}", hit.source.path.raw);
        println!("Branch: {}", hit.source.branch.raw);
        println!("Snippet: {}", hit.source.content.snippet);
    }

    Ok(())
}

Configuration

Terraphim Role Configuration

Add grep.app as a haystack in your role configuration:

{
  "name": "Code Search Engineer",
  "relevance_function": "BM25",
  "theme": "default",
  "haystacks": [
    {
      "name": "GitHub Code Search",
      "service": "GrepApp",
      "extra_parameters": {
        "language": "Rust",
        "repo": "",
        "path": ""
      }
    }
  ]
}
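The empty "repo" and "path" values above presumably mean "no filter". As a minimal sketch under that assumption (the crate's real handling of extra_parameters is not shown here, and the `filter` helper is hypothetical), blank values could be mapped to `None` like this:

```rust
use std::collections::HashMap;

/// Hypothetical helper: reads one filter from `extra_parameters`,
/// treating a missing key and a blank value the same way (no filter).
fn filter(extra: &HashMap<String, String>, key: &str) -> Option<String> {
    extra
        .get(key)
        .map(|value| value.trim())
        .filter(|value| !value.is_empty())
        .map(str::to_string)
}
```

The resulting `Option<String>` values have the same shape as the three arguments passed to `GrepAppHaystack::with_filters` in the usage examples.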

API Parameters

  • query (required): Search query string (max 1000 characters)
  • language (optional): Programming language filter (e.g., "Rust", "Python", "JavaScript")
  • repo (optional): Repository filter in "owner/repo" format (e.g., "tokio-rs/tokio")
  • path (optional): Path filter for directory-specific searches (e.g., "src/")
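The constraint on query can be illustrated with a small validation function. This is a sketch only: `QueryError` and `validate_query` are hypothetical names, and trimming whitespace before checking is an assumption rather than documented behaviour.

```rust
/// Maximum query length in characters, per the parameter list above.
const MAX_QUERY_CHARS: usize = 1000;

/// Hypothetical validation error; the crate's real error type may differ.
#[derive(Debug, PartialEq)]
enum QueryError {
    Empty,
    TooLong { chars: usize },
}

/// Rejects empty (or whitespace-only) queries and queries over the limit.
/// Returns the trimmed query on success.
fn validate_query(query: &str) -> Result<&str, QueryError> {
    let trimmed = query.trim();
    if trimmed.is_empty() {
        return Err(QueryError::Empty);
    }
    // Count Unicode scalar values rather than bytes, so non-ASCII
    // queries are not penalised for their UTF-8 encoding.
    let chars = trimmed.chars().count();
    if chars > MAX_QUERY_CHARS {
        return Err(QueryError::TooLong { chars });
    }
    Ok(trimmed)
}
```

Validating before sending the request avoids spending a rate-limited API call on a query that cannot succeed.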

Response Format

Each search result is converted to a Document with:

  • id: Unique identifier (format: repo:branch:path)
  • url: GitHub blob URL to the file
  • title: Formatted as "repo - filename"
  • body: Code snippet with matches (HTML tags stripped)
  • description: Human-readable description
  • tags: Repository name and filename
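The mapping above can be sketched with simplified stand-in types. `Hit`, `Doc`, `strip_tags`, and `to_doc` are hypothetical and carry fewer fields than the crate's real types; the URL layout assumes the usual GitHub `blob/<branch>/<path>` form.

```rust
/// Hypothetical, simplified stand-in for a grep.app search hit.
struct Hit {
    repo: String,
    branch: String,
    path: String,
    snippet_html: String,
}

/// Hypothetical, simplified stand-in for the resulting document.
#[derive(Debug)]
struct Doc {
    id: String,
    url: String,
    title: String,
    body: String,
}

/// Naive tag stripper: drops everything between '<' and '>'.
/// HTML entities such as `&lt;` are left as they are.
fn strip_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => in_tag = true,
            '>' if in_tag => in_tag = false,
            _ if !in_tag => out.push(c),
            _ => {}
        }
    }
    out
}

/// Builds the document fields from a hit, following the list above.
fn to_doc(hit: &Hit) -> Doc {
    // The filename is the last path segment.
    let filename = hit.path.rsplit('/').next().unwrap_or(hit.path.as_str());
    Doc {
        id: format!("{}:{}:{}", hit.repo, hit.branch, hit.path),
        url: format!(
            "https://github.com/{}/blob/{}/{}",
            hit.repo, hit.branch, hit.path
        ),
        title: format!("{} - {}", hit.repo, filename),
        body: strip_tags(&hit.snippet_html),
    }
}
```

Including the branch in the id keeps results from different branches of the same file distinct.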

Error Handling

The client handles various error conditions:

  • Rate Limiting (429): Returns error with message "Rate limit exceeded"
  • No Results (404): Returns empty vector instead of error
  • Network Errors: Propagates with context
  • Invalid Queries: Rejects empty queries and queries over the length limit before sending a request
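The status handling listed above could look roughly like the following. `SearchError` and `handle_status` are hypothetical names, and treating every other non-2xx status as an error is an assumption; the crate's actual error type is not shown here.

```rust
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum SearchError {
    RateLimited,
    Http(u16),
}

impl fmt::Display for SearchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SearchError::RateLimited => write!(f, "Rate limit exceeded"),
            SearchError::Http(code) => write!(f, "unexpected HTTP status {}", code),
        }
    }
}

/// Decides what to do with a response based only on its HTTP status.
/// `parse` runs only for successful responses.
fn handle_status<T>(
    status: u16,
    parse: impl FnOnce() -> Vec<T>,
) -> Result<Vec<T>, SearchError> {
    match status {
        200..=299 => Ok(parse()),
        // "No results" is not an error: return an empty vector.
        404 => Ok(Vec::new()),
        429 => Err(SearchError::RateLimited),
        other => Err(SearchError::Http(other)),
    }
}
```

Returning an empty vector for 404 lets callers handle "no matches" and "some matches" with the same code path, which is what makes the degradation graceful.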

Testing

Run the test suite:

# Run all tests
cargo test -p grepapp_haystack

# Run with output
cargo test -p grepapp_haystack -- --nocapture

# Run specific test
cargo test -p grepapp_haystack test_search_success

Examples

Search for Error Handling Patterns

let haystack = GrepAppHaystack::with_filters(
    Some("Rust".to_string()),
    None,
    None,
)?;

let query = SearchQuery {
    search_term: "Result<T, E>".into(),
    ..Default::default()
};

let documents = haystack.search(&query).await?;

Find Specific Function Implementations

let haystack = GrepAppHaystack::with_filters(
    Some("Go".to_string()),
    Some("kubernetes/kubernetes".to_string()),
    Some("pkg/".to_string()),
)?;

let query = SearchQuery {
    search_term: "func NewController".into(),
    ..Default::default()
};

let documents = haystack.search(&query).await?;

Limitations

  • Rate Limits: grep.app enforces rate limits on API requests
  • No Authentication: grep.app API currently doesn't require authentication
  • Public Repositories Only: Only searches public GitHub repositories
  • No Regex Support: Queries are treated as plain text, not as regular expressions (though grep.app itself may support some patterns)

API Reference

grep.app uses the following API endpoint:

  • Endpoint: https://grep.app/api/search
  • Method: GET
  • Parameters: q, f.lang, f.repo, f.path
  • Response: JSON with facets and hits

See the models.rs file for the complete response structure.
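To make the request shape concrete, here is a minimal, standard-library-only sketch that assembles the GET URL from the endpoint and parameter names listed above. `encode` and `search_url` are hypothetical helpers; a real client would more likely let its HTTP library handle query encoding.

```rust
const ENDPOINT: &str = "https://grep.app/api/search";

/// Percent-encodes a query value, leaving only RFC 3986 unreserved
/// characters (letters, digits, '-', '.', '_', '~') untouched.
fn encode(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Builds the search URL; filters that are `None` are omitted.
fn search_url(
    query: &str,
    language: Option<&str>,
    repo: Option<&str>,
    path: Option<&str>,
) -> String {
    let mut url = format!("{}?q={}", ENDPOINT, encode(query));
    for (key, value) in [("f.lang", language), ("f.repo", repo), ("f.path", path)] {
        if let Some(value) = value {
            url.push_str(&format!("&{}={}", key, encode(value)));
        }
    }
    url
}
```

For example, `search_url("tokio spawn", Some("Rust"), None, None)` yields `https://grep.app/api/search?q=tokio%20spawn&f.lang=Rust`.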

Contributing

When extending this crate:

  1. Add tests for new functionality
  2. Update this README with new features
  3. Follow Rust naming conventions (snake_case)
  4. Use tracing for logging, not println!

License

MIT
