llm-kit-groq

Crates.io: llm-kit-groq
lib.rs: llm-kit-groq
version: 0.1.0
created_at: 2026-01-18 20:38:07.04793+00
updated_at: 2026-01-18 20:38:07.04793+00
description: Groq provider implementation for the LLM Kit - supports chat and transcription models
repository: https://github.com/saribmah/llm-kit
size: 177,179
owner: Sarib Mahmood (saribmah)

README

LLM Kit Groq

Groq provider for LLM Kit - ultra-fast LLM inference with open-source models, powered by Groq's LPU architecture.

Note: This provider uses the standardized builder pattern. See the Quick Start section for the recommended usage.

Features

  • Text Generation: Generate text with Llama, Gemma, DeepSeek, Qwen, and other open-source models
  • Streaming: Real-time response streaming with ultra-fast inference speeds
  • Tool Calling: Support for function calling with chat models
  • Speech Generation: Text-to-speech with PlayAI models
  • Transcription: Audio-to-text with Whisper models (large-v3, large-v3-turbo, distilled)
  • Ultra-Fast Inference: Groq's LPU architecture delivers industry-leading inference speeds
  • Provider Metadata: Access cached token counts and performance metrics

Installation

Add this to your Cargo.toml:

[dependencies]
llm-kit-groq = "0.1"
llm-kit-core = "0.1"
llm-kit-provider = "0.1"
tokio = { version = "1", features = ["full"] }

Quick Start

Using the Client Builder (Recommended)

use llm_kit_groq::GroqClient;
use llm_kit_provider::language_model::LanguageModel;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create provider using the client builder
    let provider = GroqClient::new()
        .api_key("your-api-key")  // Or use GROQ_API_KEY env var
        .build();
    
    // Create a language model
    let model = provider.chat_model("llama-3.1-8b-instant");
    
    println!("Model: {}", model.model_id());
    println!("Provider: {}", model.provider());
    Ok(())
}
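
Once the model is created, generation goes through llm-kit-core's GenerateText builder, the same flow shown in the Cached Tokens section below. A minimal sketch; the text accessor on the result is an assumption here, so check the llm-kit-core docs for the exact result shape:

use llm_kit_core::{GenerateText, Prompt};

let result = GenerateText::new(model, Prompt::text("Say hello in one sentence."))
    .execute()
    .await?;

// `result.text` is assumed here; llm-kit-core exposes the generated
// output on the returned result value.
println!("{:?}", result.text);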

Using Settings Directly (Alternative)

use llm_kit_groq::{GroqProvider, GroqProviderSettings};
use llm_kit_provider::language_model::LanguageModel;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create provider with settings
    let provider = GroqProvider::new(GroqProviderSettings::default());
    
    let model = provider.chat_model("llama-3.1-8b-instant");
    
    println!("Model: {}", model.model_id());
    Ok(())
}

Configuration

Environment Variables

Set your Groq API key as an environment variable:

export GROQ_API_KEY=your-api-key
export GROQ_BASE_URL=https://api.groq.com/openai/v1  # Optional
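
With GROQ_API_KEY set, no explicit key is needed when building the provider, matching the comment in the Quick Start:

use llm_kit_groq::GroqClient;

// Falls back to the GROQ_API_KEY environment variable when
// .api_key(...) is not called.
let provider = GroqClient::new().build();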

Using the Client Builder

use llm_kit_groq::GroqClient;

let provider = GroqClient::new()
    .api_key("your-api-key")
    .base_url("https://api.groq.com/openai/v1")
    .header("Custom-Header", "value")
    .name("my-groq-provider")
    .build();

Using Settings Directly

use llm_kit_groq::{GroqProvider, GroqProviderSettings};

let settings = GroqProviderSettings::new()
    .with_api_key("your-api-key")
    .with_base_url("https://api.groq.com/openai/v1")
    .add_header("Custom-Header", "value")
    .with_name("my-groq-provider");

let provider = GroqProvider::new(settings);

Builder Methods

The GroqClient builder supports:

  • .api_key(key) - Set the API key (overrides GROQ_API_KEY environment variable)
  • .load_api_key_from_env() - Explicitly load API key from GROQ_API_KEY environment variable
  • .base_url(url) - Set custom base URL (default: https://api.groq.com/openai/v1)
  • .name(name) - Set provider name (optional)
  • .header(key, value) - Add a single custom header
  • .headers(map) - Add multiple custom headers
  • .build() - Build the provider
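
A sketch combining several of these methods. The map type accepted by .headers(...) is assumed to be HashMap<String, String> here; check the crate docs for the exact signature:

use std::collections::HashMap;
use llm_kit_groq::GroqClient;

let mut extra = HashMap::new();
extra.insert("X-Request-Source".to_string(), "docs-example".to_string());

let provider = GroqClient::new()
    .load_api_key_from_env()       // read GROQ_API_KEY explicitly
    .headers(extra)                // add multiple custom headers at once
    .name("my-groq-provider")
    .build();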

Supported Models

Model IDs are plain strings passed to chat_model, so Groq models across multiple families are supported (see the sketch after the chat model list below).

Chat Models

  • Llama: llama-3.1-8b-instant, llama-3.3-70b-versatile, meta-llama/llama-guard-4-12b, meta-llama/llama-4-maverick-17b-128e-instruct, meta-llama/llama-4-scout-17b-16e-instruct
  • Gemma: gemma2-9b-it
  • DeepSeek: deepseek-r1-distill-llama-70b
  • Qwen: qwen/qwen3-32b
  • Mixtral: mixtral-8x7b-32768
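
Each entry above is a plain model ID string; switching families only changes the ID passed to chat_model (provider built as in the Quick Start):

// Model IDs from different families use the same constructor.
let llama = provider.chat_model("llama-3.3-70b-versatile");
let deepseek = provider.chat_model("deepseek-r1-distill-llama-70b");
let qwen = provider.chat_model("qwen/qwen3-32b");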

Transcription Models (Whisper)

  • whisper-large-v3 - Most accurate Whisper model
  • whisper-large-v3-turbo - Faster Whisper variant with lower latency
  • distil-whisper-large-v3-en - English-optimized distilled model for faster performance

Text-to-Speech Models

  • playai-tts - PlayAI text-to-speech model

For a complete list of available models, see the Groq Models documentation.

Provider-Specific Options

Groq supports advanced features through provider options.

Reasoning Format

Control how reasoning content is returned:

use llm_kit_core::GenerateText;
use serde_json::json;

let result = GenerateText::new(model, prompt)
    .provider_options(json!({
        "reasoningFormat": "parsed"  // "parsed", "raw", or "hidden"
    }))
    .execute()
    .await?;

Reasoning Effort

Configure computational effort for reasoning models:

use llm_kit_core::GenerateText;
use serde_json::json;

let result = GenerateText::new(model, prompt)
    .provider_options(json!({
        "reasoningEffort": "high"  // Effort level as string
    }))
    .execute()
    .await?;

Parallel Tool Calls

Enable or disable parallel tool execution (default: true):

use llm_kit_core::GenerateText;
use serde_json::json;

let result = GenerateText::new(model, prompt)
    .tools(tools)
    .provider_options(json!({
        "parallelToolCalls": true
    }))
    .execute()
    .await?;

Service Tier

Select the service tier for processing:

use llm_kit_core::GenerateText;
use serde_json::json;

let result = GenerateText::new(model, prompt)
    .provider_options(json!({
        "serviceTier": "flex"  // "on_demand", "flex", or "auto"
    }))
    .execute()
    .await?;

Cached Tokens Metadata

Groq provides metadata about cached tokens to help optimize performance:

use llm_kit_groq::GroqClient;
use llm_kit_core::{GenerateText, Prompt};

let provider = GroqClient::new().build();
let model = provider.chat_model("llama-3.1-8b-instant");

let result = GenerateText::new(model, Prompt::text("Hello!"))
    .execute()
    .await?;

// Access cache metadata
if let Some(metadata) = result.provider_metadata {
    if let Some(groq) = metadata.get("groq") {
        println!("Cached tokens: {:?}", groq.get("cachedTokens"));
    }
}

Available Provider Options

Option             Type    Description
reasoningFormat    string  Reasoning content format: "parsed", "raw", or "hidden"
reasoningEffort    string  Computational effort level for reasoning models
parallelToolCalls  bool    Enable parallel tool execution (default: true)
structuredOutputs  bool    Enable structured outputs (default: true)
serviceTier        string  Service tier: "on_demand", "flex", or "auto"
user               string  End-user identifier for abuse monitoring
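
Multiple options can be combined in a single provider_options call, using the same json! pattern as the sections above:

use llm_kit_core::GenerateText;
use serde_json::json;

let result = GenerateText::new(model, prompt)
    .provider_options(json!({
        "reasoningFormat": "parsed",
        "serviceTier": "auto",
        "user": "user-1234"
    }))
    .execute()
    .await?;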

Examples

See the examples/ directory for complete examples:

  • chat.rs - Basic chat completion using do_generate() directly
  • stream.rs - Streaming responses using do_stream() directly
  • chat_tool_calling.rs - Tool calling using do_generate() directly
  • stream_tool_calling.rs - Streaming with tools using do_stream() directly
  • speech_generation.rs - Text-to-speech using do_generate() directly
  • transcription.rs - Audio transcription using do_transcribe() directly

Run examples with:

export GROQ_API_KEY="your-api-key"
cargo run --example chat
cargo run --example stream
cargo run --example chat_tool_calling
cargo run --example stream_tool_calling
cargo run --example speech_generation
cargo run --example transcription

License

See the repository for license details.

Contributing

Contributions are welcome! Please see the Contributing Guide for more details.
