api_huggingface


Comprehensive Rust client for the HuggingFace Inference API, with Router API support for Pro models.

🎯 Architecture: Stateless HTTP Client

This API crate is designed as a stateless HTTP client with zero persistence requirements. It provides:

  • Direct HTTP calls to the HuggingFace Inference API
  • In-memory operation state only (resets on restart)
  • No external storage dependencies (databases, files, caches)
  • No configuration persistence beyond environment variables

This ensures lightweight, containerized deployments and eliminates operational complexity.

🏛️ Governing Principle: "Thin Client, Rich API"

Expose all server-side functionality transparently while maintaining zero client-side intelligence or automatic behaviors.

Key principles:

  • API Transparency: One-to-one mapping with HuggingFace API endpoints
  • Zero Automatic Behavior: No implicit decision-making or magic thresholds
  • Explicit Control: Developer decides when, how, and why operations occur
  • Configurable Reliability: Enterprise features available through explicit configuration

Scope

In Scope

  • Text generation via Router API (Llama-3, Mistral, Kimi-K2)
  • Embeddings generation with similarity calculations
  • Model discovery and status checking
  • Streaming responses (SSE)
  • Vision APIs (classification, detection, captioning)
  • Audio APIs (ASR, TTS, classification, transformation)
  • Enterprise reliability (circuit breaker, rate limiting, failover, health checks)
  • Synchronous API wrapper

Out of Scope

  • Model training (inference only)
  • File upload/download (text-based API interactions only)
  • Custom model hosting (HuggingFace hosted models only)
  • GraphQL support (REST API only)

Features

Core Capabilities:

  • Router API for Pro plan models (OpenAI-compatible format)
  • Text generation with streaming support
  • Embeddings with similarity calculations
  • Model availability checking

Multimodal Features:

  • Vision: Image classification, object detection, captioning
  • Audio: Speech recognition, text-to-speech, classification

Enterprise Reliability:

  • Circuit breaker pattern for failure detection
  • Rate limiting with token bucket algorithm (see the sketch after this list)
  • Multi-endpoint failover (4 strategies)
  • Background health checks
  • Dynamic configuration with runtime updates
  • Performance metrics tracking
  • LRU caching with TTL
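
The token bucket named above is a standard algorithm; the following is a minimal, self-contained sketch of it in this crate's code style, independent of the crate's actual rate-limiting types:

use std::time::Instant;

/// Minimal token bucket: at most `capacity` tokens, refilled at `refill_rate` tokens per second.
struct TokenBucket
{
  capacity : f64,
  tokens : f64,
  refill_rate : f64,
  last_refill : Instant,
}

impl TokenBucket
{
  fn new( capacity : f64, refill_rate : f64 ) -> Self
  {
    Self { capacity, tokens : capacity, refill_rate, last_refill : Instant::now() }
  }

  /// Refill based on elapsed time, then consume one token if available.
  fn try_acquire( &mut self ) -> bool
  {
    let elapsed = self.last_refill.elapsed().as_secs_f64();
    self.last_refill = Instant::now();
    self.tokens = ( self.tokens + elapsed * self.refill_rate ).min( self.capacity );
    if self.tokens >= 1.0
    {
      self.tokens -= 1.0;
      true
    }
    else
    {
      false
    }
  }
}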

Installation

Add to your Cargo.toml:

[dependencies]
api_huggingface = "0.3.0"

Quick Start

Basic Usage

use api_huggingface::
{
  Client,
  environment::HuggingFaceEnvironmentImpl,
  components::{ input::InferenceParameters, models::Models },
  secret::Secret,
};

#[ tokio::main ]
async fn main() -> Result< (), Box< dyn std::error::Error > >
{
  let api_key = Secret::load_from_env( "HUGGINGFACE_API_KEY" )?;
  let env = HuggingFaceEnvironmentImpl::build( api_key, None )?;
  let client = Client::build( env )?;

  let params = InferenceParameters::new()
    .with_temperature( 0.7 )
    .with_max_new_tokens( 100 );

  let response = client.inference()
    .create_with_parameters
    (
      "What is the capital of France?",
      Models::llama_3_1_8b_instruct(),
      params
    )
    .await?;

  println!( "Response: {:?}", response );
  Ok( () )
}

Embeddings with Similarity

use api_huggingface::{ Client, environment::HuggingFaceEnvironmentImpl, secret::Secret };

#[ tokio::main ]
async fn main() -> Result< (), Box< dyn std::error::Error > >
{
  let api_key = Secret::load_from_env( "HUGGINGFACE_API_KEY" )?;
  let env = HuggingFaceEnvironmentImpl::build( api_key, None )?;
  let client = Client::build( env )?;

  let embedding1 = client.embeddings().create( "Hello world" ).await?;
  let embedding2 = client.embeddings().create( "Hi there" ).await?;

  let similarity = client.embeddings().similarity( &embedding1, &embedding2 )?;
  println!( "Similarity: {:.4}", similarity );

  Ok( () )
}
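
The similarity helper above presumably returns cosine similarity; assuming embeddings come back as plain f32 vectors, the underlying calculation is:

/// Cosine similarity: dot( a, b ) / ( |a| * |b| ), ranging from -1.0 to 1.0.
/// Assumes both vectors are non-zero; a zero vector would divide by zero.
fn cosine_similarity( a : &[ f32 ], b : &[ f32 ] ) -> f32
{
  let dot : f32 = a.iter().zip( b ).map( | ( x, y ) | x * y ).sum();
  let norm_a : f32 = a.iter().map( | x | x * x ).sum::< f32 >().sqrt();
  let norm_b : f32 = b.iter().map( | x | x * x ).sum::< f32 >().sqrt();
  dot / ( norm_a * norm_b )
}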

Authentication

Option 1: Workspace Secret (Recommended)

Create secret/-secrets.sh in your workspace root:

#!/bin/bash
export HUGGINGFACE_API_KEY="hf_your-key-here"

Option 2: Environment Variable

export HUGGINGFACE_API_KEY="hf_your-key-here"

Get your API key from huggingface.co/settings/tokens.
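
With either option, the variable must be exported before the process starts; for example, when using the workspace secret file:

# make the key visible to the current shell, then run your binary or tests
source secret/-secrets.sh
cargo run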

Feature Flags

Core Features

  • default - Core async inference and embeddings
  • inference - Text generation API
  • embeddings - Embeddings generation
  • models - Model discovery and status

Streaming and Processing

  • inference-streaming - SSE streaming support
  • embeddings-similarity - Similarity calculations
  • embeddings-batch - Batch processing

Enterprise Reliability

  • circuit-breaker - Failure detection and recovery
  • rate-limiting - Token bucket rate limiting
  • failover - Multi-endpoint failover
  • health-checks - Background health monitoring
  • dynamic-config - Runtime configuration

Client Enhancements

  • sync - Synchronous API wrappers
  • caching - LRU caching with TTL
  • performance-metrics - Request tracking

Presets

  • full - All features enabled
  • integration - Integration tests with real API
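
Flags combine as standard Cargo features; for example, to enable streaming and similarity on top of the defaults (feature names taken from the lists above, assuming they are additive):

[dependencies]
api_huggingface = { version = "0.3.0", features = [ "inference-streaming", "embeddings-similarity" ] }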

Testing

Test Coverage

  • Comprehensive unit and integration tests
  • No-mock policy: all integration tests run against the real HuggingFace API
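
Assuming the integration preset gates the real-API tests, a typical invocation would look like this (the exact flag wiring is an assumption; check the crate's CI for the canonical command):

export HUGGINGFACE_API_KEY="hf_your-key-here"
cargo test --features integration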

Supported Models

Router API (Pro Plan)

Model                            | Provider | Capabilities
moonshotai/Kimi-K2-Instruct-0905 | groq     | Chat completions
meta-llama/Llama-3.1-8B-Instruct | various  | Chat completions
mistralai/Mistral-7B-Instruct    | various  | Chat completions
codellama/CodeLlama-34b-Instruct | various  | Code generation

Legacy Inference API (Free Tier)

Model                                  | Task
facebook/bart-large-cnn                | Summarization
gpt2                                   | Text generation
sentence-transformers/all-MiniLM-L6-v2 | Embeddings

Documentation

API documentation is available at https://docs.rs/api_huggingface.

Dependencies

  • reqwest: HTTP client with async support
  • tokio: Async runtime
  • serde: Serialization/deserialization
  • workspace_tools: Secret management
  • error_tools: Unified error handling
  • secrecy: Secure credential handling

All dependencies are workspace-managed for consistency.

Contributing

  1. Follow established patterns in existing code
  2. Use 2-space indentation consistently
  3. Add tests for new functionality
  4. Update documentation for public APIs
  5. Ensure zero clippy warnings: cargo clippy -- -D warnings
  6. Follow zero-tolerance mock policy (real API integration only)
  7. Follow the "Thin Client, Rich API" principle

License

MIT
