| Crates.io | offline-intelligence |
| lib.rs | offline-intelligence |
| version | 0.1.1 |
| created_at | 2026-01-24 10:42:06.1614+00 |
| updated_at | 2026-01-24 10:42:06.1614+00 |
| description | High-performance LLM inference engine with memory management - Cross-platform native library with bindings for Python, Java, C++, and JavaScript |
| homepage | |
| repository | https://github.com/OfflineIntelligence/offline-intelligence |
| max_upload_size | |
| id | 2066519 |
| size | 371,020 |
High-performance LLM inference engine with advanced memory management and context orchestration capabilities. Built in Rust for maximum performance across Windows, macOS, and Linux platforms.
This project follows a dual-licensing model; see the repository for licensing details.
Supported platforms:

| Platform | Architecture | Status |
|---|---|---|
| Windows | x86_64 | ✅ Supported |
| macOS | x86_64, ARM64 | ✅ Supported |
| Linux | x86_64, ARM64 | ✅ Supported |
The library provides native bindings for multiple languages:
The Rust crate gives direct access to all core functionality:
use offline_intelligence::{Config, run_server};
let config = Config::from_env()?;
run_server(config).await?;
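The snippet above assumes an async context. A complete minimal program might look like the following sketch, which assumes the tokio runtime and the anyhow crate for error handling (neither is mandated by the crate; adapt to your own setup):

use offline_intelligence::{Config, run_server};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Build the configuration from environment variables (see the configuration section below)
    let config = Config::from_env()?;
    // Run the server until it shuts down
    run_server(config).await?;
    Ok(())
}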
The Python bindings install via pip:
pip install offline-intelligence
Usage:
from offline_intelligence import Config, run_server
config = Config.from_env()
run_server(config)
The C++ bindings integrate via CMake:
#include <offline_intelligence/offline_intelligence.h>
auto config = offline_intelligence::Config::from_env();
offline_intelligence::run_server(config);
The JavaScript bindings are published as an NPM package:
npm install offline-intelligence
Usage:
const { Config, runServer } = require('offline-intelligence');
const config = Config.fromEnv();
runServer(config);
The Java bindings are available as a Maven dependency:
<dependency>
    <groupId>com.offlineintelligence</groupId>
    <artifactId>offline-intelligence-java</artifactId>
    <version>0.1.0</version>
</dependency>
Usage:
import com.offlineintelligence.Config;
import com.offlineintelligence.Server;
Config config = Config.fromEnv();
Server.runServer(config);
On Windows:

build.bat

On Linux/macOS:

chmod +x build.sh
./build.sh
The build process creates distribution packages in the dist/ directory:
- rust/ - Native Rust binaries
- python/ - Python wheels
- cpp-lib/ - C++ libraries and headers
- javascript/ - Node.js packages
- java/ - Java JAR files

The library uses environment variables for configuration:
# Core settings
LLAMA_BIN=/path/to/llama-server
MODEL_PATH=/path/to/model.gguf
API_HOST=127.0.0.1
API_PORT=8000
# Resource allocation
THREADS=auto
GPU_LAYERS=auto
CTX_SIZE=auto
BATCH_SIZE=auto
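Values set to auto are chosen automatically at startup. Purely as an illustration of that convention, the hypothetical helper below shows how THREADS=auto might map to a concrete thread count; it is not part of the crate's API, and the actual resolution logic may differ:

// Hypothetical sketch of the "auto" convention for THREADS
fn threads_from_env() -> usize {
    match std::env::var("THREADS").ok().as_deref() {
        // Explicit numeric override, e.g. THREADS=8
        Some(v) if v != "auto" => v.parse().unwrap_or(1),
        // "auto" or unset: fall back to the number of logical CPUs
        _ => std::thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1),
    }
}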
The server exposes the following HTTP endpoints:

- POST /generate/stream - Stream generation
- GET /healthz - Health check
- GET /readyz - Readiness check
- GET /metrics - Prometheus metrics
- GET /admin/status - System status
- POST /admin/load - Load model
- POST /admin/stop - Stop backend
- GET /memory/stats/{session_id} - Memory statistics
- POST /memory/optimize - Optimize memory
- POST /memory/cleanup - Cleanup memory

We welcome contributions to the open-source core components. Please see our Contributing Guide for details.
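As a quick way to verify a running server, the sketch below calls the /healthz, /readyz, and /metrics endpoints listed above. It assumes the reqwest crate with its blocking feature and the default host and port from the configuration section; it is only an illustration, not part of the crate:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Default host/port from the configuration section
    let base = "http://127.0.0.1:8000";

    // Liveness and readiness probes
    println!("healthz: {}", reqwest::blocking::get(format!("{base}/healthz"))?.status());
    println!("readyz: {}", reqwest::blocking::get(format!("{base}/readyz"))?.status());

    // Prometheus metrics in plain text
    let metrics = reqwest::blocking::get(format!("{base}/metrics"))?.text()?;
    println!("{metrics}");

    Ok(())
}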
For support, please open an issue on our GitHub repository or contact our team at support@offlineintelligence.com.