paddler

Crates.io	paddler
lib.rs	paddler
version	2.1.1
created_at	2024-11-21 11:49:38.748755+00
updated_at	2025-08-18 09:32:12.141+00
description	Open-source LLMOps platform for hosting and scaling AI in your own infrastructure
repository	https://github.com/intentee/paddler
id	1456059
size	878,639

Mateusz Charytoniuk (mcharytoniuk)

documentation

https://paddler.intentee.com

README

Paddler

Digital products and their users need privacy, reliability, cost control and an option to be independent from third party vendors.

Paddler is an open-source LLMOps platform for organizations that host and scale open-source models in their own infrastructure.

Key features

Inference through a built-in llama.cpp engine
Load balancing
Works through agents that can be added dynamically, allowing integration with autoscaling tools
Request buffering, enabling scaling from zero hosts
Built-in web admin panel for management, monitoring and testing
Observability metrics

For whom?

Product teams that need LLM inference and embeddings in their features
DevOps/LLMOps teams that need to run and deploy LLMs at scale
Organizations handling sensitive data with high compliance and privacy requirements (medical, financial, etc.)
Organizations wanting to achieve predictable LLM costs instead of being exposed to per-token pricing
Product leaders who need reliable model performance to maintain a consistent user experience in their AI-based features

Documentation

Visit our documentation page (https://paddler.intentee.com) to install Paddler and get started.

API documentation is also available.

Installation

There are multiple ways to install Paddler. In every case, the goal is to obtain the paddler binary and make it available on your system.

You can:

Option 1: Download the latest release from our GitHub releases page
Option 2: Build Paddler from source

Using Paddler

All Paddler functionality is available through the paddler command.

You can run paddler --help to see the available commands and options.
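As a quick sanity check after installation, the sketch below looks for the paddler binary on PATH and, if found, runs it with `--help`. It is only an illustration: the helper names `find_binary` and `show_help` are hypothetical, and the only Paddler behavior assumed is the `paddler --help` invocation mentioned above.

```python
import shutil
import subprocess
from typing import Optional


def find_binary(name: str = "paddler") -> Optional[str]:
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def show_help(name: str = "paddler") -> Optional[str]:
    """Run `<name> --help` and return its output, or None if not installed."""
    path = find_binary(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--help"], capture_output=True, text=True, check=False
    )
    # Some CLIs print help to stderr, so fall back to it when stdout is empty.
    return result.stdout or result.stderr


if __name__ == "__main__":
    output = show_help()
    if output is None:
        print("paddler was not found on PATH; install it first")
    else:
        print(output)
```

If the script reports that the binary was not found, revisit the installation options and make sure the directory containing paddler is on your PATH.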

Read more about installation and initial setup

How does it work?

Paddler is built for easy setup. It comes as a self-contained binary with only two deployable components: the balancer and the agents.

The balancer exposes the following:

Inference service (used by applications that connect to it to obtain tokens or embeddings)
Management service, which manages Paddler's setup internally
Web admin panel that lets you view and test your Paddler setup

Agents are usually deployed on separate instances. They further distribute the incoming requests to slots, which are responsible for generating tokens and embeddings.

Paddler uses a built-in llama.cpp engine for inference, but has its own implementation of llama.cpp slots, which keep their own context and KV cache.
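The flow described in this section (a balancer in front of agents, agents that expose slots, and requests that are buffered while no slot is free) can be pictured with a toy model. This is a minimal sketch for intuition only, not Paddler's actual implementation: the `Agent` and `Balancer` classes, their method names, and the "most free slots wins" policy are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class Agent:
    """Hypothetical agent: a name plus a fixed number of slots."""
    name: str
    total_slots: int
    busy_slots: int = 0

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.busy_slots


@dataclass
class Balancer:
    """Toy balancer (illustrative policy, not Paddler's real one)."""
    agents: Dict[str, Agent] = field(default_factory=dict)
    buffer: Deque[str] = field(default_factory=deque)
    assignments: List[Tuple[str, str]] = field(default_factory=list)

    def register_agent(self, agent: Agent) -> None:
        """Agents can join at any time; buffered requests are served first."""
        self.agents[agent.name] = agent
        self._drain_buffer()

    def submit(self, request_id: str) -> Optional[str]:
        """Return the agent name that took the request, or None if buffered."""
        agent = self._pick_agent()
        if agent is None:
            # No free slot anywhere (possibly zero agents): hold the request.
            self.buffer.append(request_id)
            return None
        agent.busy_slots += 1
        self.assignments.append((request_id, agent.name))
        return agent.name

    def release(self, agent_name: str) -> None:
        """Free one slot on an agent and hand it to a buffered request."""
        agent = self.agents[agent_name]
        if agent.busy_slots > 0:
            agent.busy_slots -= 1
        self._drain_buffer()

    def _pick_agent(self) -> Optional[Agent]:
        candidates = [a for a in self.agents.values() if a.free_slots > 0]
        if not candidates:
            return None
        # Assumed policy: prefer the agent with the most free slots.
        return max(candidates, key=lambda a: a.free_slots)

    def _drain_buffer(self) -> None:
        # Serve buffered requests in arrival order while slots are free.
        while self.buffer and self._pick_agent() is not None:
            self.submit(self.buffer.popleft())


if __name__ == "__main__":
    balancer = Balancer()
    balancer.submit("request-1")  # buffered: no agents registered yet
    balancer.register_agent(Agent("agent-1", total_slots=2))
    print(balancer.assignments)
```

In this model, a request submitted before any agent exists waits in the buffer and is assigned as soon as the first agent registers, which mirrors the "scaling from zero hosts" idea listed under key features.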

Web admin panel

Paddler comes with a built-in web admin panel.

You can use it to:

Monitor your Paddler fleet (screenshot: paddler-web-admin-panel)
Add and update your model, and customize the chat template and inference parameters (screenshot: paddler-model)
Test inference through the GUI (screenshot: paddler-prompt)

Starting out

Why the Name

We initially wanted to use the Raft consensus algorithm (thus Paddler, because it paddles on a Raft), but eventually dropped that idea. The name stayed, though.

Later, people started sending us the "that's a paddlin'" clip from The Simpsons, and we just embraced it.

Community and contributions

We keep everything simple and on GitHub. Please use GitHub discussions for community conversations, and feel free to contribute.
