reflex-server

Crates.io	reflex-server
lib.rs	reflex-server
version	0.2.2
created_at	2025-12-16 11:45:33.772191+00
updated_at	2025-12-29 01:47:01.699175+00
description	OpenAI-compatible HTTP gateway for Reflex cache
homepage
repository	https://github.com/rawcontext/reflex
max_upload_size
id	1987591
size	282,447

Chris Cheney (ccheney)

documentation

https://docs.rs/reflex-server

README

reflex-server

reflex-server owns the HTTP gateway (Axum) and builds the reflex binary.

It depends on the core library crate (reflex-cache) for the cache tiers, storage, embeddings, and vector DB integration.

Run (Dev)

# Requires Qdrant running (gRPC on 6334)
docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant

RUST_LOG=debug cargo run -p reflex-server

Run (Release)

cargo run -p reflex-server --release

Docker Compose

From repo root:

docker compose up -d

Endpoints

GET /healthz
GET /ready
POST /v1/chat/completions (OpenAI-compatible)

Response Status

Responses include X-Reflex-Status:

hit-l1-exact: exact request match
hit-l3-verified: semantic hit verified by L3
miss: forwarded to provider and stored

The response choices[].message.content is Tauq-encoded.

Configuration

Most commonly used env vars:

Variable	Default	Notes
`REFLEX_PORT`	`8080`	HTTP port
`REFLEX_BIND_ADDR`	`127.0.0.1`	Bind address
`REFLEX_QDRANT_URL`	`http://localhost:6334`	Qdrant gRPC
`REFLEX_STORAGE_PATH`	`./.data`	Storage base path
`REFLEX_L1_CAPACITY`	`10000`	L1 capacity
`REFLEX_MODEL_PATH`	(unset)	Unset = stub embedder
`REFLEX_RERANKER_PATH`	(unset)	Optional reranker
`REFLEX_RERANKER_THRESHOLD`	`0.70`	L3 threshold
`REFLEX_MOCK_PROVIDER`	(unset)	Set to bypass real provider calls

Point Your Agent

export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=sk-your-key

Python (openai SDK)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-your-key")
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quicksort"}],
)

GPU Features

Build/run with one of:

--features metal (Apple Silicon)
--features cuda (NVIDIA)
--features cpu (docs.rs-style CPU flag; usually you can omit)

Example:

cargo run -p reflex-server --release --no-default-features --features metal

Tests

cargo test -p reflex-server

Real integration tests (needs Qdrant):

REFLEX_QDRANT_URL=http://localhost:6334 cargo test -p reflex-server --test integration_real

Commit count: 0

reflex-server

documentation

README

reflex-server

Run (Dev)

Run (Release)

Docker Compose

Endpoints

Response Status

Configuration

Point Your Agent

Python (openai SDK)

GPU Features

Tests

cargo fmt