| Crates.io | apr-cli |
| lib.rs | apr-cli |
| version | 0.2.11 |
| created_at | 2025-12-17 00:29:25.633272+00 |
| updated_at | 2026-01-20 19:01:33.852424+00 |
| description | CLI tool for APR model inspection, debugging, and operations |
| homepage | |
| repository | https://github.com/paiml/aprender |
| max_upload_size | |
| id | 1988994 |
| size | 34,296,877 |
CLI tool for APR model inspection, debugging, and operations.
```bash
cargo install apr-cli
```

This installs the `apr` binary.
```bash
# Show help
apr --help

# Inspect a model
apr inspect model.apr

# List models in a directory
apr list ./models/

# Interactive TUI mode
apr tui model.apr

# Compare two models
apr diff model1.apr model2.apr
```
Interactive chat with language models (supports APR, GGUF, SafeTensors):
```bash
# Chat with a GGUF model (GPU acceleration by default)
apr chat model.gguf

# Force CPU inference
apr chat model.gguf --no-gpu

# Explicitly request GPU acceleration
apr chat model.gguf --gpu

# Adjust generation parameters
apr chat model.gguf --temperature 0.7 --top-p 0.9 --max-tokens 512
```
Enable the `inference` feature to serve models via HTTP:
```bash
cargo install apr-cli --features inference
apr serve model.gguf --port 8080
```
The server provides an OpenAI-compatible API:
```bash
# Health check
curl http://localhost:8080/health

# Chat completions
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"default","messages":[{"role":"user","content":"Hello!"}],"max_tokens":50}'

# Streaming
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"default","messages":[{"role":"user","content":"Hello!"}],"stream":true,"max_tokens":50}'
```
Use the `X-Trace-Level` header to enable inference tracing for debugging:

```bash
# Brick-level tracing (token operations)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Trace-Level: brick" \
  -d '{"model":"default","messages":[{"role":"user","content":"Hi"}],"max_tokens":5}'

# Step-level tracing (forward pass steps)
curl -H "X-Trace-Level: step" ...

# Layer-level tracing (per-layer timing)
curl -H "X-Trace-Level: layer" ...
```
Trace levels:

- `brick`: Token-by-token operation timing
- `step`: Forward pass steps (embed, attention, mlp, lm_head)
- `layer`: Per-layer timing breakdown (24+ layers)
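The same header can be attached from client code. A small sketch using the `openai` package's per-request `extra_headers` option (the placeholder `api_key` is again an assumption; where the trace output lands, e.g. server logs, depends on the server):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

# Send one traced request; the server decides where trace data is emitted.
resp = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Hi"}],
    max_tokens=5,
    extra_headers={"X-Trace-Level": "brick"},
)
print(resp.choices[0].message.content)
```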
Enable CUDA support for NVIDIA GPUs:

```bash
cargo install apr-cli --features inference,cuda
```
```bash
# Run the tracing example
cargo run --example serve_with_tracing --features inference
```
MIT