| Crates.io | halldyll_starter_runpod |
| lib.rs | halldyll_starter_runpod |
| version | 0.2.0 |
| created_at | 2026-01-20 01:07:40.952764+00 |
| updated_at | 2026-01-20 01:37:26.780394+00 |
| description | Rust library for managing RunPod GPU pods - Provisioning, orchestration & state management |
| homepage | https://github.com/Mr-soloDev/halldyll-starter |
| repository | https://github.com/Mr-soloDev/halldyll-starter |
| max_upload_size | |
| id | 2055590 |
| size | 154,627 |
A comprehensive Rust library for managing RunPod GPU pods with automatic provisioning, state management, and orchestration.
Install from crates.io (configuration is supplied via environment variables or a .env file):
[dependencies]
halldyll_starter_runpod = "0.2"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
Or depend on the Git repository directly:
[dependencies]
halldyll_starter_runpod = { git = "https://github.com/Mr-soloDev/halldyll-starter" }
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
Create a .env file in your project root:
# Required
RUNPOD_API_KEY=your_api_key_here
RUNPOD_IMAGE_NAME=runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel
# Optional - Pod Configuration
RUNPOD_POD_NAME=my-gpu-pod
RUNPOD_GPU_TYPE_IDS=NVIDIA A40
RUNPOD_GPU_COUNT=1
RUNPOD_CONTAINER_DISK_GB=20
RUNPOD_VOLUME_GB=50
RUNPOD_VOLUME_MOUNT_PATH=/workspace
RUNPOD_PORTS=22/tcp,8888/http
# Optional - Timeouts
RUNPOD_HTTP_TIMEOUT_MS=30000
RUNPOD_READY_TIMEOUT_MS=300000
RUNPOD_POLL_INTERVAL_MS=5000
# Optional - API URLs
RUNPOD_REST_URL=https://rest.runpod.io/v1
RUNPOD_GRAPHQL_URL=https://api.runpod.io/graphql
# Optional - Behavior
RUNPOD_RECONCILE_MODE=reuse
| Variable | Required | Default | Description |
|---|---|---|---|
| RUNPOD_API_KEY | ✓ | - | RunPod API key |
| RUNPOD_IMAGE_NAME | ✓ | - | Container image (e.g., runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel) |
| RUNPOD_POD_NAME | | halldyll-pod | Name for the pod |
| RUNPOD_GPU_TYPE_IDS | | NVIDIA A40 | Comma-separated GPU types (e.g., NVIDIA A40,NVIDIA RTX 4090) |
| RUNPOD_GPU_COUNT | | 1 | Number of GPUs |
| RUNPOD_CONTAINER_DISK_GB | | 20 | Container disk size in GB |
| RUNPOD_VOLUME_GB | | 0 | Persistent volume size in GB (0 = no volume) |
| RUNPOD_VOLUME_MOUNT_PATH | | /workspace | Mount path for the persistent volume |
| RUNPOD_PORTS | | 22/tcp,8888/http | Exposed ports (format: port/protocol) |
| RUNPOD_HTTP_TIMEOUT_MS | | 30000 | HTTP request timeout (ms) |
| RUNPOD_READY_TIMEOUT_MS | | 300000 | Pod ready timeout (ms) |
| RUNPOD_POLL_INTERVAL_MS | | 5000 | Poll interval while waiting for readiness (ms) |
| RUNPOD_RECONCILE_MODE | | reuse | Whether to reuse or recreate an existing pod (reuse or recreate) |
The orchestrator uses the pod name to identify and reuse existing pods. To run multiple pods simultaneously, use different names:
# Development pod
RUNPOD_POD_NAME=dev-pod
# Production pod
RUNPOD_POD_NAME=prod-pod
# ML training pod
RUNPOD_POD_NAME=training-pod
Each unique name creates a separate pod on RunPod.
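A minimal sketch of driving two differently named pods from one process, assuming RunpodOrchestratorConfig::from_env() re-reads RUNPOD_POD_NAME each time it is called (the pod names here are hypothetical):
use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Each unique name maps to its own pod on RunPod.
    for name in ["dev-pod", "training-pod"] {
        // set_var is safe in the 2021 edition (the 2024 edition requires unsafe).
        std::env::set_var("RUNPOD_POD_NAME", name);
        let cfg = RunpodOrchestratorConfig::from_env()?;
        let orchestrator = RunpodOrchestrator::new(cfg)?;
        let pod = orchestrator.ensure_ready_pod().await?;
        println!("{} -> pod id {}", name, pod.id);
    }
    Ok(())
}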
The orchestrator provides the simplest way to get a ready-to-use pod:
use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Load config from .env
let cfg = RunpodOrchestratorConfig::from_env()?;
let orchestrator = RunpodOrchestrator::new(cfg)?;
// Get a ready pod (creates, starts, or reuses as needed)
let pod = orchestrator.ensure_ready_pod().await?;
println!("Pod ready: {} at {}", pod.name, pod.public_ip);
// Get SSH connection info
if let Some((host, port)) = pod.ssh_endpoint() {
println!("SSH: ssh -p {} user@{}", port, host);
}
// Get Jupyter URL
if let Some(url) = pod.jupyter_endpoint() {
println!("Jupyter: {}", url);
}
Ok(())
}
use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let cfg = RunpodOrchestratorConfig::from_env()?;
let orchestrator = RunpodOrchestrator::new(cfg)?;
// Get a ready pod
let pod = orchestrator.ensure_ready_pod().await?;
println!("Pod running: {}", pod.id);
// Do your work...
// Stop the pod (keeps config, can restart later, stops billing)
orchestrator.stop_pod(&pod.id).await?;
println!("Pod stopped!");
// Or stop by name (uses RUNPOD_POD_NAME from .env)
// orchestrator.stop_current_pod().await?;
// Or terminate completely (deletes the pod)
// orchestrator.terminate(&pod.id).await?;
// orchestrator.terminate_current_pod().await?;
Ok(())
}
use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};
use std::time::Duration;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let cfg = RunpodOrchestratorConfig::from_env()?;
let orchestrator = RunpodOrchestrator::new(cfg)?;
let pod = orchestrator.ensure_ready_pod().await?;
println!("Pod running for max 1 hour...");
// Auto-stop after 1 hour
tokio::select! {
_ = tokio::time::sleep(Duration::from_secs(3600)) => {
println!("Timeout reached, stopping pod...");
orchestrator.stop_pod(&pod.id).await?;
}
// Or wait for your task to complete
// result = your_long_running_task() => { ... }
}
Ok(())
}
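A sketch of the commented-out branch above: race a placeholder workload against the one-hour cap and stop the pod either way. run_training_job is a stand-in, not part of this crate:
use halldyll_starter_runpod::{RunpodOrchestrator, RunpodOrchestratorConfig};
use std::time::Duration;
// Stand-in for your real workload (e.g., driving the pod over SSH or HTTP).
async fn run_training_job() {
    tokio::time::sleep(Duration::from_secs(10)).await;
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let orchestrator = RunpodOrchestrator::new(RunpodOrchestratorConfig::from_env()?)?;
    let pod = orchestrator.ensure_ready_pod().await?;
    tokio::select! {
        _ = run_training_job() => println!("Job finished"),
        _ = tokio::time::sleep(Duration::from_secs(3600)) => println!("Hit the 1 hour limit"),
    }
    // Stop billing no matter which branch completed first.
    orchestrator.stop_pod(&pod.id).await?;
    Ok(())
}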
For direct pod creation:
use halldyll_starter_runpod::{RunpodProvisioner, RunpodProvisionConfig};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let cfg = RunpodProvisionConfig::from_env()?;
let provisioner = RunpodProvisioner::new(cfg)?;
let pod = provisioner.create_pod().await?;
println!("Created pod: {}", pod.id);
Ok(())
}
For managing existing pods:
use halldyll_starter_runpod::{RunpodStarter, RunpodStarterConfig};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let cfg = RunpodStarterConfig::from_env()?;
let starter = RunpodStarter::new(cfg)?;
// Start a pod
let status = starter.start("pod_id_here").await?;
println!("Pod status: {}", status.desired_status);
// Stop a pod
let status = starter.stop("pod_id_here").await?;
println!("Pod stopped: {}", status.desired_status);
Ok(())
}
For advanced operations:
use halldyll_starter_runpod::{RunpodClient, RunpodClientConfig};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let cfg = RunpodClientConfig::from_env()?;
let client = RunpodClient::new(cfg)?;
// List all pods
let pods = client.list_pods().await?;
for pod in pods {
println!("Pod: {} ({})", pod.id, pod.desired_status);
}
// List available GPU types
let gpus = client.list_gpu_types().await?;
for gpu in gpus {
println!("GPU: {} - Available: {}", gpu.display_name, gpu.available_count);
}
Ok(())
}
For persistent state and reconciliation:
use halldyll_starter_runpod::{RunPodState, JsonFileStateStore, PlannedAction};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let store = JsonFileStateStore::new("./pod_state.json");
// Load existing state
let mut state = store.load()?.unwrap_or_default();
// Record a pod
state.record_pod("pod-123", "my-pod", "runpod/pytorch:latest");
// Compute reconciliation plan
let action = state.reconcile("my-pod", "runpod/pytorch:latest");
match action {
PlannedAction::DoNothing(id) => println!("Pod {} is ready", id),
PlannedAction::Start(id) => println!("Need to start pod {}", id),
PlannedAction::Create => println!("Need to create new pod"),
}
// Save state
store.save(&state)?;
Ok(())
}
| Module | Description |
|---|---|
| runpod_provisioner | Create new pods via REST API |
| runpod_starter | Start/stop existing pods via REST API |
| runpod_state | State persistence and reconciliation |
| runpod_client | GraphQL client for advanced operations |
| runpod_orchestrator | High-level pod management |
Common GPU types available on RunPod:
| GPU | ID |
|---|---|
| NVIDIA A40 | NVIDIA A40 |
| NVIDIA A100 80GB | NVIDIA A100 80GB PCIe |
| NVIDIA RTX 4090 | NVIDIA GeForce RTX 4090 |
| NVIDIA RTX 3090 | NVIDIA GeForce RTX 3090 |
| NVIDIA L40S | NVIDIA L40S |
Use client.list_gpu_types() to get the full list with availability.
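For example, a small sketch that picks the first GPU type with free capacity, assuming available_count is a numeric count as in the client example above:
use halldyll_starter_runpod::{RunpodClient, RunpodClientConfig};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = RunpodClient::new(RunpodClientConfig::from_env()?)?;
    let gpus = client.list_gpu_types().await?;
    // Pick the first GPU type that currently reports capacity.
    match gpus.iter().find(|g| g.available_count > 0) {
        Some(gpu) => println!("Using GPU type: {}", gpu.display_name),
        None => println!("No GPU capacity available right now"),
    }
    Ok(())
}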
# Clone the project
git clone https://github.com/Mr-soloDev/halldyll-starter.git
cd halldyll-starter
# Create your .env file
cp .env.example .env
# Edit .env with your API key and settings
# Run the example
cargo run
# Debug build
cargo build
# Release build
cargo build --release
# Check without building
cargo check
# Run with all lints
cargo clippy -- -D warnings
Contributing: create a feature branch (git checkout -b feature/amazing-feature), commit your changes (git commit -m 'Add amazing feature'), push the branch (git push origin feature/amazing-feature), and open a pull request.
License: MIT