| Crates.io | with-gpu |
| lib.rs | with-gpu |
| version | 0.3.0 |
| created_at | 2025-11-20 06:46:41.227968+00 |
| updated_at | 2025-12-02 14:08:22.670741+00 |
| description | Intelligent GPU selection wrapper for CUDA commands |
| homepage | https://github.com/osteele/with-gpu |
| repository | https://github.com/osteele/with-gpu |
| max_upload_size | |
| id | 1941450 |
| size | 65,682 |
Intelligent GPU selection wrapper for CUDA commands. Automatically selects GPUs with the most available memory, then sets CUDA_VISIBLE_DEVICES and executes your command.
Install from crates.io:
cargo install with-gpu
This installs with-gpu to ~/.cargo/bin/with-gpu (ensure ~/.cargo/bin is in your PATH).
Or install from source:
git clone https://github.com/osteele/with-gpu.git
cd with-gpu
cargo install --path .
Select the GPU with the most available memory:
with-gpu python train.py
This prioritizes available VRAM over idle status, preventing OOM errors.
Specify exact GPU ID(s):
# Single GPU
with-gpu --gpu 1 python train.py
# Multiple GPUs
with-gpu --gpu 0,1 python train.py
with-gpu --gpu 0,1,2,3 torchrun --nproc_per_node=4 train.py
Request a range of GPUs:
# Need exactly 2 GPUs
with-gpu --min-gpus 2 --max-gpus 2 python train.py
# Want 1-4 GPUs (use as many idle as available, up to 4)
with-gpu --max-gpus 4 python train.py
# Need at least 2, prefer up to 4
with-gpu --min-gpus 2 --max-gpus 4 python train.py
Enforce idle-only selection (never pick a busy GPU, even if it has more free memory):
# Single idle GPU required
with-gpu --require-idle python train.py
# Multiple idle GPUs required
with-gpu --min-gpus 2 --require-idle python train.py
Note: Without --require-idle, the tool selects GPUs by available memory regardless of idle status. Use this flag when you specifically need GPUs with 0 running processes.
Filter GPUs by available memory and utilization:
# Require at least 8 GB free memory (default is 2 GB)
with-gpu --min-memory 8000 python train.py
# Allow any GPU with free memory (disable 2 GB default)
with-gpu --min-memory 0 python small_inference.py
# Require GPU utilization below 70%
with-gpu --max-util 70 python train.py
# Combine thresholds: 16 GB free + max 50% utilization
with-gpu --min-memory 16000 --max-util 50 python train_llm.py
Default behavior: By default, with-gpu requires at least 2 GB free memory to prevent OOM errors. This is sufficient for PyTorch initialization and most models. For small jobs that need less, use --min-memory 0.
Ghost process detection: Idle detection also applies a 500 MB memory threshold, which catches processes that NVML missed (ghost processes that still hold allocated memory).
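As a rough sketch of that rule (illustrative only, not the crate's actual code), a GPU counts as idle only when it has no attached processes and under 500 MB of memory in use:
// Sketch of the idle heuristic described above; memory is assumed to be in MB.
fn is_idle(process_count: usize, used_memory_mb: u64) -> bool {
    // No tracked processes, and less than 500 MB allocated; the memory check
    // catches "ghost" allocations that the NVML process list misses.
    process_count == 0 && used_memory_mb < 500
}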
Wait for GPUs to become available instead of failing immediately:
# Wait indefinitely for an idle GPU
with-gpu --wait python train.py
# Wait up to 300 seconds (5 minutes) for 2 idle GPUs
with-gpu --wait --timeout 300 --min-gpus 2 --require-idle python train.py
# Wait for 1-4 GPUs with 1 hour timeout
with-gpu --wait --timeout 3600 --max-gpus 4 python train.py
The tool polls every 5 seconds, reporting progress until the requested GPUs become available or the timeout expires.
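Conceptually, the waiting behavior is a poll-and-sleep loop. A minimal sketch (not the crate's code; find_gpus is a hypothetical stand-in for the selection logic):
use std::thread::sleep;
use std::time::{Duration, Instant};

// Illustrative sketch of --wait / --timeout: poll every 5 seconds until
// selection succeeds or the optional timeout elapses.
fn wait_for_gpus(
    timeout_secs: Option<u64>,
    find_gpus: impl Fn() -> Option<Vec<u32>>,
) -> Option<Vec<u32>> {
    let start = Instant::now();
    loop {
        if let Some(ids) = find_gpus() {
            return Some(ids); // enough suitable GPUs were found
        }
        if let Some(limit) = timeout_secs {
            if start.elapsed() >= Duration::from_secs(limit) {
                return None; // gave up waiting
            }
        }
        sleep(Duration::from_secs(5)); // poll interval
    }
}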
View all GPUs and their current usage:
with-gpu --status
Output example:
Available GPUs:
GPU 0: USED - 15320/24268 MB (63.1%), 85 util, 3 processes
GPU 1: IDLE - 0/24268 MB (0.0%), 0 util, 0 processes
GPU 2: USED - 5920/24268 MB (24.4%), 12 util, 1 processes
In this example, auto-selection would pick GPU 1 (24 GB free), then GPU 2 (18 GB free), then GPU 0 (9 GB free).
How selection works:
- Filters out GPUs below the free-memory threshold (--min-memory)
- Filters out GPUs above the utilization threshold (--max-util)
- Sorts the remaining GPUs by available memory, most free first
- --require-idle: Only considers GPUs with 0 processes and <500 MB used (still sorted by available memory)
- --gpu: Bypasses auto-selection entirely
- Sets CUDA_VISIBLE_DEVICES and replaces the current process with your command
Why memory-first? A GPU with 10 GB free and 1 process is more useful than an "idle" GPU with 300 MB free. This prevents OOM errors that occurred with the old idle-first algorithm.
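A minimal sketch of that memory-first logic (illustrative only; GpuInfo and its field names are made up, not the crate's internal types):
// Filter by the thresholds described above, then sort memory-first.
struct GpuInfo {
    index: u32,
    free_memory_mb: u64,
    used_memory_mb: u64,
    utilization_pct: u32,
    process_count: usize,
}

fn select_gpus(mut gpus: Vec<GpuInfo>, min_memory_mb: u64, max_util_pct: u32, require_idle: bool) -> Vec<u32> {
    gpus.retain(|g| {
        g.free_memory_mb >= min_memory_mb      // --min-memory (default 2 GB)
            && g.utilization_pct <= max_util_pct   // --max-util
            && (!require_idle || (g.process_count == 0 && g.used_memory_mb < 500)) // --require-idle
    });
    // Memory-first: the GPU with the most free memory wins, idle or not.
    gpus.sort_by(|a, b| b.free_memory_mb.cmp(&a.free_memory_mb));
    gpus.into_iter().map(|g| g.index).collect()
}
Applied to the --status example above, this ordering yields GPU 1, then GPU 2, then GPU 0; the chosen indices are then written into CUDA_VISIBLE_DEVICES before handing control to your command.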
# Auto-select GPU with most free memory
with-gpu python train.py
# Force use of GPU 1
with-gpu --gpu 1 python train.py
# Use 2 GPUs with most free memory for distributed training
with-gpu --min-gpus 2 --max-gpus 2 torchrun --nproc_per_node=2 train.py
# Run multiple experiments on different GPUs
with-gpu --gpu 0 python experiment_a.py &
with-gpu --gpu 1 python experiment_b.py &
with-gpu --gpu 2 python experiment_c.py &
# Only run if a GPU is completely free
with-gpu --require-idle python long_training.py
# Use all available idle GPUs
with-gpu --max-gpus 8 python distributed_train.py
Works with any command that respects CUDA_VISIBLE_DEVICES, such as PyTorch, TensorFlow, and JAX programs.
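For instance, a wrapped program can confirm which devices it was handed by reading the variable; a minimal sketch:
use std::env;

// Sketch: any process launched by with-gpu inherits CUDA_VISIBLE_DEVICES
// (e.g. "1" or "0,1"); inside the process, CUDA renumbers the visible
// devices starting from 0 in the listed order.
fn main() {
    match env::var("CUDA_VISIBLE_DEVICES") {
        Ok(ids) => println!("visible CUDA devices: {ids}"),
        Err(_) => println!("CUDA_VISIBLE_DEVICES not set; all GPUs visible"),
    }
}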
Related tools:
cuda-selector - Python library for in-process GPU selection. Supports memory, power, temperature, and utilization criteria with custom ranking functions. For Python-only workflows where you want device selection within your script rather than as a CLI wrapper.
idlegpu - Simple shell utility returning idle GPU ID. No multi-GPU, fallback, or wait support.
gpustat / nvitop - Monitoring tools with rich status displays. Monitoring only, no command execution.
SLURM / Kubernetes - Enterprise job schedulers. Feature-rich but heavyweight, complex setup.
Why with-gpu? It fills the gap between simple utilities and full schedulers.
Best for: Individual workstations, small research groups, "just run this on the GPU with most free memory" workflows.
Known limitation: concurrent launches can race to select the same GPU. Mitigation: use --require-idle, --wait, or stagger launches. See docs/limitations.md for detailed discussion.
When you need more: For guaranteed fair scheduling, priority queues, or resource reservations, use SLURM or Kubernetes.
Designed for cooperative environments (small groups, personal workstations) where "find me an idle GPU" is sufficient.
Behavior differs between Linux (where CUDA GPUs are supported) and macOS (where they are not), which matters when using with-gpu in cross-platform scripts.
See DEVELOPMENT.md for development documentation.
See DESIGN.md for design rationale and architectural decisions.
See ROADMAP.md for planned features and future directions.
MIT
Oliver Steele steele@osteele.com