| Crates.io | nvidia-gpu-exporter |
| lib.rs | nvidia-gpu-exporter |
| version | 1.0.0 |
| created_at | 2025-11-26 23:45:34.611535+00 |
| updated_at | 2025-11-26 23:45:34.611535+00 |
| description | Prometheus exporter for NVIDIA GPUs using NVML |
| homepage | |
| repository | https://github.com/michaelnugent/nvidia-gpu-exporter |
| max_upload_size | |
| id | 1952705 |
| size | 99,046 |
A Prometheus exporter for NVIDIA GPUs using the NVIDIA Management Library (NVML)
The NVML shared library (libnvidia-ml.so.1) must be loadable at runtime. When running in a container, it must either be baked into the image or mounted from the host.
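As an illustrative sketch only, one common way to satisfy this in a container is the NVIDIA Container Toolkit, which injects the host's driver libraries (including libnvidia-ml.so.1) when the --gpus flag is used; the image name nvidia-gpu-exporter below is an assumption and may differ in your setup:
# Requires the NVIDIA Container Toolkit on the host; --gpus all mounts the driver libraries into the container
docker run --rm --gpus all -p 9445:9445 nvidia-gpu-exporter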
If you are running a Debian-based distro, you may encounter "cannot open shared object file" errors even when the libnvidia-ml1 package is installed. The issue is that libloading (used by nvml-wrapper) looks for libnvidia-ml.so, but older packages only ship the versioned libnvidia-ml.so.1 name without the unversioned symlink. Create the missing symlink:
sudo ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so
This is a one-time setup that persists across reboots.
cargo build --release
./target/release/nvidia-gpu-exporter
Command-line options:
--web-listen-address: Address to listen on for web interface and telemetry (default: 0.0.0.0:9445)
--web-telemetry-path: Path under which to expose metrics (default: /metrics)
Example:
./target/release/nvidia-gpu-exporter --web-listen-address 0.0.0.0:9445 --web-telemetry-path /metrics
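Once the exporter is running, a quick way to confirm that metrics are being served is to query the endpoint directly. The output lines below are illustrative placeholders, not captured from a real device:
curl -s http://localhost:9445/metrics | grep -E '^nvidia_(up|temperatures)'
# nvidia_up 1
# nvidia_temperatures{minor="0"} 45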
cargo test
The exporter exposes the following Prometheus metrics:
nvidia_up - NVML Metric Collection Operational (1 = working, 0 = error)
nvidia_driver_info{version="..."} - NVML driver version info
nvidia_device_count - Count of NVIDIA GPU devices found
nvidia_info{index="...",minor="...",uuid="...",name="..."} - Device metadata (always 1)
nvidia_temperatures{minor="..."} - GPU temperature in Celsius
nvidia_fanspeed{minor="..."} - Fan speed percentage (0-100)
nvidia_memory_total{minor="..."} - Total memory in bytes
nvidia_memory_used{minor="..."} - Used memory in bytes
nvidia_utilization_memory{minor="..."} - Memory utilization percentage (0-100)
nvidia_utilization_gpu{minor="..."} - Current GPU utilization percentage (0-100)
nvidia_utilization_gpu_average{minor="..."} - GPU utilization averaged over 10s (0-100)
nvidia_power_usage{minor="..."} - Current power usage in milliwatts
nvidia_power_usage_average{minor="..."} - Power usage averaged over 10s in milliwatts
nvidia_power_limit_milliwatts{minor="..."} - Current power management limit in milliwatts
nvidia_power_limit_default_milliwatts{minor="..."} - Default power management limit in milliwatts
nvidia_clock_graphics_mhz{minor="..."} - Current graphics clock speed in MHz
nvidia_clock_sm_mhz{minor="..."} - Current SM (Streaming Multiprocessor) clock speed in MHz
nvidia_clock_memory_mhz{minor="..."} - Current memory clock speed in MHz
nvidia_clock_graphics_max_mhz{minor="..."} - Maximum graphics clock speed in MHz
nvidia_clock_sm_max_mhz{minor="..."} - Maximum SM clock speed in MHz
nvidia_clock_memory_max_mhz{minor="..."} - Maximum memory clock speed in MHz
nvidia_performance_state{minor="..."} - Current P-State (0-15, where 0 is maximum performance)
nvidia_pcie_link_generation{minor="..."} - Current PCIe link generation (1-4+)
nvidia_pcie_link_width{minor="..."} - Current PCIe link width (number of lanes)
nvidia_pcie_tx_throughput_kb{minor="..."} - PCIe transmit throughput in KB/s
nvidia_pcie_rx_throughput_kb{minor="..."} - PCIe receive throughput in KB/s
nvidia_encoder_utilization{minor="..."} - Video encoder utilization percentage (0-100)
nvidia_decoder_utilization{minor="..."} - Video decoder utilization percentage (0-100)
nvidia_ecc_errors_corrected_total{minor="..."} - Total corrected ECC errors (lifetime)
nvidia_ecc_errors_uncorrected_total{minor="..."} - Total uncorrected ECC errors (lifetime)
nvidia_compute_processes{minor="..."} - Number of compute processes currently running on the GPU
nvidia_graphics_processes{minor="..."} - Number of graphics processes currently running on the GPU
All per-device metrics carry a minor label, which is the GPU's minor device number.
License: MIT