es-disk-planner

Crates.io: es-disk-planner
lib.rs: es-disk-planner
version: 0.1.0
created_at: 2025-10-16 14:55:46.819931+00
updated_at: 2025-10-16 14:55:46.819931+00
description: A CLI and library to estimate Elasticsearch cluster disk capacity.
repository: https://github.com/cdelmonte-zg/es-disk-planner
size: 25,807
Christian Del Monte (cdelmonte-zg)

documentation: https://docs.rs/es-disk-planner

README

Elasticsearch Disk Capacity Planner

This tool estimates the disk capacity requirements for an Elasticsearch cluster, taking into account:

  • Primary and replica shards
  • Lucene merge overhead
  • Headroom for operational safety (disk watermarks and ingestion bursts)
  • Relocation buffer per node
  • Target maximum disk utilization

The output helps plan realistic disk sizes per node and total cluster capacity.


🔢 Calculation Model

base = primaries * shard_size_gb * (1 + replicas)
with_merge = base * (1 + overhead_merge)
with_headroom = with_merge * (1 + headroom)
buffer_total = buffer_per_node_gb * nodes
total_cluster = with_headroom + buffer_total
per_node = total_cluster / nodes
disk_per_node = per_node / target_utilization
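
The same chain can be written as a small Rust function. The sketch below simply mirrors the formulas above; PlanInput, PlanResult and disk_plan are illustrative names, not the crate's actual API.

// Minimal sketch of the calculation model (illustrative, not the crate's public API).
struct PlanInput {
    nodes: u32,
    primaries: u32,
    replicas: u32,
    shard_size_gb: f64,
    overhead_merge: f64,     // e.g. 0.20
    headroom: f64,           // e.g. 0.30
    buffer_per_node_gb: f64, // typically equal to shard_size_gb
    target_utilization: f64, // e.g. 0.75
}

struct PlanResult {
    total_cluster_gb: f64,
    per_node_gb: f64,
    disk_per_node_gb: f64,
}

fn disk_plan(p: &PlanInput) -> PlanResult {
    let base = p.primaries as f64 * p.shard_size_gb * (1.0 + p.replicas as f64);
    let with_merge = base * (1.0 + p.overhead_merge);
    let with_headroom = with_merge * (1.0 + p.headroom);
    let buffer_total = p.buffer_per_node_gb * p.nodes as f64;
    let total_cluster = with_headroom + buffer_total;
    let per_node = total_cluster / p.nodes as f64;
    PlanResult {
        total_cluster_gb: total_cluster,
        per_node_gb: per_node,
        disk_per_node_gb: per_node / p.target_utilization,
    }
}

fn main() {
    // Inputs from the example scenario further down in this README.
    let r = disk_plan(&PlanInput {
        nodes: 5,
        primaries: 10,
        replicas: 1,
        shard_size_gb: 50.0,
        overhead_merge: 0.20,
        headroom: 0.30,
        buffer_per_node_gb: 50.0,
        target_utilization: 0.75,
    });
    println!(
        "cluster: {:.1} GB, per node: {:.1} GB, disk per node: {:.1} GB",
        r.total_cluster_gb, r.per_node_gb, r.disk_per_node_gb
    ); // cluster: 1810.0 GB, per node: 362.0 GB, disk per node: 482.7 GB
}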

Parameters

Parameter            | Default       | Description
--shard_size_gb      | 50            | Average size of a single shard on disk (compressed Lucene data)
--overhead_merge     | 0.20 (20%)    | Temporary space required by Lucene segment merges
--headroom           | 0.30 (30%)    | Safety margin to stay below disk watermarks (85–90%)
--buffer_per_node_gb | shard_size_gb | Space reserved per node for shard relocation/rebalancing
--target_utilization | 0.75 (75%)    | Maximum desired disk usage ratio
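
If the CLI is built with clap's derive API (an assumption; the crate's actual argument parsing may differ), the flags in the table could be declared roughly like this:

// Hypothetical flag declarations mirroring the table above.
// Assumes the clap crate with the "derive" feature; the field layout and the
// defaults for --primaries and --replicas are illustrative.
use clap::Parser;

#[derive(Parser)]
struct Args {
    #[arg(long)]
    nodes: u32,
    #[arg(long, default_value_t = 10)]
    primaries: u32,
    #[arg(long, default_value_t = 1)]
    replicas: u32,
    /// Average size of a single shard on disk (compressed Lucene data).
    #[arg(long = "shard_size_gb", default_value_t = 50.0)]
    shard_size_gb: f64,
    /// Temporary space required by Lucene segment merges.
    #[arg(long = "overhead_merge", default_value_t = 0.20)]
    overhead_merge: f64,
    /// Safety margin to stay below disk watermarks.
    #[arg(long, default_value_t = 0.30)]
    headroom: f64,
    /// Space reserved per node for relocation; falls back to shard_size_gb when omitted.
    #[arg(long = "buffer_per_node_gb")]
    buffer_per_node_gb: Option<f64>,
    /// Maximum desired disk usage ratio.
    #[arg(long = "target_utilization", default_value_t = 0.75)]
    target_utilization: f64,
}

fn main() {
    let args = Args::parse();
    let buffer = args.buffer_per_node_gb.unwrap_or(args.shard_size_gb);
    println!("nodes: {}, relocation buffer per node: {} GB", args.nodes, buffer);
}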

📊 Example Output

=== Elasticsearch Disk Capacity Planner ===
Nodes: 5
Primary shards: 10
Replicas per shard: 1
Shard size: 50.0 GB | Overhead merge: 20% | Headroom: 30%
Relocation buffer per node: 50.0 GB
Target disk utilization: 75%

Base (primaries+replicas): 1000.0 GB (1.00 TB)
+ Overhead merge:        1200.0 GB (1.20 TB)
+ Headroom:              1560.0 GB (1.56 TB)
+ Total buffer:          250.0 GB  (0.25 TB)
= Cluster total:         1810.0 GB (1.81 TB)

Per node (recommended):  362.0 GB (0.36 TB)
Disk per node @ <75%:    482.7 GB (0.48 TB)

💡 Interpretation

  • Base (primaries + replicas) — total indexed data size on disk.
  • Merge overhead — extra space required during Lucene segment merges.
  • Headroom — operational slack to avoid hitting high/flood-stage watermarks.
  • Buffer per node — space required to receive the largest shard during relocation.
  • Target utilization — desired maximum disk usage (usually 70–80%).

This model provides an approximate but operationally safe estimate for capacity planning in Elasticsearch clusters.


🧮 Example Scenario

  • 5 data nodes
  • 10 primary shards
  • 1 replica per shard
  • 50 GB average shard size
  • 20% merge overhead
  • 30% headroom
  • 50 GB relocation buffer per node
  • 75% target disk utilization

Result: ~1.8 TB total cluster capacity, or ~480–500 GB per node to stay below 75% usage.
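
These figures follow directly from the calculation model above:

base          = 10 * 50 * (1 + 1)  = 1000 GB
with_merge    = 1000 * (1 + 0.20)  = 1200 GB
with_headroom = 1200 * (1 + 0.30)  = 1560 GB
buffer_total  = 50 * 5             = 250 GB
total_cluster = 1560 + 250         = 1810 GB  (≈ 1.8 TB)
per_node      = 1810 / 5           = 362 GB
disk_per_node = 362 / 0.75         ≈ 483 GB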


⚙️ Operational Notes

  • The results refer to disk usage, not JVM heap or RAM.

  • Typical Elasticsearch node sizing guidelines:

    • JVM heap ≤ 30 GB
    • Node memory ≥ 64 GB (≈ 50% heap, 50% OS file cache)
    • Shard size 20–50 GB
  • The model aligns with Elastic’s published best practices.


⚠️ Limitations

  • Assumes uniform shard sizes and compression ratios.
  • Does not include local snapshots or external repository overhead.
  • Does not model “cold” or “frozen” tiers.
  • Merge and headroom factors are static (simplified estimation).

🧰 Usage Examples

# Default example
cargo run -- \
  --nodes 5 \
  --primaries 10 \
  --replicas 1 \
  --shard_size_gb 50 \
  --overhead_merge 0.20 \
  --headroom 0.30 \
  --target_utilization 0.75

# Two replicas, larger shards
cargo run -- --nodes 5 --primaries 10 --replicas 2 --shard_size_gb 80

# Conservative watermarks
cargo run -- --nodes 6 --headroom 0.4 --target_utilization 0.65

🧩 Future Improvements

  • Add --units iec to switch between GB/TB (1000-based) and GiB/TiB (1024-based)
  • Support JSON/CSV output for pipeline integration
  • Optional retrieval of real shard stats from _cat/shards via REST API
  • Integrate into a monitoring workflow for continuous capacity validation