esdump-rs

Crates.io: esdump-rs
lib.rs: esdump-rs
version: 0.1.1
source: src
created_at: 2024-06-18 15:48:51.207016
updated_at: 2024-06-18 15:58:09.533425
description: Dump Elasticsearch indexes to object storage
repository: https://github.com/GitGuardian/esdump-rs/
id: 1275608
size: 883,769
Tom Forbes (orf)


README

esdump-rs

Dump Elasticsearch or OpenSearch indexes to blob storage, really-really fast :rocket:

Features:

  • Super-dooper fast
  • Supports compressing output with zstd or gzip
  • Natively supports blob storage on AWS, Google Cloud and Azure
  • Supports filtering and selecting specific fields
  • Detailed progress output and logging
  • Comes as a single, small static binary or a Docker image
  • Runs on Windows, Linux or macOS
  • Written in Rust :crab:

Installation

Releases: Grab a pre-built executable from the releases page

Docker: docker run ghcr.io/gitguardian/esdump-rs:v0.1.0
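
When using the Docker image, credentials can be handed to the container with Docker's own --env-file flag. A sketch, assuming the cluster is reachable from inside the container (host.docker.internal works on Docker Desktop):

$ docker run --rm --env-file .env ghcr.io/gitguardian/esdump-rs:v0.1.0 \
  http://host.docker.internal:9200 s3://es-dump/test/ \
  --index=test-index \
  --batches-per-file=5 \
  --batch-size=5000 \
  --concurrency=10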

Usage

Pass the Elasticsearch or OpenSearch HTTP(S) URL and a blob storage URL, set the credentials in the environment (see example.env), and run!
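
As a minimal sketch of the environment for the s3:// case, using the standard AWS credential variable names (the authoritative list lives in example.env in the repository; the values below are placeholders):

$ cat .env
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=eu-west-1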

$ esdump-rs http://localhost:9200 s3://es-dump/test/ \
  --index=test-index \
  --batches-per-file=5 \
  --batch-size=5000 \
  --concurrency=10
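
The output does not have to be remote: as <OUTPUT_LOCATION> below shows, file:// URLs are accepted, which makes a quick local smoke test easy. For example, capped with --limit:

$ esdump-rs http://localhost:9200 file:///tmp/es-dump/ \
  --index=test-index \
  --batches-per-file=1 \
  --batch-size=1000 \
  --concurrency=2 \
  --limit=10000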

Settings such as the batch size and concurrency are set as flags; in the example above, --batch-size=5000 with --batches-per-file=5 writes up to 25,000 records per output file:

Usage: esdump-rs [OPTIONS] --index <INDEX> --concurrency <CONCURRENCY> --batch-size <BATCH_SIZE> --batches-per-file <BATCHES_PER_FILE> <ELASTICSEARCH_URL> <OUTPUT_LOCATION>

Arguments:
  <ELASTICSEARCH_URL>  Elasticsearch cluster to dump
  <OUTPUT_LOCATION>    Location to write results. Can be a file://, s3:// or gs:// URL

Options:
  -i, --index <INDEX>
          Index to dump
  -c, --concurrency <CONCURRENCY>
          Number of concurrent requests to use
  -l, --limit <LIMIT>
          Limit the total number of records returned
  -b, --batch-size <BATCH_SIZE>
          Number of records in each batch
      --batches-per-file <BATCHES_PER_FILE>
          Number of batches to write per file
  -q, --query <QUERY>
          A file path containing a query to execute while dumping
  -f, --field <FIELD>
          Specific fields to fetch
      --compression <COMPRESSION>
          Compress the output files [default: zstd] [possible values: gzip, zstd]
      --concurrent-uploads <CONCURRENT_UPLOADS>
          Max chunks to concurrently upload *per task*
      --upload-size <UPLOAD_SIZE>
          Size of each uploaded chunk [default: 15MB]
  -d, --distribution <DISTRIBUTION>
          Distribution of the cluster [possible values: elasticsearch, opensearch]
      --env-file <ENV_FILE>
          File to load environment variables from [default: .env]
  -h, --help
          Print help
  -V, --version
          Print version
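
The --query flag points at a file rather than taking inline JSON. A hedged example, assuming the file holds a standard Elasticsearch query DSL body and that --field may be repeated to select several fields:

$ cat query.json
{"query": {"range": {"created_at": {"gte": "2024-01-01"}}}}

$ esdump-rs http://localhost:9200 s3://es-dump/filtered/ \
  --index=test-index \
  --query=query.json \
  --field=id \
  --field=created_at \
  --compression=gzip \
  --batches-per-file=5 \
  --batch-size=5000 \
  --concurrency=10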