Crates.io | s3-algo |
lib.rs | s3-algo |
version | 0.7.0 |
source | src |
created_at | 2020-02-25 14:45:13.007601 |
updated_at | 2024-02-01 14:42:33.733742 |
description | High-performance algorithms for batch operations to Amazon S3 |
homepage | |
repository | https://github.com/openanalytics/s3-algo |
max_upload_size | |
id | 212418 |
size | 116,833 |
s3-algo
High-performance algorithms for batch operations in Amazon S3, on top of rusoto. Reliability and performance are achieved through a configurable timeout/retry/backoff algorithm, designed for a high volume of requests. Monitor progress closely with closures that get called for every finished request, for accurate user feedback.
https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance-guidelines.html
Upload multiple files with s3_upload_files.
List files with s3_list_objects or s3_list_prefix, and then execute deletion or copy on all the files.
This crate is only in its infancy, and we happily welcome PRs, feature requests, and suggestions for improvement of the API.
Both tests and examples require that an S3 service such as minio is running locally at port 9000.
Tests assume that a credentials profile exists, for example in ~/.aws/credentials:
[testing]
aws_access_key_id = 123456789
aws_secret_access_key = 123456789
Listing, deleting and copying objects is all done through the entry points s3_list_objects() or s3_list_prefix(), which return a ListObjects object that can delete and copy files.
Example:
s3_list_prefix(s3, "test-bucket".to_string(), "some/prefix".to_string())
    .delete_all()
    .await
    .unwrap();
s3_upload_files function
The documentation for UploadConfig may help illuminate the components of the algorithm.
Currently, the most important aspect of the algorithm is deciding timeout values, that is, how long to wait for a request before trying again.
It is important for performance that the timeout is tight enough.
The main mechanism to this end is the estimation of the upload bandwidth through a running exponential average of the upload speed (on success) of individual files.
Additionally, on each successive retry, the timeout increases by some factor (back-off).
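The two mechanisms described above can be illustrated with a minimal sketch. This is not the crate's implementation: TimeoutEstimator, its fields, and the parameters alpha, margin and backoff are hypothetical names chosen for this example, and the actual configuration lives in UploadConfig. The sketch keeps an exponential moving average of observed upload speed and derives a timeout from the expected upload time, multiplied by a back-off factor for each retry.

```rust
use std::time::Duration;

/// Hypothetical helper (not part of the s3-algo API): tracks an exponential
/// moving average of upload speed and derives per-request timeouts from it.
pub struct TimeoutEstimator {
    /// Current bandwidth estimate in bytes per second.
    bytes_per_sec: f64,
    /// Weight given to each new sample, in (0, 1].
    alpha: f64,
    /// Safety margin multiplied onto the expected upload time.
    margin: f64,
    /// Factor by which the timeout grows on each retry.
    backoff: f64,
}

impl TimeoutEstimator {
    pub fn new(initial_bytes_per_sec: f64, alpha: f64, margin: f64, backoff: f64) -> Self {
        Self { bytes_per_sec: initial_bytes_per_sec, alpha, margin, backoff }
    }

    /// Fold the observed speed of one successful upload into the running average.
    pub fn record_success(&mut self, bytes: u64, elapsed: Duration) {
        let secs = elapsed.as_secs_f64();
        if secs > 0.0 {
            let sample = bytes as f64 / secs;
            self.bytes_per_sec = self.alpha * sample + (1.0 - self.alpha) * self.bytes_per_sec;
        }
    }

    /// Timeout for the given attempt (1-based): expected upload time times the
    /// safety margin, multiplied by the back-off factor once per earlier retry.
    pub fn timeout(&self, bytes: u64, attempt: u32) -> Duration {
        // Floor the estimate at 1 byte/s to avoid dividing by zero.
        let bandwidth = self.bytes_per_sec.max(1.0);
        let expected_secs = bytes as f64 / bandwidth;
        let retries = attempt.saturating_sub(1) as i32;
        Duration::from_secs_f64(expected_secs * self.margin * self.backoff.powi(retries))
    }

    /// Current bandwidth estimate in bytes per second.
    pub fn bytes_per_sec(&self) -> f64 {
        self.bytes_per_sec
    }
}
```

A smaller alpha makes the estimate react more slowly to individual fast or slow uploads. A margin closer to 1 gives tighter timeouts, at the cost of more spurious retries.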
An open question is whether the algorithm is considerate of other processes that want to use the same network, for example in the case of congestion. It does implement increasing back-off intervals after failed requests, but the real effect on a shared network should be tested.
perf_data
Command-line interface for uploading any directory to any bucket and prefix in a locally running S3 service (such as minio).
Example:
cargo run --example perf_data -- -n 3 ./src test-bucket lala
Prints:
attempts  bytes  success_ms  total_ms     MBps  MBps est
       1   1990          32        32  0.06042   1.00000
       1  24943          33        33  0.74043   1.00000
       1   2383          29        29  0.08211   1.00000
       1    417          13        13  0.03080   1.00000
       1   8562          16        16  0.51480   1.00000
total_ms is the total time including all retries, and success_ms is the time of only the last attempt.
The distinction between these two is useful in real cases where attempts is not always 1.
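To make the difference between the two timings concrete, here is a minimal sketch of a retry loop that records both. It is only an illustration under assumed names: AttemptReport and run_with_retries are hypothetical and do not come from s3-algo, and the operation is a plain closure returning true on success rather than a real S3 request.

```rust
use std::time::{Duration, Instant};

/// Hypothetical report type mirroring three of the printed columns.
pub struct AttemptReport {
    /// Number of attempts made, including the successful one.
    pub attempts: u32,
    /// Duration of the last (successful) attempt only.
    pub success_time: Duration,
    /// Duration of all attempts together, including failed ones.
    pub total_time: Duration,
}

/// Runs `op` until it returns true or `max_attempts` is reached.
/// `op` receives the 1-based attempt number.
pub fn run_with_retries<F>(mut op: F, max_attempts: u32) -> Option<AttemptReport>
where
    F: FnMut(u32) -> bool,
{
    let start = Instant::now();
    for attempt in 1..=max_attempts {
        let attempt_start = Instant::now();
        if op(attempt) {
            return Some(AttemptReport {
                attempts: attempt,
                success_time: attempt_start.elapsed(),
                total_time: start.elapsed(),
            });
        }
    }
    // Every attempt failed.
    None
}
```

When the first attempt succeeds, the two durations are nearly identical, as in the output above. After failed attempts, total_time also includes the time spent on them.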
You can then verify that the upload happened by entering the container. Something like:
$ docker exec -it $(docker ps --filter "ancestor=minio" --format "{{.Names}}") bash
[user@144aff4dae5b ~]$ ls s3/
test-bucket/
[user@144aff4dae5b ~]$ ls s3/test-bucket/
lala