Crates.io | rrrs |
lib.rs | rrrs |
version | 0.1.3 |
source | src |
created_at | 2024-03-03 01:37:48.205016 |
updated_at | 2024-03-03 20:32:32.774473 |
description | Welcome to RRRS, a rapid, hyper-optimized CSV random sampling tool designed with performance and efficiency at its core. |
repository | https://github.com/ethan-wickstrom/rrrs/ |
id | 1160159 |
size | 224,249 |
Welcome to RRRS, a rapid, hyper-optimized CSV random sampling tool designed with performance and efficiency at its core. Crafted meticulously in Rust, RRRS offers an unparalleled solution for extracting random data samples from CSV files swiftly and effortlessly.
Born out of a frustrating, repetitive process of sampling from unwieldy or enormous CSV files during my time at Washington University in St. Louis, RRRS (Rust(ic) Rapid Random Sampler) is more than just a tool; it's a perhaps slightly redundant but fun mission to over-optimize and speed up the all-too-familiar chore of data sampling. As a student in data-heavy courses, I found myself constantly bogged down by the inefficiency of existing methods: importing massive datasets into spreadsheet software, waiting for them to load, and then struggling with plugins or scripting to extract the samples I needed. It was clear that there had to be a better way. So, instead of doing my homework, I worked on this.
Enter RRRS. Developed with the speed and efficiency of Rust, RRRS is my answer to those frustrating hours. It's designed to make random sampling from large CSV files not just faster, but a seamless part of your workflow. This tool is for anyone who's ever felt this nuisance, turning what was once a bottleneck into a smooth, efficient process. With RRRS, I'm excited to share a solution that helped me and is now here to support data enthusiasts and professionals alike in their analytical endeavors.
To get started with RRRS, follow these simple steps:
rrrs -i <input_file_path> -o <output_file_path>
Upon execution, RRRS prompts you to enter the number of rows to sample at random from your CSV file. The output is a new CSV file named after the original file, with a suffix indicating the number of sampled rows (e.g., slogan_data-100). This file is saved in the execution path or in a specified output directory.
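The naming rule described above can be sketched in a few lines of Rust. This is only an illustration: the helper name `sampled_output_path`, its signature, and the choice to keep the input's extension (defaulting to `csv`) are assumptions, not taken from the RRRS source.

```rust
use std::path::{Path, PathBuf};

/// Builds an output path of the form `<stem>-<rows>.<ext>` inside `out_dir`.
/// Hypothetical helper: name, signature, and extension handling are
/// assumptions for illustration only.
fn sampled_output_path(input: &Path, rows: usize, out_dir: &Path) -> PathBuf {
    // File name without its extension, e.g. "slogan_data" for "slogan_data.csv".
    let stem = input
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("sample");
    // Assumption: reuse the input's extension, falling back to "csv".
    let ext = input
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("csv");
    out_dir.join(format!("{}-{}.{}", stem, rows, ext))
}
```

Under these assumptions, an input of data/slogan_data.csv with 100 sampled rows and an output directory of out would map to out/slogan_data-100.csv.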
Understand the organization of RRRS with the following directory structure:
rrrs/
├── Cargo.toml                      # Project manifest
├── src/                            # Source files
│   ├── main.rs                     # Entry point
│   ├── library.rs                  # Library code
│   ├── args.rs                     # Argument parsing
│   └── library/                    # Library code
│       ├── sampler_ops/            # Sampling operations
│       │   └── sampler_ops.rs      # Sampling logic
│       └── csv_ops/                # CSV operations
│           ├── csv_loader.rs       # CSV loading functionality
│           └── csv_writer.rs       # CSV writing functionality
└── tests/                          # Automated tests
    ├── args_tests.rs               # Tests for argument parsing
    ├── csv_loader_tests.rs         # Tests for CSV loading
    ├── sampler_tests.rs            # Tests for sampling logic
    └── csv_writer_tests.rs         # Tests for CSV writing
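To give a feel for what a sampling module such as sampler_ops.rs might do, here is a minimal sketch of reservoir sampling (Algorithm R), which picks k rows in a single pass without knowing the row count in advance. It is not RRRS's actual implementation, which this README does not describe. The names `XorShift64` and `reservoir_sample` are hypothetical, and the tiny xorshift generator is a stand-in so the example needs no external crates; a real tool would more likely use the `rand` crate.

```rust
/// Small xorshift64 generator so the sketch has no dependencies.
/// Not cryptographically secure; a stand-in for a proper RNG crate.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift state must never be zero, so substitute a fixed constant.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in 0..bound. The modulo introduces a tiny bias,
    /// which is acceptable for a sketch.
    fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Reservoir sampling (Algorithm R): keeps `k` items from an iterator of
/// unknown length in one pass, each with (approximately) equal probability.
/// If the input has fewer than `k` items, all of them are returned.
fn reservoir_sample<I, T>(rows: I, k: usize, rng: &mut XorShift64) -> Vec<T>
where
    I: Iterator<Item = T>,
{
    let mut reservoir = Vec::with_capacity(k);
    if k == 0 {
        return reservoir;
    }
    for (i, row) in rows.enumerate() {
        if i < k {
            // Fill the reservoir with the first k rows.
            reservoir.push(row);
        } else {
            // Replace a random slot with probability k / (i + 1).
            let j = rng.below(i as u64 + 1) as usize;
            if j < k {
                reservoir[j] = row;
            }
        }
    }
    reservoir
}
```

In a CSV setting the header line would presumably be read and written separately, with only the data rows passed to `reservoir_sample`. The appeal of this approach for large files is that memory use grows with the sample size rather than the file size.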
To use RRRS, you need Rust installed on your machine. If you don't have it, install it using the following command:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
For more information, refer to the official Rust installation guide.
Once Rust is installed, you can install RRRS using the following command:
cargo install rrrs
Note: RRRS is not yet supported on Windows. However, you can still use it by installing the Windows Subsystem for Linux.
To build RRRS from source, you can clone the repository and build it using the following commands (Note that this is primarily for development purposes):
git clone git@github.com:ethan-wickstrom/rrrs.git
cd rrrs
cargo build --release
cp target/release/rrrs /usr/local/bin
Contributions to RRRS are warmly welcomed. Feel free to open an issue or submit a pull request, whether it's bug reports, feature requests, or code contributions. Please refer to the contributing guidelines for more details.
RRRS is open-sourced under the Apache-2.0 license. See the LICENSE file for more details.