datasaurust

Crates.iodatasaurust
lib.rsdatasaurust
version0.1.0
sourcesrc
created_at2023-04-10 18:21:07.689781
updated_at2023-04-10 18:21:07.689781
descriptionBlazingly fast implementation of the Datasaurus paper.
homepagehttps://github.com/araffin/datasaurust
repositoryhttps://github.com/araffin/datasaurust
max_upload_size
id835342
size48,122
Antonin RAFFIN (araffin)

documentation

https://docs.rs/datasaurust

README

CI

DatasauRust

Blazingly fast implementation of the Datasaurus paper (500x faster than the original): "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing" by Justin Matejka and George Fitzmaurice.

Usage

To run with plot -p (using gnuplot):

cargo run --release -- -d data/seed_datasets/Datasaurus_data.csv -p

With pre-defined shape:

cargo run --release -- -p -n 3000000 --decimals 2 --shape cat --allowed-distance 0.1

Starting from Gaussian noise:

cargo run --release -- -p -n 3000000 --decimals 2 --shape cat --allowed-distance 0.1 --gaussian

Create videos

Create video and gif (use --save-plot):

pip install moviepy ffmpeg-python

python scripts/create_video.py logs/cat/ logs/cat.mp4

From one shape to another:

cargo run --release -- -p -n 2000000 --decimals 1 --shape dog --allowed-distance 0.1 --log-interval 10000 -d logs/gaussian_cat/output.csv --save-plots

Note: The original datasets and python code comes from http://www.autodeskresearch.com/papers/samestats

Commit count: 39

cargo fmt