Crates.io | datagen |
lib.rs | datagen |
version | 0.1.4 |
source | src |
created_at | 2019-06-05 07:06:21.62606 |
updated_at | 2019-11-16 10:43:14.982354 |
description | An easy to use tool to generate fake data in bulk and export it as Avro, Parquet or directly into your database as tables |
homepage | |
repository | https://github.com/arunma/datagen |
max_upload_size | |
id | 139106 |
size | 77,428 |
An easy-to-use tool to generate fake/dummy data in bulk and export it as Avro, CSV, JSON or directly into your database as tables (coming soon!).
DataGen is a command-line application written in Rust that generates dummy data from a YAML schema.
Supported column options:
- one_of: to generate random values from a list
- min and max: for numeric and date fields
- mean and std: for numeric fields

At the moment, installation is only possible through Cargo. Please install Cargo by following the instructions at https://www.rust-lang.org/tools/install.
Once Cargo is installed, you can install the binary from crates.io using:
cargo install datagen
Note: The binary is placed in <HOME_DIR>/.cargo/bin/, which the Cargo installation normally adds to your PATH. If it is not there, please add it to your PATH.
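If you are unsure whether that directory is on your PATH, a small check along the following lines may help. This is only a sketch using the Rust standard library; the helper names `dir_on_path` and `cargo_bin_on_path` are made up for illustration and are not part of datagen or Cargo.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::Path;

/// Returns true if `dir` appears as an entry of the PATH-style value `path_var`.
/// (Hypothetical helper, not part of datagen.)
fn dir_on_path(path_var: &OsStr, dir: &Path) -> bool {
    // split_paths uses the platform separator (':' on Unix, ';' on Windows).
    env::split_paths(path_var).any(|entry| entry == dir)
}

/// Checks the current process environment for `<home>/.cargo/bin`.
/// (Hypothetical helper; `home` is whatever your home directory is.)
fn cargo_bin_on_path(home: &Path) -> bool {
    match env::var_os("PATH") {
        Some(path_var) => dir_on_path(&path_var, &home.join(".cargo").join("bin")),
        None => false,
    }
}
```

Calling `cargo_bin_on_path` with your home directory returns `false` when the directory is missing from PATH, in which case it needs to be added in your shell profile.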
datagen csv "<output_dir>/output.csv" "<schema_yaml_dir>/schema.yaml" 100 "^"
datagen avro "<output_dir>/output.avro" "<schema_yaml_dir>/schema_simple.yaml" 100
datagen json "<output_dir>/output.json" "<schema_yaml_dir>/all_examples.yaml" 100
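The commands above all follow the same positional pattern. The sketch below shows one way such arguments could be parsed in Rust. It assumes the order is format, output file, schema file, number of records, and an optional single-character delimiter for CSV; the `Args` struct and `parse_args` function are hypothetical and are not datagen's actual internals.

```rust
use std::path::PathBuf;

/// Output formats that appear in the usage examples.
#[derive(Debug, PartialEq)]
enum Format {
    Csv,
    Avro,
    Json,
}

/// Hypothetical holder for the positional arguments (not datagen's own type).
#[derive(Debug, PartialEq)]
struct Args {
    format: Format,
    output: PathBuf,
    schema: PathBuf,
    records: usize,          // assumed meaning of the numeric argument
    delimiter: Option<char>, // assumed to be optional and CSV-only
}

/// Parses `<format> <output> <schema> <count> [delimiter]` (program name excluded).
fn parse_args(args: &[&str]) -> Result<Args, String> {
    if args.len() < 4 || args.len() > 5 {
        return Err(format!("expected 4 or 5 arguments, got {}", args.len()));
    }
    let format = match args[0] {
        "csv" => Format::Csv,
        "avro" => Format::Avro,
        "json" => Format::Json,
        other => return Err(format!("unknown format: {}", other)),
    };
    let records = args[3]
        .parse::<usize>()
        .map_err(|e| format!("invalid record count {:?}: {}", args[3], e))?;
    // Accept exactly one character as the delimiter, if one was given.
    let delimiter = match args.get(4) {
        None => None,
        Some(s) => {
            let mut chars = s.chars();
            match (chars.next(), chars.next()) {
                (Some(c), None) => Some(c),
                _ => return Err(format!("delimiter must be one character, got {:?}", s)),
            }
        }
    };
    Ok(Args {
        format,
        output: PathBuf::from(args[1]),
        schema: PathBuf::from(args[2]),
        records,
        delimiter,
    })
}
```

For example, `parse_args(&["csv", "out/output.csv", "schemas/schema.yaml", "100", "^"])` yields a CSV request for 100 records with `^` as the delimiter, while an unknown format or a non-numeric count returns an error.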
---
name: person_schema
dataset:
  name: person_table
  columns:
    - {name: id, not_null: false, dtype: int}
    - {name: name, dtype: name}
    - {name: age, dtype: age}
    - {name: adult, default: 'false', dtype: boolean}
    - {name: gender, dtype: string, one_of: ["M", "F"]}
    - {name: dob, dtype: "date", min: "01/01/1950", max: "03/01/2014", format: "%d/%m/%Y"}
    - {name: event_date, dtype: "datetime", min: "2014-11-28 12:00:09", max: "2014-11-30 12:00:09", format: "%Y-%m-%d %H:%M:%S"}
    - {name: score, dtype: "int", mean: 1.00, std: 0.36}
    - {name: distance, dtype: "int", min: 19000, max: 221377}
    - {name: weight, dtype: "float", min: 1.00, max: 500.00}
Date format specifiers are documented at: https://docs.rs/chrono/0.4.9/chrono/format/strftime/index.html#specifiers
An example schema YAML is located at <PROJECT_ROOT>/test_data/schema_options.yaml
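To give a feel for what the one_of, min/max and mean/std options in the schema mean, here is a rough, standard-library-only sketch of how such values could be sampled. It is not datagen's implementation: the `Rng` type is a tiny xorshift generator written for this example (a real tool would more likely use the `rand` crate), the function names are hypothetical, and the normal distribution is produced with the Box-Muller transform as an assumption about what mean and std imply.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct Rng(u64);

impl Rng {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        Rng(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// one_of: pick one value from a fixed, non-empty list.
fn one_of<'a>(rng: &mut Rng, choices: &[&'a str]) -> &'a str {
    choices[(rng.next_u64() % choices.len() as u64) as usize]
}

/// min / max: integer in the inclusive range [min, max] (assumes min <= max).
fn int_between(rng: &mut Rng, min: i64, max: i64) -> i64 {
    let span = (max - min + 1) as u64;
    min + (rng.next_u64() % span) as i64
}

/// mean / std: normally distributed value via the Box-Muller transform.
fn normal(rng: &mut Rng, mean: f64, std_dev: f64) -> f64 {
    let u1 = 1.0 - rng.next_f64(); // in (0, 1], so ln(u1) is finite
    let u2 = rng.next_f64();
    let z = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos();
    mean + std_dev * z
}
```

With a seeded `Rng`, `one_of(&mut rng, &["M", "F"])` corresponds to the gender column, `int_between(&mut rng, 19000, 221377)` to the distance column, and `normal(&mut rng, 1.0, 0.36)` to the score column before any rounding to an integer.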
cargo build
cargo test -- --color always --nocapture
cargo run -- "csv" "<output_dir>/output.csv" "<schema_yaml_dir>/schema.yaml" 100 ";"
cargo run -- "avro" "<output_dir>/output.avro" "<schema_yaml_dir>/schema_simple.yaml" 100
cargo run -- "json" "<output_dir>/output.json" "<schema_yaml_dir>/schema.yaml" 100
0.1.0
0.1.1
0.1.3
0.1.4
Supports one_of, e.g.
- {name: "day_of_week", dtype: "string", one_of: ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]}
Support for min and max for numeric columns
- {name: "age", dtype: "int", min: 1 , max: 130}
Support for Date and Datetime (along with min and max)
- {name: "event_time", dtype: "datetime", min: "2014-11-28 12:00:09" , max: "2014-11-30 12:00:09", format: "%Y-%m-%d %H:%M:%S"}
- {name: "dob", dtype: "date", min: "01/01/1920" , max: "03/01/2019", format: "%d/%m/%Y"}
Support for semantic types (name, date, latitude, phone, etc.)
Arun Manivannan – @arunma – arun@arunma.com
Distributed under the MIT license. See LICENSE for more information.
https://github.com/arunma/datagen
You want to help out? Awesome!