| Crates.io | bq-schema-gen |
|---|---|
| lib.rs | bq-schema-gen |
| version | 0.1.1 |
| created_at | 2026-01-16 22:44:34.19847+00 |
| updated_at | 2026-01-19 12:15:41.755818+00 |
| description | Generate BigQuery schema from JSON or CSV data files |
| homepage | https://github.com/omribromberg/bigquery-schema-generator-rust |
| repository | https://github.com/omribromberg/bigquery-schema-generator-rust |
| max_upload_size | |
| id | 2049426 |
| size | 552,835 |
Generate BigQuery schemas from JSON or CSV data. Unlike BigQuery's built-in auto-detect, which only examines the first 500 records, this tool processes every record to generate a complete and accurate schema.
```bash
# Install
cargo install bq-schema-gen

# Generate a schema
echo '{"name": "Alice", "age": 30}' | bq-schema-gen
```
Install from crates.io:
```bash
cargo install bq-schema-gen
```
Install with Homebrew:
```bash
brew tap omribromberg/bigquery-schema-generator-rust https://github.com/omribromberg/bigquery-schema-generator-rust
brew install bq-schema-gen
```
Download pre-built packages from GitHub Releases. Each release archive contains the `bq-schema-gen` binary and a `completions/` directory with shell completions:
```bash
# Example: Extract and install on macOS/Linux
tar -xzf bq-schema-gen-v0.1.0-x86_64-apple-darwin.tar.gz
sudo mv bq-schema-gen /usr/local/bin/

# Optionally install completions (e.g., for zsh)
sudo mv completions/_bq-schema-gen /usr/local/share/zsh/site-functions/
```
Build from source:
```bash
git clone https://github.com/omribromberg/bigquery-schema-generator-rust
cd bigquery-schema-generator-rust
cargo install --path .
```
From stdin:
```bash
echo '{"name": "Alice", "age": 30}' | bq-schema-gen
```
From a file:
```bash
bq-schema-gen data.json --output schema.json
```
Multiple files with glob patterns:
```bash
bq-schema-gen "data/*.json"
```
Output separate schemas per file:
```bash
bq-schema-gen "data/*.json" --per-file --output-dir schemas/
```
CSV input:
```bash
bq-schema-gen --input-format csv data.csv
```
Compare two schemas to identify changes:
```bash
bq-schema-gen diff old_schema.json new_schema.json
```
Example output:
```text
Schema Diff Report
==================
Summary: 1 added, 1 removed, 1 modified (2 breaking)
Added Fields:
+ email (STRING, NULLABLE)
Removed Fields:
- legacy_id (INTEGER, NULLABLE) [BREAKING]
Modified Fields:
~ name: Mode changed: NULLABLE -> REQUIRED [BREAKING]
```
Output formats: `text` (default), `json`, `json-patch`, and `sql`, selected with `--format`:
```bash
bq-schema-gen diff old.json new.json --format json-patch
```
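The example report suggests a simple classification: fields only in the new schema are added, fields only in the old one are removed, and fields present in both are compared by type and mode. Below is a minimal sketch of that logic over flat schemas (nested RECORD fields are not handled). It assumes removals and NULLABLE -> REQUIRED changes are breaking, as in the report above; it additionally *assumes* type changes are breaking, and that `strict` behaves like the `--strict` flag by marking every change as breaking. `Field`, `Change`, `diff`, and `summary` are hypothetical names, not the crate's API.

```rust
// Hypothetical flat field model for illustration; not the crate's real types.
struct Field {
    name: String,
    field_type: String, // e.g. "STRING", "INTEGER"
    mode: String,       // "NULLABLE", "REQUIRED", or "REPEATED"
}

fn field(name: &str, field_type: &str, mode: &str) -> Field {
    Field { name: name.into(), field_type: field_type.into(), mode: mode.into() }
}

#[derive(Debug, PartialEq)]
enum Change {
    Added(String),
    Removed(String),
    Modified(String, String), // field name, human-readable description
}

/// Compares two flat schemas by field name; each change carries a `breaking` flag.
fn diff(old: &[Field], new: &[Field], strict: bool) -> Vec<(Change, bool)> {
    let mut changes = Vec::new();
    for n in new {
        if !old.iter().any(|o| o.name == n.name) {
            // Additions are only breaking in strict mode.
            changes.push((Change::Added(n.name.clone()), strict));
        }
    }
    for o in old {
        match new.iter().find(|n| n.name == o.name) {
            None => changes.push((Change::Removed(o.name.clone()), true)),
            Some(n) => {
                if o.field_type != n.field_type {
                    // Assumption: any type change is treated as breaking.
                    let desc = format!("Type changed: {} -> {}", o.field_type, n.field_type);
                    changes.push((Change::Modified(o.name.clone(), desc), true));
                }
                if o.mode != n.mode {
                    let tightened = o.mode == "NULLABLE" && n.mode == "REQUIRED";
                    let desc = format!("Mode changed: {} -> {}", o.mode, n.mode);
                    changes.push((Change::Modified(o.name.clone(), desc), strict || tightened));
                }
            }
        }
    }
    changes
}

/// Builds the summary line; note that it counts changes, not distinct fields.
fn summary(changes: &[(Change, bool)]) -> String {
    let (mut added, mut removed, mut modified, mut breaking) = (0, 0, 0, 0);
    for (change, is_breaking) in changes {
        match change {
            Change::Added(_) => added += 1,
            Change::Removed(_) => removed += 1,
            Change::Modified(..) => modified += 1,
        }
        if *is_breaking {
            breaking += 1;
        }
    }
    format!("Summary: {added} added, {removed} removed, {modified} modified ({breaking} breaking)")
}

fn main() {
    // Hypothetical schemas that reproduce the example report above.
    let old = vec![field("name", "STRING", "NULLABLE"), field("legacy_id", "INTEGER", "NULLABLE")];
    let new = vec![field("name", "STRING", "REQUIRED"), field("email", "STRING", "NULLABLE")];
    let changes = diff(&old, &new, false);
    for (change, breaking) in &changes {
        println!("{change:?} breaking={breaking}");
    }
    println!("{}", summary(&changes));
}
```

Passing `true` for `strict` would also mark the added `email` field as breaking.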
Validate data against an existing schema:
```bash
bq-schema-gen data.json --existing-schema-path schema.json
```
Auto-regenerate schemas when files change:
```bash
bq-schema-gen watch data.json --output schema.json
```
Options for schema generation:

| Flag | Description |
|---|---|
| `--input-format <FORMAT>` | Input format: `json` (default) or `csv` |
| `--output-format <FORMAT>` | Output format: `json`, `ddl`, `debug-map`, or `json-schema` |
| `--table-name <NAME>` | Table name for DDL output |
| `-o, --output <FILE>` | Output file (stdout if not provided) |
| `-q, --quiet` | Suppress progress messages |
| `--per-file` | Output a separate schema for each input file |
| `--output-dir <DIR>` | Output directory for per-file schemas |
| `--keep-nulls` | Include null values and empty containers in the schema |
| `--quoted-values-are-strings` | Treat quoted values as strings |
| `--infer-mode` | Infer REQUIRED mode for CSV fields |
| `--sanitize-names` | Replace invalid characters in field names |
| `--preserve-input-sort-order` | Preserve field order from input |
| `--existing-schema-path <FILE>` | Merge with an existing schema |
| `--ignore-invalid-lines` | Skip unparseable lines |
All flags support both kebab-case (`--keep-nulls`) and underscore (`--keep_nulls`) syntax.
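To make `--sanitize-names` concrete: classic BigQuery column names may contain only letters, digits, and underscores. Below is a minimal sketch, *assuming* the flag simply replaces every other character with `_`. The crate may also truncate long names or treat leading digits differently, and `sanitize_name` is a hypothetical helper, not part of its API.

```rust
/// Replaces every character outside [A-Za-z0-9_] with an underscore.
/// Assumption: the real `--sanitize-names` behavior may differ in details.
fn sanitize_name(name: &str) -> String {
    name.chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect()
}

fn main() {
    for raw in ["user-id", "first name", "price($)", "already_ok"] {
        println!("{raw:?} -> {:?}", sanitize_name(raw));
    }
}
```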
Options for the `diff` subcommand:

| Flag | Description |
|---|---|
| `--format <FORMAT>` | Output: `text`, `json`, `json-patch`, `sql` |
| `--color <WHEN>` | Color output: `auto`, `always`, `never` |
| `--strict` | Flag ALL changes as breaking |
| `-o, --output <FILE>` | Output file |
Standard BigQuery schema format:
```bash
echo '{"name": "Alice", "age": 30}' | bq-schema-gen
```
```json
[
  {"mode": "NULLABLE", "name": "age", "type": "INTEGER"},
  {"mode": "NULLABLE", "name": "name", "type": "STRING"}
]
```
BigQuery CREATE TABLE statement:
```bash
echo '{"name": "Alice", "age": 30}' | bq-schema-gen --output-format ddl --table-name myproject.users
```
```sql
CREATE TABLE `myproject.users` (
  age INT64,
  name STRING
);
```
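The DDL output uses GoogleSQL type names rather than the legacy names found in the JSON schema, as `INTEGER` becoming `INT64` above shows. Below is a minimal sketch of that rendering. It covers flat columns only and leaves out RECORD (STRUCT) columns. It *assumes* REQUIRED and REPEATED modes render as `NOT NULL` and `ARRAY<...>`. `Column`, `sql_type`, and `create_table` are hypothetical names, not the crate's API.

```rust
/// Hypothetical flat column: name, legacy BigQuery type name, and mode.
struct Column<'a> {
    name: &'a str,
    legacy_type: &'a str,
    mode: &'a str,
}

/// Maps legacy type names to GoogleSQL names; STRING, DATE, TIME, and
/// TIMESTAMP are spelled the same in both and pass through unchanged.
fn sql_type(legacy: &str) -> &str {
    match legacy {
        "INTEGER" => "INT64",
        "FLOAT" => "FLOAT64",
        "BOOLEAN" => "BOOL",
        other => other,
    }
}

/// Renders a CREATE TABLE statement; RECORD (STRUCT) columns are not handled.
fn create_table(table: &str, columns: &[Column<'_>]) -> String {
    let lines: Vec<String> = columns
        .iter()
        .map(|c| {
            let ty = sql_type(c.legacy_type);
            match c.mode {
                "REPEATED" => format!("  {} ARRAY<{}>", c.name, ty),
                "REQUIRED" => format!("  {} {} NOT NULL", c.name, ty),
                _ => format!("  {} {}", c.name, ty), // NULLABLE
            }
        })
        .collect();
    format!("CREATE TABLE `{}` (\n{}\n);", table, lines.join(",\n"))
}

fn main() {
    let columns = [
        Column { name: "age", legacy_type: "INTEGER", mode: "NULLABLE" },
        Column { name: "name", legacy_type: "STRING", mode: "NULLABLE" },
    ];
    println!("{}", create_table("myproject.users", &columns));
}
```

Run as-is, this reproduces the `myproject.users` statement shown above.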
JSON Schema draft-07 format:
```bash
echo '{"name": "Alice", "age": 30}' | bq-schema-gen --output-format json-schema
```
The tool automatically infers BigQuery types:
| JSON Type | BigQuery Type |
|---|---|
| string | STRING, DATE, TIME, or TIMESTAMP (auto-detected) |
| number (integer) | INTEGER |
| number (float) | FLOAT |
| boolean | BOOLEAN |
| object | RECORD |
| array | Element type, with mode REPEATED |
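The table can be read as a per-value classification. Below is a minimal sketch that recognizes dates (`YYYY-MM-DD`), times (`HH:MM:SS`), and timestamps (a date and a time separated by a space or `T`) by shape alone. The crate's actual detection patterns are likely broader. This simplified sketch types an array from its first element only, and skips nulls and empty arrays, which the `--keep-nulls` description suggests are excluded by default. `Json`, `infer`, and the helper functions are hypothetical names; `Json` is a stand-in so the sketch needs no external crates.

```rust
/// Minimal stand-in for a parsed JSON value (hypothetical).
enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// YYYY-MM-DD (shape check only; month/day ranges are not validated).
fn is_date(s: &str) -> bool {
    let p: Vec<&str> = s.split('-').collect();
    p.len() == 3 && p[0].len() == 4 && p[1].len() == 2 && p[2].len() == 2 && p.iter().all(|x| all_digits(x))
}

/// HH:MM:SS (shape check only).
fn is_time(s: &str) -> bool {
    let p: Vec<&str> = s.split(':').collect();
    p.len() == 3 && p.iter().all(|x| x.len() == 2 && all_digits(x))
}

/// "YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DDTHH:MM:SS" (no fractions or zones).
fn is_timestamp(s: &str) -> bool {
    s.is_ascii()
        && s.len() == 19
        && matches!(s.as_bytes()[10], b' ' | b'T')
        && is_date(&s[..10])
        && is_time(&s[11..])
}

/// Returns (type, mode); None for nulls and empty arrays.
fn infer(value: &Json) -> Option<(&'static str, &'static str)> {
    let t = match value {
        Json::Null => return None,
        Json::Bool(_) => "BOOLEAN",
        Json::Int(_) => "INTEGER",
        Json::Float(_) => "FLOAT",
        Json::Str(s) if is_timestamp(s) => "TIMESTAMP",
        Json::Str(s) if is_date(s) => "DATE",
        Json::Str(s) if is_time(s) => "TIME",
        Json::Str(_) => "STRING",
        Json::Object(_) => "RECORD",
        // Simplification: an array takes its first element's type, mode REPEATED.
        Json::Array(items) => return infer(items.first()?).map(|(t, _)| (t, "REPEATED")),
    };
    Some((t, "NULLABLE"))
}

fn main() {
    let samples = vec![
        Json::Str("Alice".to_string()),
        Json::Int(30),
        Json::Bool(true),
        Json::Str("2024-01-15".to_string()),
        Json::Str("2024-01-15 10:30:00".to_string()),
        Json::Array(vec![Json::Float(1.5)]),
        Json::Object(vec![("city".to_string(), Json::Str("Paris".to_string()))]),
        Json::Null,
    ];
    for v in &samples {
        println!("{:?}", infer(v));
    }
}
```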
Types evolve as more data is processed:
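Below is a minimal sketch of such widening. It *assumes* this port keeps the conversion rules of the original Python tool, which are not confirmed here: identical types are kept, INTEGER and FLOAT widen to FLOAT, any mix of STRING, DATE, TIME, and TIMESTAMP falls back to STRING, and every other mix is incompatible. `merge_types` and `fold_types` are hypothetical helpers.

```rust
/// Merges two types observed for the same field; None marks an incompatible pair.
fn merge_types<'a>(a: &'a str, b: &'a str) -> Option<&'a str> {
    // Types assumed to fall back to STRING when mixed with each other.
    const STRING_LIKE: [&str; 4] = ["STRING", "DATE", "TIME", "TIMESTAMP"];
    match (a, b) {
        _ if a == b => Some(a),
        ("INTEGER", "FLOAT") | ("FLOAT", "INTEGER") => Some("FLOAT"),
        _ if STRING_LIKE.contains(&a) && STRING_LIKE.contains(&b) => Some("STRING"),
        _ => None,
    }
}

/// Folds every type observed for one field, record by record, into a single type.
fn fold_types<'a>(observed: &[&'a str]) -> Option<&'a str> {
    let (first, rest) = observed.split_first()?;
    rest.iter().try_fold(*first, |acc, t| merge_types(acc, *t))
}

fn main() {
    // A field seen as INTEGER in early records and FLOAT later widens to FLOAT.
    println!("{:?}", fold_types(&["INTEGER", "INTEGER", "FLOAT"]));
    println!("{:?}", fold_types(&["DATE", "STRING"]));
    println!("{:?}", fold_types(&["BOOLEAN", "INTEGER"]));
}
```

Folding record by record is why scanning all records matters: a FLOAT that first appears after record 500 would never widen the field under sampling.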
Shell completions for bash, zsh, fish, and PowerShell are included in GitHub releases and automatically installed via Homebrew.
The crate can be used as a Rust library:
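```rust
use bq_schema_gen::{SchemaGenerator, GeneratorConfig, SchemaMap};
use serde_json::json;

fn main() {
    // Build a generator with the default configuration.
    let config = GeneratorConfig::default();
    let mut generator = SchemaGenerator::new(config);
    // The schema map accumulates what has been learned about each field.
    let mut schema_map = SchemaMap::new();
    // Process a record; repeat for every record in the dataset.
    let record = json!({"name": "test", "count": 42});
    generator.process_record(&record, &mut schema_map).unwrap();
    // Flatten the accumulated map into the final BigQuery schema.
    let schema = generator.flatten_schema(&schema_map);
}
```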
See docs.rs for the full API documentation.
License: Apache-2.0 (same as the original Python project)