bq-schema-gen

Crates.io: bq-schema-gen
lib.rs: bq-schema-gen
version: 0.1.1
created_at: 2026-01-16 22:44:34.19847+00
updated_at: 2026-01-19 12:15:41.755818+00
description: Generate BigQuery schema from JSON or CSV data files
homepage: https://github.com/omribromberg/bigquery-schema-generator-rust
repository: https://github.com/omribromberg/bigquery-schema-generator-rust
id: 2049426
size: 552,835
(OmriBromberg)

documentation

https://docs.rs/bq-schema-gen

README

bq-schema-gen


Generate BigQuery schemas from JSON or CSV data. BigQuery's built-in auto-detect examines only the first 500 records; this tool processes every record, so fields and types that first appear later in the data are still reflected in the schema.

Quick Start

# Install
cargo install bq-schema-gen

# Generate a schema
echo '{"name": "Alice", "age": 30}' | bq-schema-gen

Features

  • Schema Generation - Infer BigQuery schemas from JSON or CSV files
  • Schema Diff - Compare schemas and detect breaking changes
  • Data Validation - Validate data against existing schemas
  • Watch Mode - Auto-regenerate schemas when files change
  • Parallel Processing - Fast processing of large datasets
  • Multiple Output Formats - JSON, DDL, JSON Schema

Installation

From crates.io

cargo install bq-schema-gen

Using Homebrew

brew tap omribromberg/bigquery-schema-generator-rust https://github.com/omribromberg/bigquery-schema-generator-rust
brew install bq-schema-gen

From GitHub Releases

Download pre-built packages from GitHub Releases.

Each release includes:

  • Pre-compiled binary
  • Shell completions (bash, zsh, fish, PowerShell)
  • Man pages

# Example: Extract and install on macOS/Linux
tar -xzf bq-schema-gen-v0.1.0-x86_64-apple-darwin.tar.gz
sudo mv bq-schema-gen /usr/local/bin/
# Optionally install completions (e.g., for zsh)
sudo mv completions/_bq-schema-gen /usr/local/share/zsh/site-functions/

From Source

git clone https://github.com/omribromberg/bigquery-schema-generator-rust
cd bigquery-schema-generator-rust
cargo install --path .

Usage

Generate Schema

From stdin:

echo '{"name": "Alice", "age": 30}' | bq-schema-gen

From a file:

bq-schema-gen data.json --output schema.json

Multiple files with glob patterns:

bq-schema-gen "data/*.json"

Output separate schemas per file:

bq-schema-gen "data/*.json" --per-file --output-dir schemas/

CSV input:

bq-schema-gen --input-format csv data.csv

Compare Schemas (diff)

Compare two schemas to identify changes:

bq-schema-gen diff old_schema.json new_schema.json

Example output:

Schema Diff Report
==================

Summary: 1 added, 1 removed, 1 modified (2 breaking)

Added Fields:
  + email (STRING, NULLABLE)

Removed Fields:
  - legacy_id (INTEGER, NULLABLE)  [BREAKING]

Modified Fields:
  ~ name: Mode changed: NULLABLE -> REQUIRED  [BREAKING]

Output formats: text (default), json, json-patch, sql

bq-schema-gen diff old.json new.json --format json-patch
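
The classification in the report above can be sketched in a few lines of Rust. This is an illustration only, not the crate's diff code: the `Field`, `Change`, `schema`, and `diff` names are hypothetical, and the rules for what counts as breaking (removed fields, NULLABLE -> REQUIRED, plus the assumptions that any type change and any newly added REQUIRED field are breaking) are inferred from the example output rather than taken from the tool's documentation.

```rust
use std::collections::BTreeMap;

// Hypothetical field representation for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    ty: &'static str,   // e.g. "STRING"
    mode: &'static str, // e.g. "NULLABLE"
}

#[derive(Debug, PartialEq)]
enum Change {
    Added(String),
    Removed(String),
    Modified(String),
}

// Helper that builds a name -> field map from (name, type, mode) triples.
fn schema(fields: &[(&str, &'static str, &'static str)]) -> BTreeMap<String, Field> {
    fields
        .iter()
        .map(|&(n, ty, mode)| (n.to_string(), Field { ty, mode }))
        .collect()
}

// Returns each change paired with a flag: does this sketch treat it as breaking?
fn diff(old: &BTreeMap<String, Field>, new: &BTreeMap<String, Field>) -> Vec<(Change, bool)> {
    let mut out = Vec::new();
    for (name, f) in new {
        match old.get(name) {
            // Assumption: a new field is breaking only if it is REQUIRED.
            None => out.push((Change::Added(name.clone()), f.mode == "REQUIRED")),
            Some(prev) if prev != f => {
                let tightened = prev.mode == "NULLABLE" && f.mode == "REQUIRED";
                let retyped = prev.ty != f.ty; // assumption: any type change is breaking
                out.push((Change::Modified(name.clone()), tightened || retyped));
            }
            Some(_) => {}
        }
    }
    // Fields present in the old schema but missing from the new one.
    for name in old.keys() {
        if !new.contains_key(name) {
            out.push((Change::Removed(name.clone()), true));
        }
    }
    out
}

fn main() {
    let old = schema(&[("legacy_id", "INTEGER", "NULLABLE"), ("name", "STRING", "NULLABLE")]);
    let new = schema(&[("email", "STRING", "NULLABLE"), ("name", "STRING", "REQUIRED")]);
    for (change, breaking) in diff(&old, &new) {
        println!("{:?}{}", change, if breaking { "  [BREAKING]" } else { "" });
    }
}
```

Run against the two schemas from the example report, the sketch yields one added, one modified, and one removed field, two of them flagged as breaking.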

Validate Data

Validate data against an existing schema:

bq-schema-gen data.json --existing-schema-path schema.json

Watch Mode

Auto-regenerate schemas when files change:

bq-schema-gen watch data.json --output schema.json

CLI Reference

Flag                            Description
--input-format <FORMAT>         Input format: json (default) or csv
--output-format <FORMAT>        Output format: json, ddl, debug-map, or json-schema
--table-name <NAME>             Table name for DDL output
-o, --output <FILE>             Output file (stdout if not provided)
-q, --quiet                     Suppress progress messages
--per-file                      Output separate schema for each input file
--output-dir <DIR>              Output directory for per-file schemas
--keep-nulls                    Include null values and empty containers in schema
--quoted-values-are-strings     Treat quoted values as strings
--infer-mode                    Infer REQUIRED mode for CSV fields
--sanitize-names                Replace invalid characters in field names
--preserve-input-sort-order     Preserve field order from input
--existing-schema-path <FILE>   Merge with an existing schema
--ignore-invalid-lines          Skip unparseable lines
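
As a rough illustration of what --sanitize-names might do, the sketch below replaces every character outside letters, digits, and underscores with an underscore. The replacement character and the `sanitize_name` function are assumptions made for this example; the flag description only says that invalid characters are replaced, and the crate's actual rules may differ (for example around length limits or leading digits).

```rust
// Hypothetical helper: keep ASCII letters, digits, and underscores,
// and replace everything else with '_' (an assumed replacement character).
fn sanitize_name(name: &str) -> String {
    name.chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect()
}

fn main() {
    for raw in ["user-id", "first name", "a.b", "ok_1"] {
        println!("{} -> {}", raw, sanitize_name(raw));
    }
}
```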

All flags support both kebab-case (--keep-nulls) and underscore (--keep_nulls) syntax.
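
One way to accept both spellings is to normalize arguments before parsing them. The following is a minimal sketch of that idea, not the crate's actual argument handling; `normalize_flag` is a hypothetical helper. It rewrites underscores to hyphens only in the name part of a long flag, so values and file paths keep their underscores.

```rust
// Hypothetical pre-processing step: "--keep_nulls" becomes "--keep-nulls".
// Only the flag name is rewritten; any "=value" suffix and all
// non-flag arguments are returned unchanged.
fn normalize_flag(arg: &str) -> String {
    match arg.strip_prefix("--") {
        Some(rest) if !rest.is_empty() => {
            let (name, value) = match rest.split_once('=') {
                Some((n, v)) => (n, Some(v)),
                None => (rest, None),
            };
            let name = name.replace('_', "-");
            match value {
                Some(v) => format!("--{}={}", name, v),
                None => format!("--{}", name),
            }
        }
        _ => arg.to_string(),
    }
}

fn main() {
    for arg in ["--keep_nulls", "--output_dir=my_dir", "my_data.json"] {
        println!("{}", normalize_flag(arg));
    }
}
```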

Diff Options

Flag                  Description
--format <FORMAT>     Output: text, json, json-patch, sql
--color <WHEN>        Color output: auto, always, never
--strict              Flag ALL changes as breaking
-o, --output <FILE>   Output file

Output Formats

JSON (default)

Standard BigQuery schema format:

echo '{"name": "Alice", "age": 30}' | bq-schema-gen
[
  {"mode": "NULLABLE", "name": "age", "type": "INTEGER"},
  {"mode": "NULLABLE", "name": "name", "type": "STRING"}
]

DDL

BigQuery CREATE TABLE statement:

echo '{"name": "Alice", "age": 30}' | bq-schema-gen --output-format ddl --table-name myproject.users
CREATE TABLE `myproject.users` (
  age INT64,
  name STRING
);
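
The mapping from a JSON schema to the DDL above can be sketched as follows. This is a simplified illustration under stated assumptions, not the crate's DDL writer: `Field`, `ddl_type`, and `create_table` are hypothetical names, REQUIRED fields are assumed to render as NOT NULL, and RECORD or REPEATED fields are not handled.

```rust
// Hypothetical flat field; nested RECORD and REPEATED fields are out of scope here.
struct Field {
    name: &'static str,
    ty: &'static str,
    mode: &'static str,
}

// Map legacy schema type names to the type names BigQuery DDL uses.
fn ddl_type(ty: &str) -> &str {
    match ty {
        "INTEGER" => "INT64",
        "FLOAT" => "FLOAT64",
        "BOOLEAN" => "BOOL",
        other => other,
    }
}

// Render a CREATE TABLE statement with one column per line.
fn create_table(table: &str, fields: &[Field]) -> String {
    let columns: Vec<String> = fields
        .iter()
        .map(|f| {
            // Assumption: REQUIRED maps to NOT NULL; NULLABLE adds nothing.
            let not_null = if f.mode == "REQUIRED" { " NOT NULL" } else { "" };
            format!("  {} {}{}", f.name, ddl_type(f.ty), not_null)
        })
        .collect();
    format!("CREATE TABLE `{}` (\n{}\n);", table, columns.join(",\n"))
}

fn main() {
    let fields = [
        Field { name: "age", ty: "INTEGER", mode: "NULLABLE" },
        Field { name: "name", ty: "STRING", mode: "NULLABLE" },
    ];
    println!("{}", create_table("myproject.users", &fields));
}
```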

JSON Schema

JSON Schema draft-07 format:

echo '{"name": "Alice", "age": 30}' | bq-schema-gen --output-format json-schema

Type Inference

The tool automatically infers BigQuery types:

JSON Type           BigQuery Type
string              STRING, DATE, TIME, or TIMESTAMP (auto-detected)
number (integer)    INTEGER
number (float)      FLOAT
boolean             BOOLEAN
object              RECORD
array               REPEATED mode (type inferred from the elements)
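
The auto-detection of string values can be pictured as matching each string against a few date and time shapes. The sketch below is only an illustration: the README does not list the exact patterns the tool accepts, so the three shapes used here, and the `shape` and `infer_string_type` helpers, are assumptions. It checks the layout of digits and separators only and does not validate ranges such as month or hour.

```rust
// Replace every ASCII digit with '9' so strings can be compared by layout.
fn shape(s: &str) -> String {
    s.chars()
        .map(|c| if c.is_ascii_digit() { '9' } else { c })
        .collect()
}

// Assumed patterns for this sketch; anything unrecognized stays a STRING.
fn infer_string_type(s: &str) -> &'static str {
    match shape(s).as_str() {
        "9999-99-99" => "DATE",
        "99:99:99" => "TIME",
        "9999-99-99T99:99:99" | "9999-99-99 99:99:99" | "9999-99-99T99:99:99Z" => "TIMESTAMP",
        _ => "STRING",
    }
}

fn main() {
    for value in ["2024-01-15", "12:30:00", "2024-01-15T12:30:00Z", "Alice"] {
        println!("{} -> {}", value, infer_string_type(value));
    }
}
```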

Type Evolution

Types evolve as more data is processed:

  • INTEGER + FLOAT = FLOAT
  • DATE/TIME/TIMESTAMP combinations = STRING
  • Type widening is automatic (INTEGER -> FLOAT, anything -> STRING)
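
The rules above amount to a merge function applied to each field as new records arrive. Here is a minimal sketch of that idea, limited to the types named in the list; `BqType` and `widen` are hypothetical names for this example, and the crate's internal representation is not shown in this README.

```rust
// A subset of BigQuery types, chosen to match the rules listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BqType {
    Integer,
    Float,
    Date,
    Time,
    Timestamp,
    String,
}

// Merge the type seen so far with the type of a newly observed value.
fn widen(current: BqType, incoming: BqType) -> BqType {
    match (current, incoming) {
        // Same type on both sides: nothing changes.
        (a, b) if a == b => a,
        // INTEGER + FLOAT = FLOAT, in either order.
        (BqType::Integer, BqType::Float) | (BqType::Float, BqType::Integer) => BqType::Float,
        // Every other mismatch, including DATE/TIME/TIMESTAMP mixes, widens to STRING.
        _ => BqType::String,
    }
}

fn main() {
    // A field that holds integers in early records and a float later on.
    let observed = [BqType::Integer, BqType::Integer, BqType::Float];
    let merged = observed.iter().copied().reduce(widen).unwrap();
    println!("{:?}", merged);
}
```

Because the merge only ever moves toward a wider type, the result does not depend on the order in which records are read in this sketch.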

Shell Completions

Shell completions for bash, zsh, fish, and PowerShell are included in GitHub releases and automatically installed via Homebrew.

Library Usage

The crate can be used as a Rust library:

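use bq_schema_gen::{GeneratorConfig, SchemaGenerator, SchemaMap};
use serde_json::json;

fn main() {
    // Start from the default generator settings.
    let config = GeneratorConfig::default();
    let mut generator = SchemaGenerator::new(config);
    // The schema map is passed mutably to each process_record call.
    let mut schema_map = SchemaMap::new();

    // Process a single record.
    let record = json!({"name": "test", "count": 42});
    generator
        .process_record(&record, &mut schema_map)
        .expect("failed to process record");

    // Flatten the schema map into the final schema.
    let schema = generator.flatten_schema(&schema_map);
}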

See docs.rs for the full API documentation.

License

Apache-2.0 (same as the original Python project)

Credits

  • Original Python implementation by Brian T. Park
  • Rust port maintains compatibility with the original tool's behavior