bq-schema-gen

Crates.io	bq-schema-gen
lib.rs	bq-schema-gen
version	0.1.1
created_at	2026-01-16 22:44:34.19847+00
updated_at	2026-01-19 12:15:41.755818+00
description	Generate BigQuery schema from JSON or CSV data files
homepage	https://github.com/omribromberg/bigquery-schema-generator-rust
repository	https://github.com/omribromberg/bigquery-schema-generator-rust
id	2049426
size	552,835

(OmriBromberg)

documentation

https://docs.rs/bq-schema-gen

README

bq-schema-gen

Generate BigQuery schemas from JSON or CSV data. Unlike BigQuery's built-in auto-detect, which examines only the first 500 records, this tool processes every record, so fields and types that first appear late in the data still make it into the schema.

Quick Start

# Install
cargo install bq-schema-gen

# Generate a schema
echo '{"name": "Alice", "age": 30}' | bq-schema-gen

Features

Schema Generation - Infer BigQuery schemas from JSON or CSV files
Schema Diff - Compare schemas and detect breaking changes
Data Validation - Validate data against existing schemas
Watch Mode - Auto-regenerate schemas when files change
Parallel Processing - Fast processing of large datasets
Multiple Output Formats - JSON, DDL, JSON Schema

Installation

From crates.io

cargo install bq-schema-gen

Using Homebrew

brew tap omribromberg/bigquery-schema-generator-rust https://github.com/omribromberg/bigquery-schema-generator-rust
brew install bq-schema-gen

From GitHub Releases

Download pre-built packages from GitHub Releases.

Each release includes:

Pre-compiled binary
Shell completions (bash, zsh, fish, PowerShell)
Man pages

# Example: Extract and install on macOS/Linux
tar -xzf bq-schema-gen-v0.1.0-x86_64-apple-darwin.tar.gz
sudo mv bq-schema-gen /usr/local/bin/
# Optionally install completions (e.g., for zsh)
sudo mv completions/_bq-schema-gen /usr/local/share/zsh/site-functions/

From Source

git clone https://github.com/omribromberg/bigquery-schema-generator-rust
cd bigquery-schema-generator-rust
cargo install --path .

Usage

Generate Schema

From stdin:

echo '{"name": "Alice", "age": 30}' | bq-schema-gen

From a file:

bq-schema-gen data.json --output schema.json

Multiple files with glob patterns:

bq-schema-gen "data/*.json"

Output separate schemas per file:

bq-schema-gen "data/*.json" --per-file --output-dir schemas/

CSV input:

bq-schema-gen --input-format csv data.csv

Compare Schemas (diff)

Compare two schemas to identify changes:

bq-schema-gen diff old_schema.json new_schema.json

Example output:

Schema Diff Report
==================

Summary: 1 added, 1 removed, 1 modified (2 breaking)

Added Fields:
  + email (STRING, NULLABLE)

Removed Fields:
  - legacy_id (INTEGER, NULLABLE)  [BREAKING]

Modified Fields:
  ~ name: Mode changed: NULLABLE -> REQUIRED  [BREAKING]

Output formats: text (default), json, json-patch, sql

bq-schema-gen diff old.json new.json --format json-patch
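
The classification in the example report above can be sketched in a few lines of Rust. This is an illustration only, not the crate's implementation: the `Field` and `Change` types and the `diff` function are hypothetical names, and the rules (removals are breaking, additions are not, a type change or a NULLABLE -> REQUIRED change is breaking) are assumptions read off the sample output.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of one schema field.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Field {
    pub ty: String,
    pub mode: String,
}

/// Kind of change, carrying the affected field name.
#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    Added(String),
    Removed(String),
    Modified(String),
}

/// Compare two schemas keyed by field name.
/// Returns one `(change, is_breaking)` pair per difference.
pub fn diff(
    old: &BTreeMap<String, Field>,
    new: &BTreeMap<String, Field>,
) -> Vec<(Change, bool)> {
    let mut out = Vec::new();
    for (name, new_field) in new {
        match old.get(name) {
            // Assumption: a newly added field is not breaking.
            None => out.push((Change::Added(name.clone()), false)),
            Some(old_field) if old_field != new_field => {
                // Assumption: a type change or NULLABLE -> REQUIRED is breaking.
                let breaking = old_field.ty != new_field.ty
                    || (old_field.mode == "NULLABLE" && new_field.mode == "REQUIRED");
                out.push((Change::Modified(name.clone()), breaking));
            }
            Some(_) => {}
        }
    }
    // Anything present only in the old schema was removed, which is breaking.
    for name in old.keys() {
        if !new.contains_key(name) {
            out.push((Change::Removed(name.clone()), true));
        }
    }
    out
}
```

Fed the two schemas behind the sample report (email added, legacy_id removed, name changed to REQUIRED), this sketch yields three changes, two of them breaking, which matches the summary line.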

Validate Data

Validate data against an existing schema:

bq-schema-gen data.json --existing-schema-path schema.json

Watch Mode

Auto-regenerate schemas when files change:

bq-schema-gen watch data.json --output schema.json

CLI Reference

Flag	Description
`--input-format <FORMAT>`	Input format: `json` (default) or `csv`
`--output-format <FORMAT>`	Output format: `json`, `ddl`, `debug-map`, or `json-schema`
`--table-name <NAME>`	Table name for DDL output
`-o, --output <FILE>`	Output file (stdout if not provided)
`-q, --quiet`	Suppress progress messages
`--per-file`	Output separate schema for each input file
`--output-dir <DIR>`	Output directory for per-file schemas
`--keep-nulls`	Include null values and empty containers in schema
`--quoted-values-are-strings`	Treat quoted values as strings
`--infer-mode`	Infer REQUIRED mode for CSV fields
`--sanitize-names`	Replace invalid characters in field names
`--preserve-input-sort-order`	Preserve field order from input
`--existing-schema-path <FILE>`	Merge with an existing schema
`--ignore-invalid-lines`	Skip unparseable lines

All flags accept both kebab-case (--keep-nulls) and snake_case (--keep_nulls) spellings.
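
One way to accept both spellings is to rewrite underscores in long flag names before parsing. The sketch below shows that idea only. `normalize_flag` is a hypothetical helper, not part of the crate, and the `--flag=value` form it handles is an assumption that this README does not confirm.

```rust
/// Rewrite `--keep_nulls` style flags to `--keep-nulls`.
/// Positional arguments and short flags are returned unchanged.
pub fn normalize_flag(arg: &str) -> String {
    if !arg.starts_with("--") {
        return arg.to_string();
    }
    // Only touch the flag name; underscores in an `=value` part must survive.
    match arg.find('=') {
        Some(eq) => format!("{}{}", arg[..eq].replace('_', "-"), &arg[eq..]),
        None => arg.replace('_', "-"),
    }
}
```

With this helper, `normalize_flag("--keep_nulls")` and `normalize_flag("--keep-nulls")` both give `--keep-nulls`, while a file name such as `my_data.json` passes through untouched.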

Diff Options

Flag	Description
`--format <FORMAT>`	Output: `text`, `json`, `json-patch`, `sql`
`--color <WHEN>`	Color output: `auto`, `always`, `never`
`--strict`	Flag ALL changes as breaking
`-o, --output <FILE>`	Output file

Output Formats

JSON (default)

Standard BigQuery schema format:

echo '{"name": "Alice", "age": 30}' | bq-schema-gen

[
  {"mode": "NULLABLE", "name": "age", "type": "INTEGER"},
  {"mode": "NULLABLE", "name": "name", "type": "STRING"}
]

DDL

BigQuery CREATE TABLE statement:

echo '{"name": "Alice", "age": 30}' | bq-schema-gen --output-format ddl --table-name myproject.users

CREATE TABLE `myproject.users` (
  age INT64,
  name STRING
);
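
The DDL above uses BigQuery's SQL type names (INT64) rather than the schema-file names (INTEGER). A minimal sketch of that mapping and of the statement layout follows. `ddl_type` and `render_ddl` are hypothetical names, and the handling of modes, nested records, and repeated fields is left out because this README does not show how the tool renders them.

```rust
/// Map a schema-file type name to its BigQuery SQL spelling.
/// Types without a distinct SQL name (STRING, DATE, ...) pass through.
pub fn ddl_type(schema_type: &str) -> &str {
    match schema_type {
        "INTEGER" => "INT64",
        "FLOAT" => "FLOAT64",
        "BOOLEAN" => "BOOL",
        other => other,
    }
}

/// Render a flat list of `(name, schema_type)` pairs as CREATE TABLE.
pub fn render_ddl(table: &str, fields: &[(&str, &str)]) -> String {
    let columns: Vec<String> = fields
        .iter()
        .map(|(name, ty)| format!("  {} {}", name, ddl_type(ty)))
        .collect();
    format!("CREATE TABLE `{}` (\n{}\n);", table, columns.join(",\n"))
}
```

Calling `render_ddl("myproject.users", &[("age", "INTEGER"), ("name", "STRING")])` reproduces the statement shown above.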

JSON Schema

JSON Schema draft-07 format:

echo '{"name": "Alice", "age": 30}' | bq-schema-gen --output-format json-schema

Type Inference

The tool automatically infers BigQuery types:

JSON Type	BigQuery Type
string	STRING, DATE, TIME, or TIMESTAMP (auto-detected)
number (integer)	INTEGER
number (float)	FLOAT
boolean	BOOLEAN
object	RECORD
array	mode REPEATED (the type comes from the array's elements)
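
The string row of the table is the least obvious one, so here is a rough sketch of how a string value might be classified. The exact patterns the tool accepts are not listed in this README. The shapes below (YYYY-MM-DD, HH:MM:SS, and the two joined by `T` or a space) are assumptions, and `StringKind` and `classify` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum StringKind {
    Date,
    Time,
    Timestamp,
    PlainString,
}

/// True if every listed byte position holds an ASCII digit.
fn digits_at(b: &[u8], positions: &[usize]) -> bool {
    positions
        .iter()
        .all(|&i| b.get(i).map_or(false, |c| c.is_ascii_digit()))
}

/// Shape check only: `2024-13-99` would pass, ranges are not validated.
fn is_date(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 10 && b[4] == b'-' && b[7] == b'-' && digits_at(b, &[0, 1, 2, 3, 5, 6, 8, 9])
}

fn is_time(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 8 && b[2] == b':' && b[5] == b':' && digits_at(b, &[0, 1, 3, 4, 6, 7])
}

/// Classify a JSON string value by its shape.
pub fn classify(s: &str) -> StringKind {
    if is_date(s) {
        return StringKind::Date;
    }
    if is_time(s) {
        return StringKind::Time;
    }
    // Timestamp: a date, then 'T' or a space, then a time (19 bytes in total).
    if s.len() == 19 && s.is_char_boundary(10) && s.is_char_boundary(11) {
        let (date, rest) = s.split_at(10);
        let sep = rest.as_bytes()[0];
        if (sep == b'T' || sep == b' ') && is_date(date) && is_time(&rest[1..]) {
            return StringKind::Timestamp;
        }
    }
    StringKind::PlainString
}
```

A real implementation would also need to handle fractional seconds and time zone suffixes, which this sketch treats as plain strings.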

Type Evolution

Types evolve as more data is processed:

INTEGER + FLOAT = FLOAT
DATE/TIME/TIMESTAMP combinations = STRING
Type widening is automatic (INTEGER -> FLOAT, anything -> STRING)
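
The widening rules above amount to a small merge function applied each time a field is seen again. The sketch below takes the "anything -> STRING" rule literally, which is an assumption about edge cases such as BOOLEAN meeting INTEGER. `BqType` and `widen` are hypothetical names, not the crate's API.

```rust
/// Scalar types named in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BqType {
    Boolean,
    Integer,
    Float,
    Date,
    Time,
    Timestamp,
    String,
}

/// Merge the type recorded so far with the type of a newly seen value.
pub fn widen(current: BqType, incoming: BqType) -> BqType {
    match (current, incoming) {
        // Same type on both sides: nothing changes.
        (a, b) if a == b => a,
        // INTEGER + FLOAT = FLOAT, in either order.
        (BqType::Integer, BqType::Float) | (BqType::Float, BqType::Integer) => BqType::Float,
        // Every other mismatch, including DATE/TIME/TIMESTAMP mixes,
        // falls back to STRING (assumed from "anything -> STRING").
        _ => BqType::String,
    }
}
```

Folding `widen` over every value of a column gives the final type, and because widening only moves toward FLOAT or STRING, the result does not depend on the order in which records arrive.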

Shell Completions

Shell completions for bash, zsh, fish, and PowerShell ship with every GitHub release and are installed automatically by Homebrew.

Library Usage

The crate can be used as a Rust library:

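use bq_schema_gen::{SchemaGenerator, GeneratorConfig, SchemaMap};
use serde_json::json;

fn main() {
    // Start from the default configuration and an empty schema map.
    let config = GeneratorConfig::default();
    let mut generator = SchemaGenerator::new(config);
    let mut schema_map = SchemaMap::new();

    // Process one record; repeat this call for every record in the input.
    let record = json!({"name": "test", "count": 42});
    generator.process_record(&record, &mut schema_map).unwrap();

    // Turn the accumulated map into the final schema.
    let schema = generator.flatten_schema(&schema_map);
}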

See docs.rs for the full API documentation.

License

Apache-2.0 (same as the original Python project)

Credits

Original Python implementation by Brian T. Park
Rust port maintains compatibility with the original tool's behavior

Commit count: 12
