icepick

Crate: icepick (crates.io, lib.rs)
Version: 0.4.1
Created: 2025-11-17 04:20:25 UTC
Updated: 2026-01-19 20:23:55 UTC
Description: Experimental Rust client for Apache Iceberg with WASM support for AWS S3 Tables and Cloudflare R2
Homepage: https://github.com/smithclay/icepick
Repository: https://github.com/smithclay/icepick
Size: 935,088
Author: Clay Smith (smithclay)
Documentation: https://docs.rs/icepick

A CLI tool and WASM-compatible library for managing Apache Iceberg tables in AWS S3 Tables and Cloudflare R2 Data Catalog.

What it does

icepick provides a simple command-line interface and a WASM-friendly library for working with Apache Iceberg tables:

  • List and inspect namespaces and tables
  • Scan tables with partition pruning and column statistics
  • Commit Parquet files to tables (with auto-detection of Hive-style partitions)
  • Compact small files using bin-pack compaction
  • Clean up snapshots based on retention policies

Why?

The official iceberg-rust library doesn't yet support WASM compilation, and most Iceberg tools are built for JVM environments. icepick fills the gap for:

  • Serverless environments like Cloudflare Workers
  • CLI-first workflows without spinning up Spark or Flink
  • Lightweight table maintenance (compaction, snapshot cleanup)
  • Quick data exploration without complex query engines

Quickstart

Install

cargo install icepick --features cli

Configure

Set your catalog credentials:

# For Cloudflare R2
export ICEPICK_CATALOG_URL="https://catalog.cloudflarestorage.com/<account-id>/<bucket>"
export ICEPICK_TOKEN="<cloudflare-api-token>"

# For AWS S3 Tables
export ICEPICK_CATALOG_ARN="arn:aws:s3tables:us-west-2:123456789012:bucket/my-bucket"
# Uses AWS credential chain (env vars, ~/.aws/credentials, IAM role)
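The two variable sets above select different backends. As a minimal sketch of that selection logic (not icepick's actual code; the CatalogConfig type and the rule that the ARN wins when both sets are present are assumptions), a client could resolve its catalog like this:

```rust
use std::collections::HashMap;

/// Which catalog the environment points at.
/// Hypothetical type for this sketch, not part of icepick's API.
#[derive(Debug, PartialEq)]
enum CatalogConfig {
    R2 { url: String, token: String },
    S3Tables { arn: String },
}

/// Resolve the catalog from environment variables.
/// Assumption: ICEPICK_CATALOG_ARN takes precedence if both sets are present.
fn resolve(env: &HashMap<String, String>) -> Result<CatalogConfig, String> {
    if let Some(arn) = env.get("ICEPICK_CATALOG_ARN") {
        return Ok(CatalogConfig::S3Tables { arn: arn.clone() });
    }
    match (env.get("ICEPICK_CATALOG_URL"), env.get("ICEPICK_TOKEN")) {
        (Some(url), Some(token)) => Ok(CatalogConfig::R2 {
            url: url.clone(),
            token: token.clone(),
        }),
        (Some(_), None) => Err("ICEPICK_CATALOG_URL is set but ICEPICK_TOKEN is missing".to_string()),
        _ => Err("set ICEPICK_CATALOG_URL and ICEPICK_TOKEN (R2) or ICEPICK_CATALOG_ARN (S3 Tables)".to_string()),
    }
}

fn main() {
    let env: HashMap<String, String> = std::env::vars().collect();
    match resolve(&env) {
        // Print only the catalog location, never the token.
        Ok(CatalogConfig::R2 { url, .. }) => println!("R2 catalog: {url}"),
        Ok(CatalogConfig::S3Tables { arn }) => println!("S3 Tables catalog: {arn}"),
        Err(e) => eprintln!("config error: {e}"),
    }
}
```

Failing early with a message that names the missing variable is usually friendlier than letting the first catalog request fail with an authentication error.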

Verify Connection

# List namespaces
icepick namespace list

# List tables in a namespace
icepick table list --namespace my_namespace

# Get table info
icepick table info my_namespace.my_table

CLI Reference

Namespaces

# List all namespaces
icepick namespace list

# Create a namespace
icepick namespace create my_namespace

# Delete a namespace
icepick namespace delete my_namespace

Tables

# List tables in a namespace
icepick table list --namespace my_namespace

# Get detailed table info (schema, partitioning, snapshots)
icepick table info my_namespace.my_table

# Scan table data (shows pruning stats with filters)
icepick table scan my_namespace.my_table

# Scan with filter
icepick table scan my_namespace.my_table --filter "date >= '2024-01-01'"

# Limit output rows
icepick table scan my_namespace.my_table --limit 100
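Partition pruning with column statistics works by comparing the filter against per-file bounds: Iceberg manifests record lower and upper bounds for columns, so a file whose range cannot satisfy the filter is skipped without being read. The sketch below illustrates the idea for a date >= '2024-01-01' filter. FileStats and prune_ge are hypothetical names, dates are compared as ISO-8601 strings, and this is not icepick's implementation.

```rust
/// Per-file bounds for one column (hypothetical shape for this sketch).
/// ISO-8601 date strings sort lexicographically in chronological order.
struct FileStats {
    path: &'static str,
    min: &'static str,
    max: &'static str,
}

/// Keep only files whose [min, max] range could contain a value >= `lower`.
/// A file whose max is below the bound cannot match, so it is pruned.
fn prune_ge<'a>(files: &'a [FileStats], lower: &str) -> Vec<&'a FileStats> {
    files.iter().filter(|f| f.max >= lower).collect()
}

fn main() {
    let files = [
        FileStats { path: "a.parquet", min: "2023-11-01", max: "2023-12-31" },
        FileStats { path: "b.parquet", min: "2023-12-15", max: "2024-01-10" },
        FileStats { path: "c.parquet", min: "2024-02-01", max: "2024-02-28" },
    ];
    let kept = prune_ge(&files, "2024-01-01");
    println!("scanning {} of {} files", kept.len(), files.len()); // scanning 2 of 3 files
    for f in kept {
        println!("  {} [{}..{}]", f.path, f.min, f.max);
    }
}
```

Note that b.parquet is still read even though part of its range falls before the bound: statistics can only rule files out, so rows inside kept files are filtered afterwards.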

Commit Files

Commit existing Parquet files to an Iceberg table:

# Preview what would be committed (dry run)
icepick commit /data/**/*.parquet --namespace prod --table events --dry-run

# Commit files to existing table
icepick commit /data/**/*.parquet --namespace prod --table events

# Create new table with partition spec
icepick commit /data/**/*.parquet --namespace prod --table events \
  --create --partition year:int,month:int

# For non-Hive paths, specify partition values explicitly
icepick commit /flat/*.parquet --namespace prod --table events \
  --partition-values year=2024,month=01

# Use specific file as schema exemplar
icepick commit /data/**/*.parquet --namespace prod --table events \
  --exemplar /data/sample.parquet --create
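The --partition and --partition-values arguments above take name:type and name=value lists. A minimal sketch of parsing and type-checking them, assuming only the int type from the example (parse_spec, parse_values and typed_values are hypothetical names, not icepick's parser):

```rust
/// Parse a spec such as "year:int,month:int" into (name, type) pairs.
fn parse_spec(spec: &str) -> Result<Vec<(String, String)>, String> {
    spec.split(',')
        .map(|field| -> Result<(String, String), String> {
            let (name, ty) = field
                .split_once(':')
                .ok_or_else(|| format!("expected name:type, got {field:?}"))?;
            Ok((name.trim().to_string(), ty.trim().to_string()))
        })
        .collect()
}

/// Parse values such as "year=2024,month=01" into (name, raw value) pairs.
fn parse_values(values: &str) -> Result<Vec<(String, String)>, String> {
    values
        .split(',')
        .map(|pair| -> Result<(String, String), String> {
            let (name, value) = pair
                .split_once('=')
                .ok_or_else(|| format!("expected name=value, got {pair:?}"))?;
            Ok((name.trim().to_string(), value.trim().to_string()))
        })
        .collect()
}

/// Check that every spec field has a value of the declared type.
/// Only "int" is handled; other types are out of scope for this sketch.
fn typed_values(
    spec: &[(String, String)],
    values: &[(String, String)],
) -> Result<Vec<(String, i64)>, String> {
    spec.iter()
        .map(|(name, ty)| -> Result<(String, i64), String> {
            let raw = values
                .iter()
                .find(|(k, _)| k == name)
                .map(|(_, v)| v)
                .ok_or_else(|| format!("no value given for partition field {name}"))?;
            match ty.as_str() {
                "int" => raw
                    .parse::<i64>()
                    .map(|v| (name.clone(), v))
                    .map_err(|e| format!("{name}={raw}: {e}")),
                other => Err(format!("type {other} is not handled in this sketch")),
            }
        })
        .collect()
}

fn main() {
    let spec = parse_spec("year:int,month:int").unwrap();
    let values = parse_values("year=2024,month=01").unwrap();
    println!("{:?}", typed_values(&spec, &values)); // Ok([("year", 2024), ("month", 1)])
}
```

Validating every file's values against the spec before anything is written means a typo such as month=1O is rejected up front rather than producing a bad partition.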

The commit command:

  • Uses the first file's schema (or the --exemplar file's) as the reference
  • Validates all files match the schema
  • Extracts partition values from Hive-style paths automatically
  • Supports --partition-values for flat directory structures
  • Shows detailed plan with --dry-run before committing
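Hive-style layouts encode partition values as key=value directory names. The sketch below shows how such values can be pulled out of a path, plus the reference-schema check described in the list. Schema is a simplified stand-in (ordered name/type pairs), paths are assumed to use / separators, and none of this is icepick's actual code.

```rust
/// Extract key=value directory segments from a Hive-style path such as
/// "/data/year=2024/month=01/part-00000.parquet". The file name is ignored.
fn hive_partitions(path: &str) -> Vec<(String, String)> {
    let mut segments: Vec<&str> = path.split('/').collect();
    segments.pop(); // drop the file name
    segments
        .into_iter()
        .filter_map(|seg| seg.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Simplified schema for this sketch: ordered (column name, type) pairs.
type Schema = Vec<(String, String)>;

/// Return the paths whose schema differs from the reference schema
/// (taken from the first file, or from the exemplar when one is given).
fn mismatches<'a>(reference: &Schema, files: &'a [(String, Schema)]) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(_, schema)| schema != reference)
        .map(|(path, _)| path.as_str())
        .collect()
}

fn main() {
    let path = "/data/year=2024/month=01/part-00000.parquet";
    println!("{:?}", hive_partitions(path));

    let reference: Schema = vec![("id".into(), "long".into()), ("date".into(), "date".into())];
    let files = vec![
        ("/data/a.parquet".to_string(), reference.clone()),
        ("/data/b.parquet".to_string(), vec![("id".into(), "string".into())]),
    ];
    println!("mismatched: {:?}", mismatches(&reference, &files));
}
```

A path with no key=value segments yields an empty list, which is the case where explicit --partition-values are needed.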

Compaction

Merge small files into larger ones for better query performance:

# Preview compaction plan (dry run)
icepick compact my_namespace.my_table --dry-run

# Execute compaction with default settings
icepick compact my_namespace.my_table

# Custom target file size (256 MiB)
icepick compact my_namespace.my_table --target-size 268435456

# Only compact files smaller than 128 MiB
icepick compact my_namespace.my_table --max-input-size 134217728
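Bin-pack compaction groups small files so that each rewritten file lands near the target size. The sketch below uses first-fit decreasing, an assumed strategy (icepick's planner may differ), and shows that the byte values in the flags above are 256 MiB and 128 MiB. plan_bins is a hypothetical name.

```rust
const MIB: u64 = 1024 * 1024;

/// Group files smaller than `max_input` into bins of at most `target` bytes
/// using first-fit decreasing. Returns the file sizes in each bin.
fn plan_bins(sizes: &[u64], target: u64, max_input: u64) -> Vec<Vec<u64>> {
    // Files at or above max_input are left alone.
    let mut candidates: Vec<u64> = sizes.iter().copied().filter(|&s| s < max_input).collect();
    candidates.sort_unstable_by(|a, b| b.cmp(a)); // largest first
    let mut bins: Vec<(u64, Vec<u64>)> = Vec::new(); // (bytes used, member sizes)
    for size in candidates {
        match bins.iter_mut().find(|(used, _)| used + size <= target) {
            Some((used, members)) => {
                *used += size;
                members.push(size);
            }
            None => bins.push((size, vec![size])),
        }
    }
    // A bin holding one file would just rewrite it unchanged, so skip it.
    bins.into_iter().map(|(_, m)| m).filter(|m| m.len() > 1).collect()
}

fn main() {
    let target = 256 * MIB; // 268435456, as in --target-size above
    let max_input = 128 * MIB; // 134217728, as in --max-input-size above
    let sizes = [100 * MIB, 90 * MIB, 80 * MIB, 20 * MIB, 10 * MIB, 300 * MIB];
    for (i, bin) in plan_bins(&sizes, target, max_input).iter().enumerate() {
        let total: u64 = bin.iter().sum();
        println!("bin {i}: {} files, {} MiB", bin.len(), total / MIB); // bin 0: 4 files, 220 MiB
    }
}
```

Sorting largest first lets the big files claim bins early and the small ones fill the gaps, which tends to produce fewer, fuller output files than packing in arrival order.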

Snapshots

Manage table snapshots and clean up old versions:

# List all snapshots with age and status
icepick snapshot list my_namespace.my_table

# Preview cleanup (dry run)
icepick snapshot cleanup my_namespace.my_table --dry-run

# Execute cleanup with retention policy
icepick snapshot cleanup my_namespace.my_table \
  --older-than-days 7 \
  --retain-last 10

Snapshot cleanup respects:

  • Current snapshot - Never expired (it's the current table state)
  • Referenced snapshots - Never expired if referenced by branches or tags
  • Retention count - Keeps the N most recent regardless of age
  • Age threshold - Only expires snapshots older than the threshold
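The four rules can be expressed as a chain of filters. This sketch assumes a flat list of snapshots with millisecond timestamps (Iceberg records a timestamp-ms for each snapshot) and treats the retention count as the N newest snapshots in the whole table, which may not match how icepick counts them. Snapshot and expirable are hypothetical names.

```rust
use std::cmp::Reverse;
use std::collections::HashSet;

/// Minimal snapshot record: ID and commit time in milliseconds since the epoch.
struct Snapshot {
    id: i64,
    timestamp_ms: i64,
}

const DAY_MS: i64 = 24 * 60 * 60 * 1000;

/// Return the IDs of snapshots that may be expired under the four rules.
fn expirable(
    snapshots: &[Snapshot],
    current_id: i64,
    referenced: &HashSet<i64>,
    retain_last: usize,
    older_than_days: i64,
    now_ms: i64,
) -> Vec<i64> {
    let cutoff_ms = now_ms - older_than_days * DAY_MS;
    let mut newest_first: Vec<&Snapshot> = snapshots.iter().collect();
    newest_first.sort_by_key(|s| Reverse(s.timestamp_ms));
    newest_first
        .into_iter()
        .skip(retain_last) // retention count: keep the N most recent regardless of age
        .filter(|s| s.id != current_id) // the current snapshot is never expired
        .filter(|s| !referenced.contains(&s.id)) // branch/tag targets are never expired
        .filter(|s| s.timestamp_ms < cutoff_ms) // age threshold
        .map(|s| s.id)
        .collect()
}

fn main() {
    let now_ms = 30 * DAY_MS;
    let snapshots: Vec<Snapshot> = [(1, 1), (2, 2), (3, 3), (4, 20), (5, 29)]
        .into_iter()
        .map(|(id, day)| Snapshot { id, timestamp_ms: day * DAY_MS })
        .collect();
    let referenced: HashSet<i64> = HashSet::from([2]); // e.g. the target of a tag
    let ids = expirable(&snapshots, 5, &referenced, 2, 7, now_ms);
    println!("expire: {ids:?}"); // expire: [3, 1]
}
```

Every rule only removes candidates, so a snapshot is expired only when all four rules agree, matching the "never expired" wording in the list.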

Cloudflare R2

Authentication

  1. Log into the Cloudflare dashboard
  2. Navigate to My Profile > API Tokens
  3. Create a token with R2 read/write permissions
  4. Set environment variables:
export ICEPICK_CATALOG_URL="https://catalog.cloudflarestorage.com/<account-id>/<bucket>"
export ICEPICK_TOKEN="<your-api-token>"

WASM Compatibility

The R2 catalog is fully WASM-compatible, making it suitable for:

  • Cloudflare Workers
  • Browser applications (if your catalog REST API supports CORS)

AWS S3 Tables

Authentication

Uses the AWS default credential provider chain:

  1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  2. AWS credentials file (~/.aws/credentials)
  3. ECS task role
  4. IAM instance profile (EC2)

Then point icepick at your table bucket:
export ICEPICK_CATALOG_ARN="arn:aws:s3tables:us-west-2:123456789012:bucket/my-bucket"
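The catalog ARN follows the standard AWS format arn:partition:service:region:account-id:resource, with a bucket/<name> resource for S3 Tables. If your own tooling needs the region or bucket name, a minimal parser might look like this (TableBucketArn and parse_table_bucket_arn are hypothetical, not part of icepick):

```rust
/// Components of an S3 Tables table-bucket ARN (hypothetical type).
#[derive(Debug, PartialEq)]
struct TableBucketArn<'a> {
    region: &'a str,
    account_id: &'a str,
    bucket: &'a str,
}

/// Parse "arn:<partition>:s3tables:<region>:<account-id>:bucket/<name>".
fn parse_table_bucket_arn(arn: &str) -> Result<TableBucketArn<'_>, String> {
    // splitn(6, ..) keeps any ':' inside the resource part intact.
    let mut parts = arn.splitn(6, ':');
    // Tuple fields are evaluated left to right, so the parts are consumed in order.
    let (Some("arn"), Some(_partition), Some("s3tables"), Some(region), Some(account_id), Some(resource)) =
        (parts.next(), parts.next(), parts.next(), parts.next(), parts.next(), parts.next())
    else {
        return Err(format!("not an S3 Tables ARN: {arn:?}"));
    };
    let bucket = resource
        .strip_prefix("bucket/")
        .ok_or_else(|| format!("expected a bucket/<name> resource, got {resource:?}"))?;
    Ok(TableBucketArn { region, account_id, bucket })
}

fn main() {
    let arn = "arn:aws:s3tables:us-west-2:123456789012:bucket/my-bucket";
    match parse_table_bucket_arn(arn) {
        Ok(parts) => println!("{parts:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Rejecting an ordinary S3 bucket ARN (service s3 rather than s3tables) early catches a common copy-paste mistake.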

Important: Ensure your credentials have S3 Tables permissions.

Platform Support

S3 Tables requires the AWS SDK and is only available on native platforms (Linux, macOS, Windows). It does not compile to WASM.
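Excluding a native-only backend from WASM builds is usually done with cfg attributes. A sketch of the pattern, using a hypothetical Backend enum (icepick's real module and feature layout may differ):

```rust
/// Catalog backends compiled into this build.
/// Hypothetical enum for illustration, not icepick's API.
#[derive(Debug, PartialEq)]
enum Backend {
    R2,
    S3Tables,
}

fn available_backends() -> Vec<Backend> {
    // The REST-based R2 catalog builds for every target, including wasm32.
    #[allow(unused_mut)] // `mut` is unused when compiling for wasm32
    let mut backends = vec![Backend::R2];
    // Code that depends on the AWS SDK is compiled only for non-wasm targets.
    #[cfg(not(target_arch = "wasm32"))]
    backends.push(Backend::S3Tables);
    backends
}

fn main() {
    println!("available catalogs: {:?}", available_backends());
}
```

On the Cargo side, native-only dependencies such as the AWS SDK can be declared under [target.'cfg(not(target_arch = "wasm32"))'.dependencies], so they are never built for wasm32.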

Library Usage

icepick can also be used as a Rust library for programmatic access to Iceberg tables. See DEVELOPER.md for:

  • Rust API examples
  • Direct Parquet writes
  • Registering existing files
  • WASM considerations

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

Acknowledgments

Built on the official iceberg-rust library from the Apache Iceberg project.
