| Crates.io | helios-sof |
| lib.rs | helios-sof |
| version | 0.1.32 |
| created_at | 2025-08-22 16:20:28.121505+00 |
| updated_at | 2025-12-02 20:40:39.579941+00 |
| description | This crate provides a complete implementation of the SQL-on-FHIR specification for Rust, enabling the transformation of FHIR resources into tabular data using declarative ViewDefinitions. It supports all major FHIR versions (R4, R4B, R5, R6) through a version-agnostic abstraction layer. |
| homepage | https://github.com/HeliosSoftware/hfs/tree/main/crates/sof |
| repository | https://github.com/HeliosSoftware/hfs |
| max_upload_size | |
| id | 1806576 |
| size | 1,021,302 |
This crate provides a complete implementation of the SQL-on-FHIR specification for Rust, enabling the transformation of FHIR resources into tabular data using declarative ViewDefinitions. It supports all major FHIR versions (R4, R4B, R5, R6) through a version-agnostic abstraction layer.
The sof crate implements the HL7 FHIR SQL-on-FHIR Implementation Guide, providing both a library API and two executable targets (described below).
Looking to use SQL-on-FHIR from Python? Check out the pysof package, which provides Python bindings for this crate:
import pysof
# Transform FHIR data to CSV, JSON, NDJSON, or Parquet
result = pysof.run_view_definition(view_definition, bundle, "csv")
Install it with `pip install pysof`. See the pysof README for installation and usage details.
This crate provides two executable targets:
sof-cli - Command Line Interface

A full-featured command-line (CLI) tool for running ViewDefinition transformations. It accepts FHIR ViewDefinition and Bundle resources as input (from files or stdin) and applies the SQL-on-FHIR transformation to produce structured output in formats such as CSV, JSON, or other supported content types.
# Basic CSV output (includes headers by default)
sof-cli --view patient-view.json --bundle patient-data.json --format csv
# CSV output without headers
sof-cli --view patient-view.json --bundle patient-data.json --format csv --no-headers
# JSON output to file
sof-cli -v observation-view.json -b lab-results.json -f json -o output.json
# Read ViewDefinition from stdin, Bundle from file
cat view-definition.json | sof-cli --bundle patient-data.json --format csv
# Read Bundle from stdin, ViewDefinition from file
cat patient-bundle.json | sof-cli --view view-definition.json --format json
# Load data using --source parameter (supports local paths and URLs)
sof-cli -v view-definition.json -s ./data/bundle.json -f csv
sof-cli -v view-definition.json -s /absolute/path/to/bundle.json -f csv
sof-cli -v view-definition.json -s file:///path/to/bundle.json -f csv
sof-cli -v view-definition.json -s https://example.com/fhir/bundle.json -f json
sof-cli -v view-definition.json -s s3://my-bucket/fhir-data/bundle.json -f csv
sof-cli -v view-definition.json -s gs://my-bucket/fhir-data/bundle.json -f json
sof-cli -v view-definition.json -s azure://my-container/fhir-data/bundle.json -f ndjson
# Filter resources modified after a specific date
sof-cli -v view-definition.json -b patient-data.json --since 2024-01-01T00:00:00Z -f csv
# Limit results to first 100 rows
sof-cli -v view-definition.json -b patient-data.json --limit 100
# Combine filters: recent resources limited to 50 results
sof-cli -v view-definition.json -b patient-data.json --since 2024-01-01T00:00:00Z --limit 50
# Load NDJSON file (newline-delimited JSON) - automatically detected by .ndjson extension
sof-cli -v view-definition.json -b patient-data.ndjson -f csv
sof-cli -v view-definition.json -s file:///path/to/data.ndjson -f json
sof-cli -v view-definition.json -s s3://my-bucket/fhir-data/patients.ndjson -f csv
# NDJSON content detection (works even without .ndjson extension)
sof-cli -v view-definition.json -b patient-data.txt -f csv # Auto-detects NDJSON content
# Streaming mode for large NDJSON files (memory-efficient chunked processing)
sof-cli -v view-definition.json -b large-patients.ndjson -f csv --chunk-size 500
sof-cli -v view-definition.json -b data.ndjson -f ndjson --skip-invalid
Key features:
- ViewDefinition input from file (`-v`) or stdin
- Bundle input from file (`-b`), stdin, or external sources (`-s`)
- `-s`/`--source` to load from local paths (relative or absolute) or URLs: file://, http(s)://, s3://, gs://, azure://
- Output to file (`-o`) or stdout
- Time filtering with `--since` (RFC3339 format)
- Result limiting with `--limit` (1-10000)
- NDJSON input via `--bundle` with .ndjson files
- Streaming via `--chunk-size` (default: 1000 resources)
- `--skip-invalid` for fault-tolerant processing

Options:
-v, --view <VIEW> Path to ViewDefinition JSON file (or use stdin if not provided)
-b, --bundle <BUNDLE> Path to FHIR Bundle JSON file (or use stdin if not provided)
-s, --source <SOURCE> Path or URL to FHIR data source (see Data Sources below)
-f, --format <FORMAT> Output format (csv, json, ndjson, parquet) [default: csv]
--no-headers Exclude CSV headers (only for CSV format)
-o, --output <OUTPUT> Output file path (defaults to stdout)
--since <SINCE> Filter resources modified after this time (RFC3339 format)
--limit <LIMIT> Limit the number of results (1-10000)
--fhir-version <VERSION> FHIR version to use [default: R4]
--parquet-row-group-size <MB> Row group size for Parquet (64-1024MB) [default: 256]
--parquet-page-size <KB> Page size for Parquet (64-8192KB) [default: 1024]
--parquet-compression <ALG> Compression for Parquet [default: snappy]
Options: none, snappy, gzip, lz4, brotli, zstd
--max-file-size <MB> Maximum file size for Parquet output (10-10000MB) [default: 1000]
When exceeded, creates numbered files (e.g., output_001.parquet)
--chunk-size <N> Number of resources per chunk for streaming NDJSON [default: 1000]
--skip-invalid Skip invalid JSON lines in NDJSON files instead of failing
-h, --help Print help
* Additional FHIR versions (R4B, R5, R6) available when compiled with corresponding features
The CLI provides two ways to specify FHIR data:
- `-b`/`--bundle`: Direct path to a local file (simple, no protocol prefix needed)
- `-s`/`--source`: URL-based loading with protocol support (more flexible)

The `--source` parameter supports loading FHIR data from various sources:
# Using --bundle (simpler for local files)
sof-cli -v view.json -b /path/to/bundle.json
# Using --source with relative path
sof-cli -v view.json -s ./data/bundle.json
# Using --source with absolute path
sof-cli -v view.json -s /path/to/bundle.json
# Using --source with file:// protocol
sof-cli -v view.json -s file:///path/to/bundle.json
sof-cli -v view.json -s https://example.com/fhir/bundle.json
# Set AWS credentials
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=us-east-1
# Load from S3 bucket
sof-cli -v view.json -s s3://my-bucket/fhir-data/bundle.json -f csv
# Option 1: Service account credentials
export GOOGLE_SERVICE_ACCOUNT=/path/to/service-account.json
# Option 2: Application Default Credentials
gcloud auth application-default login
# Load from GCS bucket
sof-cli -v view.json -s gs://my-bucket/fhir-data/bundle.json -f json
# Option 1: Storage account credentials
export AZURE_STORAGE_ACCOUNT=myaccount
export AZURE_STORAGE_ACCESS_KEY=mykey
# Option 2: Azure managed identity (when running in Azure)
# No environment variables needed
# Load from Azure container
sof-cli -v view.json -s azure://my-container/fhir-data/bundle.json -f ndjson
The source can contain a single FHIR resource, a Bundle, or NDJSON content (described below).
In addition to standard JSON, the CLI and server support NDJSON (newline-delimited JSON) as an input format. NDJSON files contain one FHIR resource per line, making them ideal for streaming large datasets.
Format Detection:
- Files with the `.ndjson` extension are automatically parsed as NDJSON
- NDJSON content is also detected automatically for files without the extension

Example NDJSON file:
{"resourceType": "Patient", "id": "patient-1", "gender": "male"}
{"resourceType": "Patient", "id": "patient-2", "gender": "female"}
{"resourceType": "Observation", "id": "obs-1", "status": "final", "code": {"text": "Test"}}
Error Handling:
- Invalid JSON lines cause processing to fail by default; use `--skip-invalid` in streaming mode to skip them instead

Streaming Mode (Memory-Efficient Processing):
When processing large NDJSON files with --bundle, the CLI automatically uses streaming mode for memory-efficient processing:
# Stream large NDJSON file with default chunk size (1000 resources)
sof-cli -v view.json -b large-patients.ndjson -f csv
# Custom chunk size for memory-constrained environments
sof-cli -v view.json -b patients.ndjson -f csv --chunk-size 100
# Skip invalid lines and continue processing
sof-cli -v view.json -b patients.ndjson -f ndjson --skip-invalid
# Output to file with streaming
sof-cli -v view.json -b huge-dataset.ndjson -f csv -o output.csv --chunk-size 500
Streaming mode features:
- Only `--chunk-size` resources are loaded at a time (~10MB per 1000 resources)
- Use `--skip-invalid` to continue past malformed JSON lines

Usage Examples:
# Load from local NDJSON file
sof-cli -v view.json -b patients.ndjson -f csv
# Load from cloud storage
sof-cli -v view.json -s s3://bucket/patients.ndjson -f json
# Mix NDJSON source with JSON bundle
sof-cli -v view.json -s file:///data.ndjson -b additional-data.json -f csv
# Server API with NDJSON
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?source=s3://bucket/data.ndjson" \
-H "Content-Type: application/json" \
-d '{"resourceType": "Parameters", "parameter": [{"name": "viewResource", "resource": {...}}]}'
The CLI supports multiple output formats via the `-f`/`--format` parameter:
- `csv` (default) - Comma-separated values; use the `--no-headers` flag to exclude column headers
- `json` - JSON array format
- `ndjson` - Newline-delimited JSON format
- `parquet` - Apache Parquet columnar format
sof-server - HTTP Server

A high-performance HTTP server providing SQL-on-FHIR ViewDefinition transformation capabilities, with advanced Parquet support and streaming for large datasets. Use this server if you need a simple, stateless web service for SQL-on-FHIR. If you need to run transformations against server-stored ViewDefinitions and server-stored FHIR data, use the full capabilities of the Helios FHIR Server in hfs.
# Start server with defaults
sof-server
# Custom configuration via command line
sof-server --port 3000 --host 0.0.0.0 --log-level debug
# Custom configuration via environment variables
SOF_SERVER_PORT=3000 SOF_SERVER_HOST=0.0.0.0 sof-server
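# Command-line arguments take precedence over environment variables:
# the server below listens on port 4000, not 3000
SOF_SERVER_PORT=3000 sof-server --port 4000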
# Check server health
curl http://localhost:8080/health
# Get CapabilityStatement
curl http://localhost:8080/metadata
The server can be configured using either command-line arguments or environment variables. Command-line arguments take precedence when both are provided.
| Variable | Description | Default |
|---|---|---|
| `SOF_SERVER_PORT` | Server port | 8080 |
| `SOF_SERVER_HOST` | Server host address | 127.0.0.1 |
| `SOF_LOG_LEVEL` | Log level (error, warn, info, debug, trace) | info |
| `SOF_MAX_BODY_SIZE` | Maximum request body size in bytes | 10485760 (10MB) |
| `SOF_REQUEST_TIMEOUT` | Request timeout in seconds | 30 |
| `SOF_ENABLE_CORS` | Enable CORS (true/false) | true |
| `SOF_CORS_ORIGINS` | Allowed CORS origins (comma-separated, * for any) | * |
| `SOF_CORS_METHODS` | Allowed CORS methods (comma-separated, * for any) | GET,POST,PUT,DELETE,OPTIONS |
| `SOF_CORS_HEADERS` | Allowed CORS headers (comma-separated, * for any) | Common headers¹ |
| Argument | Short | Description | Default |
|---|---|---|---|
| `--port` | `-p` | Server port | 8080 |
| `--host` | `-H` | Server host address | 127.0.0.1 |
| `--log-level` | `-l` | Log level | info |
| `--max-body-size` | `-m` | Max request body (bytes) | 10485760 |
| `--request-timeout` | `-t` | Request timeout (seconds) | 30 |
| `--enable-cors` | `-c` | Enable CORS | true |
| `--cors-origins` | | Allowed origins (comma-separated) | * |
| `--cors-methods` | | Allowed methods (comma-separated) | GET,POST,PUT,DELETE,OPTIONS |
| `--cors-headers` | | Allowed headers (comma-separated) | Common headers¹ |
# Production configuration with environment variables
export SOF_SERVER_HOST=0.0.0.0
export SOF_SERVER_PORT=8080
export SOF_LOG_LEVEL=warn
export SOF_MAX_BODY_SIZE=52428800 # 50MB
export SOF_REQUEST_TIMEOUT=60
export SOF_ENABLE_CORS=false
sof-server
# Development configuration
sof-server --log-level debug --enable-cors true
# CORS configuration for specific frontend
sof-server --cors-origins "http://localhost:3000,http://localhost:3001" \
--cors-methods "GET,POST,OPTIONS" \
--cors-headers "Content-Type,Authorization"
# Disable CORS for internal services
sof-server --enable-cors false
# Show all configuration options
sof-server --help
When using the source parameter with cloud storage URLs, ensure the appropriate credentials are configured:
AWS S3 (s3:// URLs):
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=us-east-1
sof-server
Google Cloud Storage (gs:// URLs):
# Option 1: Service account
export GOOGLE_SERVICE_ACCOUNT=/path/to/service-account.json
sof-server
# Option 2: Application Default Credentials
gcloud auth application-default login
sof-server
Azure Blob Storage (azure:// URLs):
# Option 1: Storage account credentials
export AZURE_STORAGE_ACCOUNT=myaccount
export AZURE_STORAGE_ACCESS_KEY=mykey
sof-server
# Option 2: Use managed identity when running in Azure
sof-server
The server provides flexible CORS (Cross-Origin Resource Sharing) configuration to control which web applications can access the API:
Origins: Specify which domains can access the server
- `*` to allow any origin (default)
- Or a comma-separated list, e.g. `https://app1.com,https://app2.com`

Methods: Control which HTTP methods are allowed
- Default: `GET,POST,PUT,DELETE,OPTIONS`
- `*` to allow any method
- Or a restricted list, e.g. `GET,POST,OPTIONS`

Headers: Specify which headers clients can send
- `*` to allow any header
- Or a specific list, e.g. `Content-Type,Authorization,X-Custom-Header`

Important Security Notes:
- When using the wildcard (`*`) for origins, credentials (cookies, auth headers) are automatically disabled for security
- In production, specify exact origins rather than `*` to prevent unauthorized access

# Development (permissive, no credentials)
sof-server # Uses default wildcard origin
# Production CORS configuration (with credentials)
export SOF_CORS_ORIGINS="https://app.example.com"
export SOF_CORS_METHODS="GET,POST,OPTIONS"
export SOF_CORS_HEADERS="Content-Type,Authorization"
sof-server
¹ Default headers: Accept,Accept-Language,Content-Type,Content-Language,Authorization,X-Requested-With
Returns the server's CapabilityStatement describing supported operations:
curl http://localhost:8080/metadata
Execute a ViewDefinition transformation:
# JSON output (default)
curl -X POST http://localhost:8080/ViewDefinition/$viewdefinition-run \
-H "Content-Type: application/json" \
-d '{
"resourceType": "Parameters",
"parameter": [{
"name": "viewResource",
"resource": {
"resourceType": "ViewDefinition",
"status": "active",
"resource": "Patient",
"select": [{
"column": [{
"name": "id",
"path": "id"
}, {
"name": "gender",
"path": "gender"
}]
}]
}
}, {
"name": "patient",
"resource": {
"resourceType": "Patient",
"id": "example",
"gender": "male"
}
}]
}'
# CSV output (includes headers by default)
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?_format=text/csv" \
-H "Content-Type: application/json" \
-d '{...}'
# CSV output without headers
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?_format=text/csv&header=false" \
-H "Content-Type: application/json" \
-d '{...}'
# NDJSON output
curl -X POST http://localhost:8080/ViewDefinition/$viewdefinition-run \
-H "Content-Type: application/json" \
-H "Accept: application/ndjson" \
-d '{...}'
The $viewdefinition-run POST operation accepts parameters either as query parameters or in a FHIR Parameters resource.
Parameter table:
| Name | Type | Use | Scope | Min | Max | Documentation |
|---|---|---|---|---|---|---|
| _format | code | in | type, instance | 1 | 1 | Output format - application/json, application/ndjson, text/csv, application/parquet |
| header | boolean | in | type, instance | 0 | 1 | This parameter only applies to text/csv requests. true (default) - return headers in the response, false - do not return headers. |
| maxFileSize | integer | in | type, instance | 0 | 1 | Maximum Parquet file size in MB (10-10000). When exceeded, generates multiple files in a ZIP archive. |
| rowGroupSize | integer | in | type, instance | 0 | 1 | Parquet row group size in MB (64-1024, default: 256) |
| pageSize | integer | in | type, instance | 0 | 1 | Parquet page size in KB (64-8192, default: 1024) |
| compression | code | in | type, instance | 0 | 1 | Parquet compression: none, snappy (default), gzip, lz4, brotli, zstd |
| viewReference | Reference | in | type, instance | 0 | 1 | Reference to ViewDefinition to be used for data transformation. (not yet supported) |
| viewResource | ViewDefinition | in | type | 0 | 1 | ViewDefinition to be used for data transformation. |
| patient | Reference | in | type, instance | 0 | * | Filter resources by patient. |
| group | Reference | in | type, instance | 0 | * | Filter resources by group. (not yet supported) |
| source | string | in | type, instance | 0 | 1 | URL or path to FHIR data source. Supports file://, http(s)://, s3://, gs://, and azure:// protocols. |
| _limit | integer | in | type, instance | 0 | 1 | Limits the number of results. (1-10000) |
| _since | instant | in | type, instance | 0 | 1 | Return resources that have been modified after the supplied time. (RFC3339 format, validates format only) |
| resource | Resource | in | type, instance | 0 | * | Collection of FHIR resources to be transformed into a tabular projection. |
All parameters except viewReference, viewResource, patient, group, and resource can be provided as POST query parameters.

`_format` values:
- `application/json` - JSON array output (default)
- `text/csv` - CSV output
- `application/ndjson` - Newline-delimited JSON
- `application/parquet` - Parquet file

`header` values:
- `true` - Include headers (default for CSV)
- `false` - Exclude headers

For POST requests, parameters can be provided in a FHIR Parameters resource.
When the same parameter is specified in multiple places, parameters in the request body (the Parameters resource) take precedence over query parameters, as the header example below demonstrates.
The server automatically sets appropriate response headers based on the output format and size:
Standard Response Headers:
Content-Type: Based on format parameter
- `application/json` for JSON output
- `text/csv` for CSV output
- `application/ndjson` for NDJSON output
- `application/parquet` for a single Parquet file
- `application/zip` for multiple Parquet files

Streaming Response Headers (for large files):
- `Transfer-Encoding: chunked` - Automatically set for files > 10MB
- `Content-Disposition: attachment; filename="..."` - Suggests a filename for downloads: `filename="data.parquet"` for a single Parquet file, `filename="data.zip"` for a ZIP archive

Note: The `Transfer-Encoding: chunked` header is managed automatically by the server. Clients don't need to set any special headers to receive chunked responses; large responses are simply delivered in chunks.
# Limit results - first 50 records as CSV
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?_limit=50&_format=text/csv" \
-H "Content-Type: application/json" \
-d '{...}'
# CSV without headers, limited to 20 results
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?_format=text/csv&header=false&_limit=20" \
-H "Content-Type: application/json" \
-d '{...}'
# Using header parameter in request body (overrides query params)
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?_format=text/csv" \
-H "Content-Type: application/json" \
-d '{
"resourceType": "Parameters",
"parameter": [{
"name": "header",
"valueBoolean": false
}, {
"name": "viewResource",
"resource": {...}
}]
}'
# Filter by modification time (requires resources with lastUpdated metadata)
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?_since=2024-01-01T00:00:00Z" \
-H "Content-Type: application/json" \
-d '{...}'
# Load data from S3 bucket
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?source=s3://my-bucket/bundle.json" \
-H "Content-Type: application/json" \
-d '{
"resourceType": "Parameters",
"parameter": [{
"name": "viewResource",
"resource": {...}
}]
}'
# Load data from Azure with filtering
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?source=azure://container/data.json&_limit=100" \
-H "Content-Type: application/json" \
-d '{...}'
# Generate Parquet with custom compression and row group size
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?_format=application/parquet&compression=zstd&rowGroupSize=512" \
-H "Content-Type: application/json" \
-d '{...}' \
--output result.parquet
# Generate large Parquet with file splitting (returns ZIP if multiple files)
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?_format=application/parquet&maxFileSize=100" \
-H "Content-Type: application/json" \
-d '{...}' \
--output result.zip
# Using Parquet parameters in request body
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run" \
-H "Content-Type: application/json" \
-d '{
"resourceType": "Parameters",
"parameter": [{
"name": "_format",
"valueCode": "parquet"
}, {
"name": "maxFileSize",
"valueInteger": 500
}, {
"name": "compression",
"valueCode": "brotli"
}, {
"name": "viewResource",
"resource": {...}
}]
}' \
--output result.zip
Transform FHIR resources using declarative ViewDefinitions:
use helios_sof::{SofViewDefinition, SofBundle, ContentType, run_view_definition};
// Parse ViewDefinition and Bundle
let view_definition: fhir::r4::ViewDefinition = serde_json::from_str(view_json)?;
let bundle: fhir::r4::Bundle = serde_json::from_str(bundle_json)?;
// Wrap in version-agnostic containers
let sof_view = SofViewDefinition::R4(view_definition);
let sof_bundle = SofBundle::R4(bundle);
// Transform to CSV with headers
let csv_output = run_view_definition(
sof_view,
sof_bundle,
ContentType::CsvWithHeader
)?;
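The returned output can be written straight to disk. A minimal sketch, assuming `csv_output` is a byte buffer as the error-handling example later in this document suggests:

// Persist the transformed output (assumes `csv_output` implements
// AsRef<[u8]>, e.g. a Vec<u8> or String)
std::fs::write("patients.csv", &csv_output)?;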
Seamlessly work with any supported FHIR version:
// Version-agnostic processing
match fhir_version {
FhirVersion::R4 => {
let view = SofViewDefinition::R4(parse_r4_viewdef(json)?);
let bundle = SofBundle::R4(parse_r4_bundle(json)?);
run_view_definition(view, bundle, format)?
},
FhirVersion::R5 => {
let view = SofViewDefinition::R5(parse_r5_viewdef(json)?);
let bundle = SofBundle::R5(parse_r5_bundle(json)?);
run_view_definition(view, bundle, format)?
},
// ... other versions
}
Process collections with automatic row generation:
{
"resourceType": "ViewDefinition",
"resource": "Patient",
"select": [{
"forEach": "name",
"column": [{
"name": "family_name",
"path": "family"
}, {
"name": "given_name",
"path": "given.first()"
}]
}]
}
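For example (hypothetical data), a Patient resource with two name entries would yield one row per element of the forEach collection:

| family_name | given_name |
|---|---|
| Smith | John |
| Smith | Johnny |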
Define reusable values for complex expressions:
{
"constant": [{
"name": "loinc_system",
"valueString": "http://loinc.org"
}],
"select": [{
"where": [{
"path": "code.coding.where(system = %loinc_system).exists()"
}],
"column": [{
"name": "loinc_code",
"path": "code.coding.where(system = %loinc_system).code"
}]
}]
}
Filter resources using FHIRPath expressions; a resource is included only when every `where` clause evaluates to true:
{
"where": [{
"path": "status = 'final'"
}, {
"path": "effective.exists()"
}, {
"path": "value.exists()"
}]
}
Combine multiple select statements:
{
"select": [{
"unionAll": [{
"column": [{"name": "type", "path": "'observation'"}]
}, {
"column": [{"name": "type", "path": "'condition'"}]
}]
}]
}
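For a single matching resource, this definition emits one row per unionAll branch (hypothetical output):

| type |
|---|
| observation |
| condition |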
Multiple output formats for different integration needs:
use helios_sof::ContentType;
// CSV without headers
let csv = run_view_definition(view, bundle, ContentType::Csv)?;
// CSV with headers
let csv_headers = run_view_definition(view, bundle, ContentType::CsvWithHeader)?;
// Pretty-printed JSON array
let json = run_view_definition(view, bundle, ContentType::Json)?;
// Newline-delimited JSON (streaming friendly)
let ndjson = run_view_definition(view, bundle, ContentType::NdJson)?;
// Apache Parquet (columnar binary format)
let parquet = run_view_definition(view, bundle, ContentType::Parquet)?;
The SOF implementation supports Apache Parquet format for efficient columnar data storage and analytics:
Type mappings:
- `boolean` → BOOLEAN
- `string`/`code`/`uri` → UTF8
- `integer` → INT32
- `decimal` → FLOAT64
- `dateTime`/`date` → UTF8

Compression options:
- `snappy` (default): Fast compression with good ratios
- `gzip`: Maximum compatibility, good compression
- `lz4`: Fastest compression/decompression
- `zstd`: Balanced speed and compression ratio
- `brotli`: Best compression ratio
- `none`: No compression for maximum speed

Example usage:
# CLI export with default settings (256MB row groups, snappy compression)
sof-cli --view view.json --bundle data.json --format parquet -o output.parquet
# Optimize for smaller files with better compression
sof-cli --view view.json --bundle data.json --format parquet \
--parquet-compression zstd \
--parquet-row-group-size 128 \
-o output.parquet
# Maximize compression for archival
sof-cli --view view.json --bundle data.json --format parquet \
--parquet-compression brotli \
--parquet-row-group-size 512 \
--parquet-page-size 2048 \
-o output.parquet
# Fast processing with minimal compression
sof-cli --view view.json --bundle data.json --format parquet \
--parquet-compression lz4 \
--parquet-row-group-size 64 \
-o output.parquet
# Split large datasets into multiple files (500MB each)
sof-cli --view view.json --bundle large-data.json --format parquet \
--max-file-size 500 \
-o output.parquet
# Creates: output.parquet (first 500MB)
# output_002.parquet (next 500MB)
# output_003.parquet (remaining data)
# Server API - single Parquet file
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?_format=application/parquet" \
-H "Content-Type: application/json" \
-d '{"resourceType": "Parameters", ...}' \
--output result.parquet
# Server API - with file splitting (returns ZIP archive if multiple files)
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?_format=application/parquet&maxFileSize=100" \
-H "Content-Type: application/json" \
-d '{"resourceType": "Parameters", ...}' \
--output result.zip
# Server API - optimized settings for large datasets
curl -X POST "http://localhost:8080/ViewDefinition/$viewdefinition-run?_format=application/parquet&compression=zstd&rowGroupSize=512&maxFileSize=500" \
-H "Content-Type: application/json" \
-d '{"resourceType": "Parameters", ...}' \
--output result.zip
Recommendations:
- `snappy` or `lz4` for real-time processing
- `zstd` for balanced storage and query performance
- `brotli` or `gzip` for long-term storage where space is critical

When `--max-file-size` or `maxFileSize` is specified:
- Output is split into `base.parquet`, `base_002.parquet`, `base_003.parquet`, etc.

Response headers:
- `Transfer-Encoding: chunked` - Automatically set by the server for streaming responses
- `Content-Type: application/parquet` or `application/zip` - Based on single or multi-file output
- `Content-Disposition: attachment; filename="data.parquet"` or `filename="data.zip"` - For convenient file downloads

The SQL-on-FHIR implementation leverages multi-core processors for optimal performance through parallel resource processing:
- Parallel execution uses `rayon` for both batch and streaming modes
- Thread count is configurable via the `RAYON_NUM_THREADS` environment variable

The `RAYON_NUM_THREADS` environment variable controls the number of threads used for parallel processing:
# Use all available CPU cores (default behavior)
sof-cli --view view.json --bundle large-bundle.json
# Limit to 4 threads for resource-constrained environments
RAYON_NUM_THREADS=4 sof-cli --view view.json --bundle large-bundle.json
# Use single thread (disables parallelization)
RAYON_NUM_THREADS=1 sof-cli --view view.json --bundle data.ndjson
# Server with custom thread pool
RAYON_NUM_THREADS=8 sof-server
# Python (pysof) also respects this variable
RAYON_NUM_THREADS=4 python my_script.py
When to adjust thread count:
- Reduced (`RAYON_NUM_THREADS=2-4`): on shared systems, containers with CPU limits, or when running multiple instances
- Single-threaded (`RAYON_NUM_THREADS=1`): for debugging, profiling, or deterministic output ordering

Batch Mode (Bundle processing):
| Bundle Size | Sequential Time | Parallel Time | Speedup |
|---|---|---|---|
| 10 patients | 22.7ms | 8.3ms | 2.7x |
| 50 patients | 113.8ms | 16.1ms | 7.1x |
| 100 patients | 229.4ms | 35.7ms | 6.4x |
| 500 patients | 1109ms | 152ms | 7.3x |
Streaming Mode (NDJSON processing):
| Dataset | Batch Mode | Streaming Mode | Memory Reduction |
|---|---|---|---|
| 10k Patients (32MB) | 2.66s, 1.6GB | 0.93s, 45MB | 35x less memory, 2.9x faster |
| 93k Encounters (136MB) | 3.97s, 3.9GB | 2.75s, 25MB | 155x less memory, 1.4x faster |
Parallel processing is applied automatically in both modes; no configuration is required beyond the optional `RAYON_NUM_THREADS` setting described above.
The crate uses trait abstractions to provide uniform processing across FHIR versions:
// Core traits for version independence
pub trait ViewDefinitionTrait {
    type Select;
    type Where;
    type Constant;

    fn resource(&self) -> Option<&str>;
    fn select(&self) -> Option<&[Self::Select]>;
    fn where_clauses(&self) -> Option<&[Self::Where]>;
    fn constants(&self) -> Option<&[Self::Constant]>;
}
pub trait BundleTrait {
type Resource: ResourceTrait;
fn entries(&self) -> Vec<&Self::Resource>;
}
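To illustrate how these traits enable version-independent code, here is a minimal sketch; the helper `describe_view` is hypothetical, not part of the crate's API:

// Hypothetical generic helper: works with a ViewDefinition from any FHIR
// version because it relies only on the ViewDefinitionTrait abstraction.
fn describe_view<V: ViewDefinitionTrait>(view: &V) -> String {
    let resource = view.resource().unwrap_or("<unspecified>");
    let selects = view.select().map_or(0, |s| s.len());
    format!("ViewDefinition over {resource} with {selects} select item(s)")
}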
Comprehensive error types for different failure scenarios:
use helios_sof::SofError;
match run_view_definition(view, bundle, format) {
Ok(output) => println!("Success: {} bytes", output.len()),
Err(SofError::InvalidViewDefinition(msg)) => {
eprintln!("ViewDefinition error: {}", msg);
},
Err(SofError::FhirPathError(msg)) => {
eprintln!("FHIRPath evaluation error: {}", msg);
},
Err(SofError::UnsupportedContentType(format)) => {
eprintln!("Unsupported format: {}", format);
},
Err(e) => eprintln!("Other error: {}", e),
}
Enable support for specific FHIR versions:
[dependencies]
helios-sof = { version = "0.1", features = ["R4", "R5"] }
# Or enable all versions
helios-sof = { version = "0.1", features = ["R4", "R4B", "R5", "R6"] }
Available features:
- `R4` - FHIR 4.0.1 support (default)
- `R4B` - FHIR 4.3.0 support
- `R5` - FHIR 5.0.0 support
- `R6` - FHIR 6.0.0 support

Batch-process a directory of Bundles with the library API:

use helios_sof::{SofViewDefinition, SofBundle, ContentType, run_view_definition};
use std::fs;
fn process_directory(view_path: &str, data_dir: &str) -> Result<(), Box<dyn std::error::Error>> {
let view_def = fs::read_to_string(view_path)?;
let view: fhir::r4::ViewDefinition = serde_json::from_str(&view_def)?;
let sof_view = SofViewDefinition::R4(view);
for entry in fs::read_dir(data_dir)? {
let bundle_path = entry?.path();
let bundle_json = fs::read_to_string(&bundle_path)?;
let bundle: fhir::r4::Bundle = serde_json::from_str(&bundle_json)?;
let sof_bundle = SofBundle::R4(bundle);
let output = run_view_definition(
sof_view.clone(),
sof_bundle,
ContentType::CsvWithHeader
)?;
let output_path = bundle_path.with_extension("csv");
fs::write(output_path, output)?;
}
Ok(())
}
use helios_sof::{ContentType, SofBundle, SofError, SofViewDefinition, run_view_definition};
fn safe_transform(view: SofViewDefinition, bundle: SofBundle) -> Option<Vec<u8>> {
match run_view_definition(view, bundle, ContentType::Json) {
Ok(output) => Some(output),
Err(SofError::InvalidViewDefinition(msg)) => {
log::error!("ViewDefinition validation failed: {}", msg);
None
},
Err(SofError::FhirPathError(msg)) => {
log::warn!("FHIRPath evaluation issue: {}", msg);
None
},
Err(e) => {
log::error!("Unexpected error: {}", e);
None
}
}
}
The crate includes a comprehensive test suite.
Run tests with:
# All tests
cargo test
# Specific FHIR version
cargo test --features R5
# Integration tests only
cargo test --test integration