| Crates.io | datasynth-fingerprint |
| lib.rs | datasynth-fingerprint |
| version | 0.2.1 |
| created_at | 2026-01-24 21:48:27.155203+00 |
| updated_at | 2026-01-24 21:48:27.155203+00 |
| description | Privacy-preserving synthetic data fingerprinting for DataSynth |
| homepage | https://github.com/ey-asu-rnd/SyntheticData |
| repository | https://github.com/ey-asu-rnd/SyntheticData |
| id | 2067572 |
| size | 438,795 |
Privacy-preserving synthetic data fingerprinting for DataSynth.
The datasynth-fingerprint crate provides functionality to extract fingerprints from real data, store and load them as .dsf (DataSynth Fingerprint) files, synthesize generator configurations from them, and evaluate how well synthetic data matches the original.

```rust
use datasynth_fingerprint::{
    extraction::{FingerprintExtractor, DataSource, CsvDataSource},
    io::{FingerprintWriter, FingerprintReader},
    synthesis::ConfigSynthesizer,
};

// Extract fingerprint from real data
let extractor = FingerprintExtractor::new();
let fingerprint = extractor.extract_from_csv("data.csv")?;

// Save to .dsf file
let writer = FingerprintWriter::new();
writer.write_to_file(&fingerprint, "fingerprint.dsf")?;

// Later: Load and synthesize config
let reader = FingerprintReader::new();
let fingerprint = reader.read_from_file("fingerprint.dsf")?;
let synthesizer = ConfigSynthesizer::new();
let config_patch = synthesizer.synthesize(&fingerprint)?;
```
The crate implements multiple privacy-preserving mechanisms. The privacy level is chosen when the extractor is configured:

```rust
use datasynth_fingerprint::extraction::{FingerprintExtractor, ExtractionConfig};
use datasynth_fingerprint::models::PrivacyLevel;

let config = ExtractionConfig::with_privacy_level(PrivacyLevel::High);
let extractor = FingerprintExtractor::with_config(config);
let fingerprint = extractor.extract_from_csv("data.csv")?;
```
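Fingerprints can also be extracted from Parquet and JSON sources, from a whole directory, or from a large CSV file in streaming mode:

```rust
// Assumption: ParquetDataSource and JsonDataSource are exported next to
// CsvDataSource in the extraction module.
use datasynth_fingerprint::extraction::{DataSource, JsonDataSource, ParquetDataSource};

// Parquet
let source = DataSource::Parquet(ParquetDataSource::new("data.parquet"));
let fingerprint = extractor.extract(&source)?;

// JSON array format
let source = DataSource::Json(JsonDataSource::json_array("data.json"));

// JSONL (newline-delimited) format
let source = DataSource::Json(JsonDataSource::jsonl("data.jsonl"));

// Extract from all supported files in a directory
let fingerprint = extractor.extract_from_directory("./data_folder/")?;

// Memory-efficient extraction for large CSV files
let fingerprint = extractor.extract_streaming_csv("large_data.csv")?;
```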
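Streaming extraction has to summarize a column without holding it in memory. The sketch below shows one common way to do that, Welford's single-pass algorithm for the mean and variance. `OnlineStats` is a hypothetical type written for illustration; how the crate's own streaming statistics work internally is not documented here.

```rust
/// Running count, mean, and variance computed in one pass (Welford's algorithm).
/// Hypothetical illustration type, not part of datasynth-fingerprint.
#[derive(Debug, Default, Clone, Copy)]
pub struct OnlineStats {
    count: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the current mean
}

impl OnlineStats {
    /// Fold one value into the running statistics in O(1) time and memory.
    pub fn push(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        // Uses the updated mean, which keeps the recurrence numerically stable.
        self.m2 += delta * (x - self.mean);
    }

    /// Number of values seen so far.
    pub fn count(&self) -> u64 {
        self.count
    }

    /// Mean of the values seen so far, or `None` if nothing was pushed.
    pub fn mean(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.mean)
        }
    }

    /// Sample variance (divides by n - 1), or `None` with fewer than two values.
    pub fn sample_variance(&self) -> Option<f64> {
        if self.count < 2 {
            None
        } else {
            Some(self.m2 / (self.count - 1) as f64)
        }
    }
}
```

Each parsed CSV value would be passed to `push` as it is read, so memory use stays constant regardless of file size.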
A fingerprint contains:
| Component | Description |
|---|---|
| `manifest` | Metadata, version, checksums, privacy config |
| `schema` | Table structures, column types, relationships |
| `statistics` | Distributions, percentiles, Benford analysis |
| `correlations` | Correlation matrices, copulas (optional) |
| `integrity` | Unique constraints, foreign keys (optional) |
| `rules` | Business rules, balance equations (optional) |
| `anomalies` | Anomaly patterns and rates (optional) |
| `privacy_audit` | Privacy actions and epsilon tracking |
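The statistics component lists Benford analysis. As a rough illustration of what such an analysis measures, the sketch below compares the observed leading-digit shares of a numeric column with the shares expected under Benford's law, P(d) = log10(1 + 1/d). All function names are hypothetical, and the crate's own analysis may use a different test statistic.

```rust
/// Expected share of values whose leading digit is `d` (1..=9) under Benford's law.
pub fn benford_expected(d: u32) -> f64 {
    (1.0 + 1.0 / d as f64).log10()
}

/// Leading non-zero decimal digit of `x`, or `None` for zero and non-finite values.
pub fn leading_digit(x: f64) -> Option<u32> {
    if !x.is_finite() || x == 0.0 {
        return None;
    }
    // Scientific notation puts the leading digit first, e.g. "1.2345e3".
    let sci = format!("{:e}", x.abs());
    sci.chars().next().and_then(|c| c.to_digit(10))
}

/// Observed share of each leading digit; index 0 holds digit 1, index 8 holds digit 9.
pub fn leading_digit_shares(values: &[f64]) -> [f64; 9] {
    let mut counts = [0usize; 9];
    let mut total = 0usize;
    for d in values.iter().filter_map(|&v| leading_digit(v)) {
        counts[(d - 1) as usize] += 1;
        total += 1;
    }
    let mut shares = [0.0; 9];
    if total > 0 {
        for (share, &count) in shares.iter_mut().zip(counts.iter()) {
            *share = count as f64 / total as f64;
        }
    }
    shares
}

/// Mean absolute deviation between observed and expected leading-digit shares.
/// Smaller values mean the data sits closer to Benford's law.
pub fn benford_mad(values: &[f64]) -> f64 {
    let shares = leading_digit_shares(values);
    (1..=9u32)
        .map(|d| (shares[(d - 1) as usize] - benford_expected(d)).abs())
        .sum::<f64>()
        / 9.0
}
```

Because only the nine digit shares are needed, a summary of this kind describes the shape of a column without retaining any individual value.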
The .dsf format is a ZIP archive containing:
```text
fingerprint.dsf
├── manifest.json        # Version, checksums, privacy config
├── schema.yaml          # Table and column definitions
├── statistics.yaml      # Distribution parameters
├── correlations.yaml    # Correlation matrices (optional)
├── integrity.yaml       # Integrity constraints (optional)
├── rules.yaml           # Business rules (optional)
├── anomalies.yaml       # Anomaly profiles (optional)
└── privacy_audit.json   # Privacy audit trail
```
DSF files can be signed for authenticity verification:
```rust
use datasynth_fingerprint::io::{SigningKey, DsfSigner, DsfVerifier};

// Generate a signing key
let key = SigningKey::generate("my-key-id");

// Sign when writing
let signer = DsfSigner::new(key.clone());
writer.write_to_file_signed(&fingerprint, "signed.dsf", &signer)?;

// Verify when reading
let verifier = DsfVerifier::new(key);
let fingerprint = reader.read_from_file_verified("signed.dsf", &verifier)?;
```
Convert fingerprints to generator configurations:
```rust
use datasynth_fingerprint::synthesis::{ConfigSynthesizer, SynthesisOptions};

// One seed shared by the options and the synthesize_full call
let seed = 42;

let options = SynthesisOptions {
    scale: 2.0,       // Generate 2x the original row count
    seed: Some(seed), // Set random seed
    preserve_correlations: true,
    inject_anomalies: true,
};
let synthesizer = ConfigSynthesizer::with_options(options);
let result = synthesizer.synthesize_full(&fingerprint, seed)?;

// result.config_patch - configuration values to apply
// result.copula_generators - for preserving correlations
```
Evaluate how well synthetic data matches the original fingerprint:
```rust
use datasynth_fingerprint::evaluation::FidelityEvaluator;

let evaluator = FidelityEvaluator::new();
let report = evaluator.evaluate(&original_fingerprint, &synthetic_fingerprint)?;

println!("Overall fidelity: {:.2}", report.overall_score);
println!("Statistical fidelity: {:.2}", report.statistical_fidelity);
println!("Correlation fidelity: {:.2}", report.correlation_fidelity);
```
| Level | Epsilon | K | Use Case |
|---|---|---|---|
| Minimal | 5.0 | 3 | Low privacy requirements |
| Standard | 1.0 | 5 | Balanced (default) |
| High | 0.5 | 10 | Sensitive data |
| Maximum | 0.1 | 20 | Highly sensitive data |
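To make the table concrete, the sketch below encodes the four levels with their epsilon and K values and applies K as a minimum group size when summarizing a categorical column. Reading K as a minimum group size is an assumption, and `Level` and `suppress_rare` are hypothetical stand-ins rather than the crate's `PrivacyLevel` API.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the privacy levels in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Minimal,
    Standard,
    High,
    Maximum,
}

impl Level {
    /// Epsilon value listed for this level (smaller means stricter).
    pub fn epsilon(self) -> f64 {
        match self {
            Level::Minimal => 5.0,
            Level::Standard => 1.0,
            Level::High => 0.5,
            Level::Maximum => 0.1,
        }
    }

    /// K value listed for this level (larger means stricter).
    pub fn k(self) -> usize {
        match self {
            Level::Minimal => 3,
            Level::Standard => 5,
            Level::High => 10,
            Level::Maximum => 20,
        }
    }
}

/// Keep only categories observed at least K times (assumed meaning of K).
/// Returns the kept categories and the number of rows that were suppressed.
pub fn suppress_rare(
    counts: &BTreeMap<String, usize>,
    level: Level,
) -> (BTreeMap<String, usize>, usize) {
    let k = level.k();
    let mut kept = BTreeMap::new();
    let mut suppressed = 0;
    for (name, &n) in counts {
        if n >= k {
            kept.insert(name.clone(), n);
        } else {
            suppressed += n;
        }
    }
    (kept, suppressed)
}
```

Under this reading, moving from Standard to Maximum drops every category with fewer than 20 rows, which trades fidelity on rare values for stronger protection.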
- `Fingerprint` - Root fingerprint structure
- `SchemaFingerprint` - Table and column schemas
- `StatisticsFingerprint` - Numeric and categorical statistics
- `CorrelationFingerprint` - Correlation matrices and copulas
- `PrivacyAudit` - Privacy action tracking
- `FingerprintExtractor` - Main extraction coordinator
- `DataSource` - Data source types (CSV, Parquet, JSON, Directory, Memory)
- `ExtractionConfig` - Extraction configuration
- `StreamingNumericStats` / `StreamingCategoricalStats` - Online statistics
- `FingerprintWriter` - Write .dsf files
- `FingerprintReader` - Read .dsf files
- `SigningKey` / `DsfSigner` / `DsfVerifier` - Digital signatures
- `validate_dsf()` - Validate .dsf file integrity
- `ConfigSynthesizer` - Convert fingerprints to configs
- `ConfigPatch` - Configuration patch values
- `CopulaGenerator` - Generate correlated samples
- `DistributionFitter` - Fit distributions to data
- `FidelityEvaluator` - Compare fingerprints
- `FidelityReport` - Evaluation results

The fingerprint crate integrates with the datasynth-data CLI:
```bash
# Extract fingerprint from data
datasynth-data fingerprint extract \
    --input ./real_data/ \
    --output ./fingerprint.dsf \
    --privacy-level standard

# Validate fingerprint file
datasynth-data fingerprint validate ./fingerprint.dsf

# Generate from fingerprint
datasynth-data generate \
    --fingerprint ./fingerprint.dsf \
    --output ./synthetic/ \
    --scale 1.0

# Evaluate fidelity
datasynth-data fingerprint evaluate \
    --fingerprint ./fingerprint.dsf \
    --synthetic ./synthetic/
```
Same as the parent DataSynth project.