| Crates.io | verificar |
| lib.rs | verificar |
| version | 0.5.0 |
| created_at | 2025-11-25 17:04:02.111641+00 |
| updated_at | 2025-11-30 21:37:57.275885+00 |
| description | Synthetic Data Factory for Domain-Specific Code Intelligence |
| homepage | |
| repository | https://github.com/paiml/verificar |
| max_upload_size | |
| id | 1950090 |
| size | 1,477,234 |
Synthetic Data Factory for Domain-Specific Code Intelligence
Verificar is a unified combinatorial test generation and synthetic data factory for PAIML transpiler projects (depyler, bashrs, ruchy, decy). It generates verified (source, target, correctness) tuples at scale, creating training data for domain-specific code intelligence models.
┌─────────────────────────────────────────────────────────────┐
│                       VERIFICAR CORE                        │
├─────────────────────────────────────────────────────────────┤
│  Grammar    →   Generator   →   Mutator   →   Oracle        │
│  Definitions    Engine          Engine        Verification  │
└─────────────────────────────────────────────────────────────┘
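To make the four stages concrete, here is a minimal, self-contained sketch of the flow from grammar to oracle. It does not use verificar's API: `VerifiedTuple`, `generate`, `mutate`, `transpile`, `eval`, `verify`, and `run_pipeline` are hypothetical names, and the toy infix-to-postfix "transpiler" only stands in for a real one such as depyler.

```rust
/// One verified example: (source, target, correctness). Hypothetical type.
#[derive(Debug, Clone, PartialEq)]
struct VerifiedTuple {
    source: String,
    target: String,
    correct: bool,
}

/// Grammar + generator stand-in: enumerate every `digit op digit` program.
fn generate() -> Vec<String> {
    let mut programs = Vec::new();
    for a in 1..=2 {
        for op in ['+', '*'] {
            for b in 1..=2 {
                programs.push(format!("{a} {op} {b}"));
            }
        }
    }
    programs
}

/// Mutator stand-in: derive a variant by swapping `+` for `-`.
fn mutate(program: &str) -> Option<String> {
    program.contains('+').then(|| program.replace('+', "-"))
}

/// Transpiler under test (stand-in): infix `a op b` becomes postfix `a b op`.
fn transpile(source: &str) -> Option<String> {
    let parts: Vec<&str> = source.split_whitespace().collect();
    match parts.as_slice() {
        [a, op, b] => Some(format!("{a} {b} {op}")),
        _ => None,
    }
}

/// Evaluate a three-token program; `postfix` selects `a b op` over `a op b`.
fn eval(src: &str, postfix: bool) -> Option<i64> {
    let p: Vec<&str> = src.split_whitespace().collect();
    let (a, op, b) = match (p.as_slice(), postfix) {
        ([a, op, b], false) => (a, op, b),
        ([a, b, op], true) => (a, op, b),
        _ => return None,
    };
    let (a, b): (i64, i64) = (a.parse().ok()?, b.parse().ok()?);
    match *op {
        "+" => Some(a + b),
        "-" => Some(a - b),
        "*" => Some(a * b),
        _ => None,
    }
}

/// Oracle stand-in: a pair is correct when both sides evaluate to the same value.
fn verify(source: &str, target: &str) -> bool {
    match (eval(source, false), eval(target, true)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}

/// Grammar -> Generator -> Mutator -> Oracle, producing labeled tuples.
fn run_pipeline() -> Vec<VerifiedTuple> {
    let generated = generate();
    let mutated: Vec<String> = generated.iter().filter_map(|p| mutate(p)).collect();
    generated
        .into_iter()
        .chain(mutated)
        .filter_map(|source| {
            let target = transpile(&source)?;
            let correct = verify(&source, &target);
            Some(VerifiedTuple { source, target, correct })
        })
        .collect()
}

fn main() {
    for t in run_pipeline() {
        println!("{} => {} (correct: {})", t.source, t.target, t.correct);
    }
}
```

In this toy setup every tuple is labeled correct because the stand-in transpiler has no bugs. With a real transpiler, the tuples labeled incorrect are the ones that expose defects.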
Add to your Cargo.toml:
[dependencies]
verificar = "0.5"
Or with optional features:
[dependencies]
verificar = { version = "0.5", features = ["parquet", "ml"] }
use verificar::generator::{Generator, SamplingStrategy};
use verificar::Language;
// Create a generator for Python
let generator = Generator::new(Language::Python);
// Generate test cases using coverage-guided sampling
let strategy = SamplingStrategy::CoverageGuided {
    coverage_map: None,
    max_depth: 3,
    seed: 42,
};
let test_cases = generator.generate(strategy, 100);
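For intuition about what a coverage-guided strategy does, the following standalone sketch shows one simple reading of the idea: draw random candidates from a seeded generator and keep only those that exercise a grammar production not yet covered. It is not verificar's implementation. `Lcg`, `PRODUCTIONS`, `random_program`, and `coverage_guided` are hypothetical, and `max_depth` is interpreted here as the maximum number of productions per program, which is an assumption.

```rust
use std::collections::HashSet;

/// Tiny deterministic PRNG (linear congruential) so the sketch needs no crates.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Hypothetical grammar productions a generated program can exercise.
const PRODUCTIONS: [&str; 5] = ["assign", "if", "while", "call", "return"];

/// Build a random "program" as a list of 1..=max_depth productions.
/// `max_depth` must be greater than zero.
fn random_program(rng: &mut Lcg, max_depth: usize) -> Vec<&'static str> {
    let len = 1 + (rng.next_u64() as usize) % max_depth;
    (0..len)
        .map(|_| PRODUCTIONS[(rng.next_u64() as usize) % PRODUCTIONS.len()])
        .collect()
}

/// Keep only candidates that cover at least one production not seen so far.
fn coverage_guided(seed: u64, max_depth: usize, attempts: usize) -> Vec<Vec<&'static str>> {
    let mut rng = Lcg(seed);
    let mut covered: HashSet<&'static str> = HashSet::new();
    let mut kept = Vec::new();
    for _ in 0..attempts {
        let candidate = random_program(&mut rng, max_depth);
        let before = covered.len();
        covered.extend(candidate.iter().copied());
        if covered.len() > before {
            kept.push(candidate);
        }
    }
    kept
}

fn main() {
    for program in coverage_guided(42, 3, 100) {
        println!("{:?}", program);
    }
}
```

Because the generator is seeded, the same `seed` always yields the same kept set, which is the property that makes a generated corpus reproducible.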
# Generate Python test programs
verificar generate --language python --count 1000 --output corpus.json
# Generate with specific sampling strategy
verificar generate --language bash --strategy swarm --count 500
# Generate depyler-specific patterns
verificar depyler --category file_io --count 100 --output depyler_tests/
| Language | Grammar | Description |
|---|---|---|
| Python | `PythonGrammar` | Functions, control flow, type hints (depyler source) |
| Bash | `BashGrammar` | Variables, pipes, conditionals (bashrs source) |
| C | `CGrammar` | Functions, pointers, memory operations (decy source) |
| TypeScript | `TypeScriptGrammar` | Interfaces, generics, type annotations (decy target) |
| Ruchy | `RuchyGrammar` | Custom DSL programs |
| Rust | - | Common target language |
Based on organizational intelligence analysis of 1,296 defect-fix commits:
| Priority | Category | Allocation | Rationale |
|---|---|---|---|
| P0 | ASTTransform | 50% | Universal dominant defect (40-62%) |
| P1 | OwnershipBorrow | 20% | Rust-specific (15-20%) |
| P2 | StdlibMapping | 15% | API translation errors |
| P3 | Language-specific | 15% | bashrs security, decy memory, etc. |
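As a worked example of the allocation column, the sketch below splits a requested test-case budget across the four categories. The percentages come from the table; the `allocate` function and its rule of giving rounding leftovers to the highest-priority categories first are assumptions made for illustration, not documented verificar behavior.

```rust
/// Allocation percentages from the table above, in priority order.
const ALLOCATION: [(&str, u64); 4] = [
    ("ASTTransform", 50),
    ("OwnershipBorrow", 20),
    ("StdlibMapping", 15),
    ("Language-specific", 15),
];

/// Split `total` test cases across categories. Integer division can leave
/// up to three cases unassigned; those go to the highest priorities first.
fn allocate(total: u64) -> Vec<(&'static str, u64)> {
    let mut counts: Vec<(&'static str, u64)> = ALLOCATION
        .iter()
        .map(|&(name, pct)| (name, total * pct / 100))
        .collect();
    let assigned: u64 = counts.iter().map(|&(_, n)| n).sum();
    let mut leftover = total - assigned;
    for entry in counts.iter_mut() {
        if leftover == 0 {
            break;
        }
        entry.1 += 1;
        leftover -= 1;
    }
    counts
}

fn main() {
    for (category, count) in allocate(1000) {
        println!("{category}: {count}");
    }
}
```

For a budget of 1000 this yields 500, 200, 150, and 150 cases. For a budget of 10 the raw split is 5, 2, 1, 1, and the one leftover case goes to `ASTTransform`.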
| Feature | Description |
|---|---|
| `parquet` | Enable Parquet data output |
| `ml` | Enable ML pipeline (aprender integration) |
| `tree-sitter` | Use tree-sitter for grammar parsing |
| `pest` | Use pest for PEG grammars |
| `full` | Enable all features |
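Cargo features are resolved at compile time, so code paths behind a disabled feature are not built. The sketch below shows the general mechanism with `cfg!`. The `output_format` function and the JSON fallback are assumptions for illustration and say nothing about how verificar gates its own code.

```rust
/// Pick an output format based on compile-time features.
/// Assumption: JSON is the fallback when `parquet` is not enabled.
/// The `allow` silences the lint that fires when the feature is undeclared.
#[allow(unexpected_cfgs)]
fn output_format() -> &'static str {
    if cfg!(feature = "parquet") {
        "parquet"
    } else {
        "json"
    }
}

fn main() {
    println!("output format: {}", output_format());
}
```

In a crate that declares a `parquet` feature, building with `--features parquet` switches the branch. Compiled without it, the function returns the fallback.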
MIT License - see LICENSE for details.
Contributions welcome! Please read CLAUDE.md for development guidelines.