hedl-neo4j

Crates.io: hedl-neo4j
lib.rs: hedl-neo4j
version: 1.2.0
created_at: 2026-01-09 00:06:54.184844+00
updated_at: 2026-01-21 03:02:29.784832+00
description: HEDL to/from Neo4j graph database conversion
homepage: https://dweve.com
repository: https://github.com/dweve/hedl
id: 2031323
size: 877,720
(marcflp)

documentation

https://docs.rs/hedl-neo4j

README

hedl-neo4j

Bidirectional HEDL ↔ Neo4j integration—export structured data to graph databases and import query results back to HEDL with full type preservation.

Graph databases excel at relationship queries. Neo4j powers knowledge graphs, fraud detection, recommendation engines. But loading structured data from HEDL into Neo4j shouldn't require custom ETL scripts. Querying Neo4j and converting results back to HEDL for further processing shouldn't lose type information or structural semantics.

hedl-neo4j provides bidirectional integration between HEDL and Neo4j. Export HEDL documents as Cypher CREATE/MERGE statements with automatic relationship detection from references. Generate uniqueness constraints for entity IDs. Batch large imports with UNWIND for optimal performance. Import Neo4j query results back to HEDL with full schema preservation. Stream large exports without loading entire documents into memory.

What's Implemented

Comprehensive Neo4j integration with security and performance:

  1. Cypher Generation: CREATE and MERGE strategies for nodes and relationships
  2. Automatic Relationship Detection: Reference fields (@Type:id) become Neo4j relationships
  3. NEST Pattern Support: Parent-child relationships become HAS_* edges (e.g., HAS_ITEM)
  4. Constraint Generation: Uniqueness constraints on entity IDs
  5. Batch Processing: UNWIND-based batching (default 1000 nodes per statement)
  6. Streaming API: Process large documents without full memory buffering
  7. Bidirectional Conversion: Neo4j records → HEDL documents with schema preservation
  8. Security Hardening: Unicode normalization, zero-width filtering, 100 MB string limits
  9. Property Mapping: All HEDL types → Neo4j properties (Int, Float, String, Boolean, null)
  10. Configuration: Strategy (CREATE/MERGE), batch size, constraint generation

Installation

[dependencies]
hedl-neo4j = "1.2"

Basic Usage

HEDL → Cypher

Export HEDL document as Cypher statements:

use hedl_core::parse;
use hedl_neo4j::{to_cypher, ToCypherConfig};

let doc = parse(br#"
%VERSION: 1.0
%STRUCT: User: [id, name, email]
%STRUCT: Post: [id, author, title, content]
%NEST: User: Post
---
users: @User
  | alice, Alice Smith, alice@example.com
  | bob, Bob Jones, bob@example.com
  posts: @Post
    | post1, @User:alice, Hello World, My first post
    | post2, @User:alice, Second Post, Another post
    | post3, @User:bob, Bob's Thoughts, Thinking...
"#)?;

let config = ToCypherConfig::new();
let cypher = to_cypher(&doc, &config)?;
println!("{}", cypher);

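Generated Cypher (illustrative: shown unbatched and with the CREATE strategy for readability; the default configuration uses MERGE):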

// Uniqueness constraints
CREATE CONSTRAINT user_id IF NOT EXISTS FOR (n:User) REQUIRE n.id IS UNIQUE;
CREATE CONSTRAINT post_id IF NOT EXISTS FOR (n:Post) REQUIRE n.id IS UNIQUE;

// Create nodes
CREATE (alice:User {id: "alice", name: "Alice Smith", email: "alice@example.com"});
CREATE (bob:User {id: "bob", name: "Bob Jones", email: "bob@example.com"});
CREATE (post1:Post {id: "post1", title: "Hello World", content: "My first post"});
CREATE (post2:Post {id: "post2", title: "Second Post", content: "Another post"});
CREATE (post3:Post {id: "post3", title: "Bob's Thoughts", content: "Thinking..."});

// Create relationships (from references)
MATCH (post1:Post {id: "post1"}), (alice:User {id: "alice"}) CREATE (post1)-[:AUTHOR]->(alice);
MATCH (post2:Post {id: "post2"}), (alice:User {id: "alice"}) CREATE (post2)-[:AUTHOR]->(alice);
MATCH (post3:Post {id: "post3"}), (bob:User {id: "bob"}) CREATE (post3)-[:AUTHOR]->(bob);

// Create relationships (from NEST pattern)
MATCH (alice:User {id: "alice"}), (post1:Post {id: "post1"}) CREATE (alice)-[:HAS_POST]->(post1);
MATCH (alice:User {id: "alice"}), (post2:Post {id: "post2"}) CREATE (alice)-[:HAS_POST]->(post2);
MATCH (bob:User {id: "bob"}), (post3:Post {id: "post3"}) CREATE (bob)-[:HAS_POST]->(post3);

Custom Configuration

use hedl_neo4j::{to_cypher, ToCypherConfig};

let config = ToCypherConfig::builder()
    .use_merge(true)                     // Use MERGE instead of CREATE
    .batch_size(5000)                    // 5000 nodes per UNWIND
    .create_constraints(true)            // Generate uniqueness constraints
    .build();

let cypher = to_cypher(&doc, &config)?;

Neo4j → HEDL

Import Neo4j query results back to HEDL:

use hedl_neo4j::{neo4j_to_hedl, Neo4jRecord, Neo4jNode};

// Build records from Neo4j query results
let node = Neo4jNode::new("User", "alice")
    .with_property("name", "Alice Smith");
let record = Neo4jRecord::new(node);
let records = vec![record];

// Convert to HEDL
let doc = neo4j_to_hedl(&records)?;

// Use HEDL's structured API
println!("Imported {} matrix lists", doc.root.len());

Cypher Generation Strategies

MERGE Strategy (Default)

Creates or updates existing nodes:

MERGE (alice:User {id: "alice"})
ON CREATE SET alice.name = "Alice", alice.created = timestamp()
ON MATCH SET alice.name = "Alice", alice.updated = timestamp();

MERGE (bob:User {id: "bob"})
ON CREATE SET bob.name = "Bob", bob.created = timestamp()
ON MATCH SET bob.name = "Bob", bob.updated = timestamp();

Use When:

  • Incremental updates to existing database
  • Idempotent imports (safe to run multiple times)
  • Uncertain about existing data

Trade-off: Slower than CREATE (requires existence check)

CREATE Strategy

Creates new nodes unconditionally (use ToCypherConfig::new().with_create()):

CREATE (alice:User {id: "alice", name: "Alice"});
CREATE (bob:User {id: "bob", name: "Bob"});

Use When:

  • Importing into empty database
  • Guaranteed no duplicate IDs
  • Maximum performance (no existence checks)

Trade-off: Fails on duplicate IDs when uniqueness constraints are enabled; without constraints, CREATE silently produces duplicate nodes
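
The difference between the two strategies can be sketched with a small standalone generator. This is only an illustration of the statement shapes shown above, not the crate's implementation: the Strategy enum and node_statement function are hypothetical names, and property values are assumed to arrive as already-escaped Cypher literals.

```rust
/// Hypothetical strategy selector for this sketch (not a hedl-neo4j type).
#[derive(Clone, Copy)]
enum Strategy {
    Create,
    Merge,
}

/// Render one node statement. `props` holds (name, cypher_literal) pairs
/// and does not include the `id` property, which is passed separately.
fn node_statement(strategy: Strategy, label: &str, id: &str, props: &[(&str, &str)]) -> String {
    match strategy {
        // CREATE: all properties go inline; no existence check is made.
        Strategy::Create => {
            let mut fields = vec![format!("id: \"{}\"", id)];
            fields.extend(props.iter().map(|(k, v)| format!("{}: {}", k, v)));
            format!("CREATE (n:{} {{{}}});", label, fields.join(", "))
        }
        // MERGE: match on id only, then set the remaining properties
        // whether the node was just created or already existed.
        Strategy::Merge => {
            let sets: Vec<String> = props
                .iter()
                .map(|(k, v)| format!("n.{} = {}", k, v))
                .collect();
            if sets.is_empty() {
                format!("MERGE (n:{} {{id: \"{}\"}});", label, id)
            } else {
                let sets = sets.join(", ");
                format!(
                    "MERGE (n:{} {{id: \"{}\"}}) ON CREATE SET {} ON MATCH SET {};",
                    label, id, sets, sets
                )
            }
        }
    }
}
```

Matching on id alone in the MERGE branch is what makes repeated imports idempotent: a second run finds the node by id and only overwrites its properties.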

Relationship Mapping

Reference Fields → Relationships

Reference values in fields automatically become relationships:

posts: @Post[id, author, title]
  | post1, @User:alice, Hello World

Generated:

CREATE (post1:Post {id: "post1", title: "Hello World"});
MATCH (post1:Post {id: "post1"}), (alice:User {id: "alice"})
CREATE (post1)-[:AUTHOR]->(alice);

Relationship Type: Field name uppercased (e.g., author → AUTHOR)
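
As a rough sketch of that mapping, the following standalone code parses a qualified @Type:id reference and emits the corresponding relationship statement. The helper names parse_reference and reference_relationship are made up for this example, only the qualified reference form is handled, and the crate's own parser may accept more.

```rust
/// Parse a qualified reference of the form `@Type:id` into (type, id).
/// Returns None for plain values and for malformed references.
fn parse_reference(value: &str) -> Option<(&str, &str)> {
    let rest = value.strip_prefix('@')?;
    let (type_name, id) = rest.split_once(':')?;
    if type_name.is_empty() || id.is_empty() {
        return None;
    }
    Some((type_name, id))
}

/// Build the relationship statement for one field of one row.
/// The relationship type is the field name uppercased, as described above.
fn reference_relationship(
    src_label: &str,
    src_id: &str,
    field: &str,
    value: &str,
) -> Option<String> {
    let (dst_label, dst_id) = parse_reference(value)?;
    Some(format!(
        "MATCH (a:{} {{id: \"{}\"}}), (b:{} {{id: \"{}\"}}) CREATE (a)-[:{}]->(b);",
        src_label,
        src_id,
        dst_label,
        dst_id,
        field.to_uppercase()
    ))
}
```

Non-reference values such as "Hello World" yield None, which is where a generator would fall back to emitting an ordinary node property.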

NEST Pattern → HAS_* Relationships

Parent-child nesting becomes HAS_* relationships:

%NEST: User: Post
---
users: @User
  | alice, Alice
  posts: @Post
    | post1, Hello

Generated:

CREATE (alice:User {id: "alice", name: "Alice"});
CREATE (post1:Post {id: "post1", title: "Hello"});
MATCH (alice:User {id: "alice"}), (post1:Post {id: "post1"})
CREATE (alice)-[:HAS_POST]->(post1);

Pattern: HAS_{CHILD_TYPE} (e.g., HAS_POST, HAS_COMMENT, HAS_ITEM)
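
A minimal sketch of the naming rule, assuming the child type name is simply uppercased (how the crate treats multi-word type names is not stated here). Both function names are hypothetical.

```rust
/// Derive the relationship type for a nested child type: Post -> HAS_POST.
/// Assumption: the type name is uppercased as-is, with no word splitting.
fn nest_relationship_type(child_type: &str) -> String {
    format!("HAS_{}", child_type.to_uppercase())
}

/// Emit one parent -> child edge per nested child row.
fn nest_statements(
    parent_label: &str,
    parent_id: &str,
    child_label: &str,
    child_ids: &[&str],
) -> Vec<String> {
    let rel = nest_relationship_type(child_label);
    child_ids
        .iter()
        .map(|child_id| {
            format!(
                "MATCH (p:{} {{id: \"{}\"}}), (c:{} {{id: \"{}\"}}) CREATE (p)-[:{}]->(c);",
                parent_label, parent_id, child_label, child_id, rel
            )
        })
        .collect()
}
```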

Constraint Generation

Automatic uniqueness constraints on entity IDs:

CREATE CONSTRAINT user_id IF NOT EXISTS
FOR (n:User) REQUIRE n.id IS UNIQUE;

CREATE CONSTRAINT post_id IF NOT EXISTS
FOR (n:Post) REQUIRE n.id IS UNIQUE;

Benefits:

  • Prevents duplicate nodes
  • Improves query performance (indexed lookups)
  • Enforces data integrity

Configuration:

.create_constraints(true)   // Enable (default: true)
.create_constraints(false)  // Disable
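
The constraint statements follow a regular shape, so they can be derived from the set of node labels. The sketch below assumes the constraint name is the lowercased label plus _id, which matches the examples above but is not confirmed as the crate's exact rule; uniqueness_constraint and constraints_for are hypothetical names.

```rust
/// Render the uniqueness constraint for one node label.
/// Assumption: constraint name = lowercased label + "_id".
fn uniqueness_constraint(label: &str) -> String {
    format!(
        "CREATE CONSTRAINT {}_id IF NOT EXISTS FOR (n:{}) REQUIRE n.id IS UNIQUE;",
        label.to_lowercase(),
        label
    )
}

/// Emit one constraint per distinct label, preserving first-seen order.
fn constraints_for(labels: &[&str]) -> Vec<String> {
    let mut seen: Vec<&str> = Vec::new();
    let mut out = Vec::new();
    for &label in labels {
        if !seen.contains(&label) {
            seen.push(label);
            out.push(uniqueness_constraint(label));
        }
    }
    out
}
```

Because each statement uses IF NOT EXISTS, re-running the generated script against a database that already has the constraints is harmless.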

Batch Processing

UNWIND-based batching for large imports:

// Instead of many CREATE statements...
UNWIND [
  {id: "alice", name: "Alice"},
  {id: "bob", name: "Bob"},
  {id: "carol", name: "Carol"},
  // ... 1000 nodes
] AS row
CREATE (n:User)
SET n = row;

Benefits:

  • Reduces network round-trips (1 statement vs 1000)
  • Faster execution (batch planning)
  • Lower memory overhead on client

Configuration:

.batch_size(1000)   // Default: 1000 nodes per UNWIND
.batch_size(5000)   // Larger batches for high-throughput
.batch_size(100)    // Smaller batches for constrained environments

Recommendation: 1000-5000 for most use cases
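
To make the effect of batch_size concrete, here is a standalone sketch that splits pre-rendered row maps into UNWIND statements. It is not the crate's code: unwind_batches is a hypothetical helper, and the rows are assumed to be valid Cypher map literals already.

```rust
/// Split rows into UNWIND statements of at most `batch_size` rows each.
/// Each row is assumed to be a pre-rendered Cypher map literal.
fn unwind_batches(label: &str, rows: &[String], batch_size: usize) -> Vec<String> {
    // `chunks` panics on a zero size, so reject it with a clear message.
    assert!(batch_size > 0, "batch_size must be non-zero");
    rows.chunks(batch_size)
        .map(|chunk| {
            format!(
                "UNWIND [{}] AS row CREATE (n:{}) SET n = row;",
                chunk.join(", "),
                label
            )
        })
        .collect()
}
```

With 2500 rows and the default batch size of 1000, this produces three statements (1000, 1000, and 500 rows) instead of 2500 individual CREATE statements.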

Streaming API

Process large documents without full buffering:

use hedl_neo4j::{to_cypher_stream, ToCypherConfig};
use std::fs::File;

let output = File::create("import.cypher")?;
let config = ToCypherConfig::new();

to_cypher_stream(&doc, output, &config)?;

Memory Usage: O(batch_size) regardless of total document size

Use Cases:

  • Multi-GB document exports
  • Memory-constrained environments
  • Incremental progress reporting
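
The O(batch_size) memory bound comes from holding only one batch at a time and writing it out before reading further rows. The following sketch shows that pattern against any std::io::Write target. It illustrates the idea only and does not reproduce to_cypher_stream; stream_unwind and flush_batch are hypothetical names.

```rust
use std::io::{self, Write};

/// Write the buffered rows as one UNWIND statement, then empty the buffer.
fn flush_batch<W: Write>(out: &mut W, label: &str, batch: &mut Vec<String>) -> io::Result<()> {
    writeln!(
        out,
        "UNWIND [{}] AS row CREATE (n:{}) SET n = row;",
        batch.join(", "),
        label
    )?;
    batch.clear();
    Ok(())
}

/// Consume rows lazily, keeping at most `batch_size` of them in memory.
/// Returns the number of statements written.
fn stream_unwind<W, I>(mut out: W, label: &str, rows: I, batch_size: usize) -> io::Result<usize>
where
    W: Write,
    I: IntoIterator<Item = String>,
{
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut batch: Vec<String> = Vec::new();
    let mut statements = 0usize;
    for row in rows {
        batch.push(row);
        if batch.len() == batch_size {
            flush_batch(&mut out, label, &mut batch)?;
            statements += 1;
        }
    }
    // Flush the final, possibly partial, batch.
    if !batch.is_empty() {
        flush_batch(&mut out, label, &mut batch)?;
        statements += 1;
    }
    Ok(statements)
}
```

The returned statement count is one natural hook for progress reporting; a File, a TcpStream, or an in-memory Vec<u8> can all serve as the writer.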

Property Type Mapping

HEDL types map to Neo4j property types:

// HEDL Value → Neo4j Property
Value::Int(42)              → 42 (Long)
Value::Float(3.14)          → 3.14 (Double)
Value::String("alice")      → "alice" (String)
Value::Bool(true)           → true (Boolean)
Value::Null                 → null

// References become relationships (not properties)
Value::Reference(...)       → (relationship edge)

// Expressions evaluated then converted
Value::Expression("$(1+2)") → 3 (Long)

Special Cases:

  • NaN/Infinity floats → null
  • Empty strings → ""
  • Very long strings → rejected with StringLengthExceeded (100 MB default limit)
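
The scalar part of this mapping can be sketched as a function from a value to a Cypher literal. The Value enum below is a simplified stand-in defined for this example, not hedl_core's type, and the escaping rule (backslash and double quote only) is an assumption about what a generator needs at minimum.

```rust
/// Simplified stand-in for HEDL scalar values (not hedl_core's `Value`).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
    String(String),
    Bool(bool),
    Null,
}

/// Render a scalar as a Cypher literal.
fn to_cypher_literal(value: &Value) -> String {
    match value {
        Value::Int(i) => i.to_string(),
        // Debug formatting keeps a decimal point ("2.0"), so the literal
        // stays a float on the Neo4j side.
        Value::Float(f) if f.is_finite() => format!("{:?}", f),
        // NaN and +/- infinity have no literal form here; map them to null.
        Value::Float(_) => "null".to_string(),
        Value::String(s) => {
            let escaped = s.replace('\\', "\\\\").replace('"', "\\\"");
            format!("\"{}\"", escaped)
        }
        Value::Bool(b) => b.to_string(),
        Value::Null => "null".to_string(),
    }
}
```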

Security Features

Unicode Normalization

All strings normalized to NFC form:

// Input: "café" (e + combining accent)
// Output: "café" (single composed character)

Prevents:

  • Homograph attacks
  • Duplicate nodes from equivalent Unicode
  • Sort order inconsistencies

Zero-Width Character Filtering

Invisible characters removed:

// Input: "alice\u{200B}smith" (contains zero-width space)
// Output: "alicesmith"

Prevents:

  • Hidden characters causing match failures
  • Visual spoofing attacks
  • Duplicate detection bypasses
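
A minimal filter of this kind can be written with the standard library alone. The set of code points below (zero-width space, non-joiner, joiner, word joiner, and BOM) is an assumption chosen for illustration; the crate's actual list is not documented here and may differ.

```rust
/// Code points treated as invisible in this sketch. Assumed set, not
/// necessarily the one hedl-neo4j filters.
fn is_zero_width(c: char) -> bool {
    matches!(
        c,
        '\u{200B}' // zero-width space
        | '\u{200C}' // zero-width non-joiner
        | '\u{200D}' // zero-width joiner
        | '\u{2060}' // word joiner
        | '\u{FEFF}' // zero-width no-break space / BOM
    )
}

/// Return a copy of `input` with all zero-width characters removed.
fn strip_zero_width(input: &str) -> String {
    input.chars().filter(|c| !is_zero_width(*c)).collect()
}
```

Filtering before IDs are compared means "alice\u{200B}smith" and "alicesmith" resolve to the same node rather than two visually identical ones.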

String Length Limits

Maximum string lengths enforced:

pub const DEFAULT_MAX_STRING_LENGTH: usize = 100 * 1024 * 1024;  // 100 MB default
// ToCypherConfig::for_untrusted_input() enforces 1 MB limit (1_000_000 bytes)

Protection Against:

  • Memory exhaustion
  • Neo4j property size limits
  • DoS via large strings

Configuration:

.max_string_length(10 * 1024 * 1024)  // 10 MB limit
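
The check itself amounts to comparing a byte length against the configured maximum before the value is rendered. In the sketch below, LengthError is a hypothetical struct whose fields mirror the StringLengthExceeded variant listed under Error Types; it is not the crate's error type.

```rust
/// Hypothetical error for this sketch; its fields mirror the
/// `StringLengthExceeded { length, max_length, property }` variant.
#[derive(Debug, PartialEq)]
struct LengthError {
    property: String,
    length: usize,
    max_length: usize,
}

/// Reject a property value whose UTF-8 byte length exceeds `max_length`.
fn check_string_length(property: &str, value: &str, max_length: usize) -> Result<(), LengthError> {
    // `str::len` counts bytes, not characters, which is what a memory
    // limit should measure.
    let length = value.len();
    if length > max_length {
        return Err(LengthError {
            property: property.to_string(),
            length,
            max_length,
        });
    }
    Ok(())
}
```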

Bidirectional Conversion

Export: HEDL → Neo4j

use hedl_neo4j::{to_cypher, ToCypherConfig};

let config = ToCypherConfig::new();
let cypher = to_cypher(&hedl_doc, &config)?;
// Execute cypher statements in Neo4j

Import: Neo4j → HEDL

use hedl_neo4j::neo4j_to_hedl;

// Query Neo4j
let result = session.run("MATCH (u:User) RETURN u", None).await?;

// Convert to HEDL
let hedl_doc = neo4j_to_hedl(&result.records)?;

// Now use HEDL APIs
let users = &hedl_doc.entities["User"];

Preserved:

  • Entity types (node labels)
  • Property values (all types)
  • References (from relationships)
  • Nesting structure (from HAS_* patterns)

Not Preserved:

  • Relationship properties (converted to references only)
  • Multiple labels per node (uses first label as type)
  • Paths and graph structure (flattened to entities + references)

Configuration Reference

ToCypherConfig

use hedl_neo4j::ToCypherConfig;

let config = ToCypherConfig::builder()
    .use_merge(true)                         // CREATE or MERGE (default: true/MERGE)
    .batch_size(1000)                        // Nodes per UNWIND (default: 1000)
    .create_constraints(true)                // Uniqueness constraints (default: true)
    .build();

FromNeo4jConfig

use hedl_neo4j::FromNeo4jConfig;

let config = FromNeo4jConfig::builder()
    .build();

Error Handling

use hedl_neo4j::{to_cypher, ToCypherConfig, Neo4jError};

match to_cypher(&doc, &ToCypherConfig::default()) {
    Ok(cypher) => println!("{}", cypher),
    Err(Neo4jError::StringLengthExceeded { length, max_length, property }) => {
        eprintln!("String too long in property '{}': {} bytes (max: {})",
            property, length, max_length);
    }
    Err(Neo4jError::InvalidReference(msg)) => {
        eprintln!("Invalid reference: {}", msg);
    }
    Err(Neo4jError::UnresolvedReference { type_name, id }) => {
        eprintln!("Unresolved reference: @{}:{}",
            type_name.as_deref().unwrap_or(""), id);
    }
    Err(Neo4jError::MissingSchema(type_name)) => {
        eprintln!("Missing schema for type: {}", type_name);
    }
    Err(e) => eprintln!("Error: {}", e),
}

Error Types

  • MissingSchema(String) - Missing required schema information
  • InvalidReference(String) - Malformed reference format
  • UnresolvedReference { type_name, id } - Reference to non-existent node
  • InvalidNodeId(String) - Invalid node ID (must be a string)
  • EmptyMatrixList(String) - Empty matrix list (no rows to convert)
  • InconsistentData(String) - Inconsistent data structure
  • InvalidIdentifier(String) - Invalid Cypher identifier
  • RecordParseError(String) - Failed to parse Neo4j record
  • MissingProperty { label, property } - Missing required property in Neo4j node
  • StringLengthExceeded { length, max_length, property } - String exceeds max length
  • NodeCountExceeded { count, max_count } - Node count limit exceeded
  • IntegerOverflow { context } - Integer overflow during calculation
  • TypeConversion(String) - Type conversion error
  • CircularReference(String) - Circular reference detected
  • RecursionLimitExceeded { depth, max_depth } - NEST hierarchy too deep
  • JsonError(serde_json::Error) - JSON serialization error
  • HedlError(String) - HEDL core error

Async Support (Optional)

For applications requiring high concurrency or non-blocking I/O, enable the async feature:

[dependencies]
hedl-neo4j = { version = "1.2", features = ["async"] }

Async API Example

use hedl_neo4j::{AsyncNeo4jClient, ToCypherConfig};
use hedl_core::Document;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connect to Neo4j
    let client = AsyncNeo4jClient::connect(
        "bolt://localhost:7687",
        "neo4j",
        "password",
    ).await?;

    // Import HEDL document with concurrent execution
    let doc: Document = todo!("parse or build your HEDL document here");
    client.import_document(&doc).await?;

    Ok(())
}

Performance Characteristics

The async API provides significant benefits for concurrent workloads:

Operation Type                 Sync Time   Async Time   Speedup
Single document (500 nodes)    750ms       250ms        3.0×
Single document (5000 nodes)   7500ms      1500ms       5.0×
10 concurrent documents        7500ms      1200ms       6.25×

Key Benefits:

  • 3-5× faster for batch operations (concurrent statement execution)
  • 5-10× higher throughput for concurrent workloads
  • 70-80% reduction in memory usage (async tasks vs threads)
  • Non-blocking I/O - threads available for other work during database operations

When to Use Async

Use async when:

  • Handling multiple concurrent requests (web servers, APIs)
  • Importing multiple documents in parallel
  • Building non-blocking applications
  • Resource efficiency is critical

Use sync when:

  • Simple batch scripts
  • Single-threaded applications
  • Minimal dependencies preferred
  • Generating Cypher for external execution

API Methods

// Basic import (concurrent batch execution)
client.import_document(&doc).await?;

// Transactional import (all-or-nothing)
client.import_document_transactional(&doc).await?;

// Manual statement execution
let stmts = to_cypher_statements(&doc, &config)?;
client.execute_statements_concurrent(&stmts).await?;

// Raw query execution
client.execute_query("MATCH (n) RETURN count(n)").await?;

Configuration

let client = AsyncNeo4jClient::connect(uri, user, password)
    .await?
    .with_config(ToCypherConfig::new().with_batch_size(500))
    .with_max_retries(5)
    .with_initial_retry_delay(Duration::from_millis(100));

Retry Logic: Automatically retries transient errors (connection failures, timeouts) with exponential backoff.
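
For intuition about the delays involved, the sketch below computes a backoff schedule from an initial delay and a retry count. It assumes the delay doubles on each attempt with no jitter or cap; the client's actual multiplier and limits are not specified here, and backoff_delays is a hypothetical helper.

```rust
use std::time::Duration;

/// Delay before each retry, assuming the delay doubles every attempt.
/// Saturating arithmetic avoids overflow for large retry counts.
fn backoff_delays(initial: Duration, max_retries: u32) -> Vec<Duration> {
    (0..max_retries)
        .map(|attempt| initial.saturating_mul(2u32.saturating_pow(attempt)))
        .collect()
}
```

Under that assumption, with_max_retries(5) and an initial delay of 100 ms would wait 100, 200, 400, 800, and 1600 ms, for a worst case of about 3.1 seconds before giving up.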

Connection Pooling: Uses Neo4j's built-in connection pooling. Optimal pool size depends on workload:

  • Light workload (1-5 concurrent ops): 2-5 connections
  • Medium workload (5-20 concurrent ops): 5-10 connections
  • Heavy workload (20+ concurrent ops): 10-20 connections

Migration Example

Before (sync, no feature flag required):

let cypher = hedl_neo4j::to_cypher(&doc, &config)?;
// Manually execute with your preferred driver

After (async with automatic execution, requires feature="async"):

let client = AsyncNeo4jClient::connect(uri, user, password).await?;
client.import_document(&doc).await?;

Both approaches are supported. Choose based on your needs.

Use Cases

Knowledge Graphs: Export HEDL-structured knowledge bases to Neo4j for graph queries, path finding, centrality analysis.

Fraud Detection: Load transaction data from HEDL into Neo4j, run pattern-matching queries to detect fraud rings, export results back to HEDL for reporting.

Recommendation Engines: Import user-product interactions from HEDL, compute collaborative filtering in Neo4j, export recommendations as HEDL for integration.

ETL Pipelines: Read structured data from various sources (JSON/CSV/XML), convert to HEDL, transform with HEDL tools, export to Neo4j for graph analytics.

Data Migration: Migrate from other graph formats to Neo4j via HEDL intermediate representation. Export Neo4j databases to HEDL for backup or transformation.

Graph Visualization: Export Neo4j query results to HEDL, convert to JSON/XML for visualization tools, preserve full type information.

What This Crate Doesn't Do

Complex Cypher Queries: Generates CREATE/MERGE statements, not arbitrary Cypher. For custom queries, use the Neo4j driver directly and convert results with neo4j_to_hedl.

Schema Evolution: Doesn't handle schema migrations or versioning. For evolving schemas, manage migrations externally.

Relationship Properties: Relationships map from references (no properties). For rich relationships with properties, use Neo4j driver directly.

Transaction Management: The generated Cypher carries no transaction control. Wrap it in transactions via your Neo4j driver, or use import_document_transactional from the optional async client.

Multi-Database: Targets single Neo4j database. For multi-database scenarios, run conversions separately per database.

Performance Characteristics

Cypher Generation: O(n) where n = total entities + relationships. Single linear pass.

Batch Processing: Reduces Neo4j import time by 60-80% vs individual CREATE statements.

Streaming: O(batch_size) memory usage regardless of document size.

Unicode Normalization: O(string_length) per string. Adds <2% overhead.

From Neo4j: O(n * m) where n = records, m = average properties per record.

Dependencies

  • hedl-core - Core HEDL implementation
  • thiserror - Error types
  • serde, serde_json - Serialization support
  • unicode-normalization 0.1 - NFC normalization
  • dashmap - Concurrent hash map for caching
  • rayon - Parallel processing support
  • futures - Async support
  • tokio (optional, with async feature) - Async runtime
  • neo4rs (optional, with async feature) - Neo4j driver for async operations

License

Apache-2.0
