cml-rs

Crates.io: cml-rs
lib.rs: cml-rs
version: 0.3.0
created_at: 2025-12-22 21:44:02.933251+00
updated_at: 2026-01-08 05:40:17.100145+00
description: Content Markup Language (CML) v0.2 parser, generator, validator, and embedding store for structured documents
repository: https://github.com/Blackfall-Labs/content-markup-language
id: 2000313
size: 416,133
Magnus Trent (magnus-trent)
documentation: https://docs.rs/cml-rs

CML - Content Markup Language (Rust Implementation)

CML (Content Markup Language) is an XML-based markup language with profile-based extensibility for representing structured knowledge. This is the reference Rust implementation of CML v0.1, providing parsing, generation, and embedding storage for structured documents.

Overview

CML v0.1 provides:

  • Standardized structure - <cml>/<header>/<body>/<footer> for all documents
  • 📋 Profile system - Domain-specific vocabularies (code, legal, wiki, etc.)
  • 🗜️ Byte Punch compression - 40-70% size reduction with profile-aware dictionaries
  • 🔍 Semantic search - Vector keywords and full-text indexing
  • 📝 XSD schemas - Strict validation for all profiles
  • 🔄 Round-trip fidelity - Parse → Generate → Parse yields identical results

Quick Start

use cml::{Body, CmlDocument, CmlGenerator, CmlParser, CodeBody, Footer, Header, Profile};

// Parse a CML document
let xml = std::fs::read_to_string("example.cml")?;
let doc = CmlParser::parse_cml(&xml)?;

// Generate CML
let generator = CmlGenerator;
let output = generator.generate_cml(&doc)?;

// Create a new document
let doc = CmlDocument {
    version: "0.1".to_string(),
    encoding: "utf-8".to_string(),
    profile: Profile::Code,
    header: Header {
        title: "My API Docs".to_string(),
        // ...
    },
    body: Body::Code(CodeBody { /* ... */ }),
    footer: Footer::default(),
};

Profiles

code:api (v1.0 - Ratified)

For API documentation with semantic search.

Namespace: https://schemas.continuity.org/profiles/code/1.0

Elements:

  • <module> - Code modules/packages
  • <struct> - Data structures
  • <enum> - Enumerations
  • <trait> - Traits/interfaces
  • <function> - Free functions
  • <method> - Methods on types
  • <field> - Struct/enum fields

Example:

<cml version="0.1" encoding="utf-8" profile="code:api"
     xmlns="https://schemas.continuity.org/cml/0.1"
     xmlns:code="https://schemas.continuity.org/profiles/code/1.0">
  <header>
    <title>Rust Standard Library: Vec&lt;T&gt;</title>
    <identifier scheme="continuity">std.collections.vec</identifier>
  </header>
  <body>
    <code:struct id="std.vec.Vec" name="Vec" generic="T">
      <code:description vector="vector array dynamic">
        A contiguous growable array type.
      </code:description>
      <code:method id="std.vec.Vec.push" name="push">
        <code:signature>pub fn push(&amp;mut self, value: T)</code:signature>
        <code:description vector="append add">
          Appends an element to the back.
        </code:description>
        <code:complexity>amortized O(1)</code:complexity>
      </code:method>
    </code:struct>
  </body>
</cml>

See examples/cml/code-api-example.cml for a full example.
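
The example above writes Vec<T> as Vec&lt;T&gt; and &mut self as &amp;mut self, because those characters are reserved in XML text. A minimal sketch of that escaping step, using only the standard library; escape_xml_text is a hypothetical helper for illustration and is not taken from the cml-rs API:

```rust
/// Hypothetical helper (not part of the cml-rs API): escapes the five
/// characters that are reserved in XML text and attribute values.
fn escape_xml_text(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            other => out.push(other),
        }
    }
    out
}
```

Calling escape_xml_text on a signature such as pub fn push(&mut self, value: T) produces the text used inside <code:signature> in the example.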

legal:constitution (v1.0 - Ratified)

For constitutional and statutory documents.

Namespace: https://schemas.continuity.org/profiles/legal/1.0

Elements:

  • <preamble> - Document preamble
  • <article> - Top-level articles
  • <section> - Sections within articles
  • <clause> - Individual clauses
  • <paragraph> - Subdivisions
  • <amendment> - Amendments to the document

Example:

<cml version="0.1" encoding="utf-8" profile="legal:constitution"
     xmlns="https://schemas.continuity.org/cml/0.1"
     xmlns:legal="https://schemas.continuity.org/profiles/legal/1.0">
  <header>
    <title>Constitution of the United States</title>
    <identifier scheme="continuity">us.federal.constitution</identifier>
  </header>
  <body>
    <legal:preamble>
      We the People of the United States...
    </legal:preamble>
    <legal:article num="I" title="Legislative Branch" id="article-1">
      <legal:section num="1" id="article-1-section-1">
        <legal:clause num="1" id="article-1-section-1-clause-1">
          All legislative Powers herein granted...
        </legal:clause>
      </legal:section>
    </legal:article>
  </body>
</cml>

See examples/cml/legal-constitution-example.cml for a full example.

bookstack:wiki (v0.1 - Local Namespace)

For knowledge base / wiki content.

Namespace: https://local.namespace/continuity/bookstack/0.1 (pending ratification)

Elements:

  • <book> - Top-level book
  • <chapter> - Chapters within books
  • <page> - Individual pages
  • <shelf> - Collections of books
  • <content> - Page content (markdown/html/plain)
  • <tags> - Metadata tags

Example:

<cml version="0.1" encoding="utf-8" profile="bookstack:wiki"
     xmlns="https://schemas.continuity.org/cml/0.1"
     xmlns:bookstack="https://local.namespace/continuity/bookstack/0.1">
  <header>
    <title>Engineering Documentation</title>
    <identifier scheme="continuity">company.engineering.rust-guide</identifier>
  </header>
  <body>
    <bookstack:book id="book-1" title="Rust Development Guide">
      <bookstack:chapter id="ch-1" title="Getting Started" num="1">
        <bookstack:page id="page-1" title="Setup">
          <bookstack:content format="markdown"><![CDATA[
# Development Environment Setup
...
          ]]></bookstack:content>
          <bookstack:tags>
            <tag name="rust"/>
            <tag name="setup"/>
          </bookstack:tags>
        </bookstack:page>
      </bookstack:chapter>
    </bookstack:book>
  </body>
</cml>

See examples/cml/bookstack-wiki-example.cml for a full example.

CML Structure

Root Element

All CML documents start with:

<cml version="0.1" encoding="utf-8" profile="namespace:profile">

Attributes:

  • version - CML version (currently "0.1")
  • encoding - Character encoding (always "utf-8")
  • profile - Profile identifier (e.g., "code:api", "legal:constitution")
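
The three root attributes can be checked before any profile-specific parsing starts. The sketch below is an assumption about how such a check might look, not the crate's validator: check_root_attrs is a hypothetical name, and it only enforces what the list above states (version "0.1", encoding "utf-8", and a profile of the form namespace:profile).

```rust
/// Hypothetical check (not the cml-rs validator): verifies the three root
/// attributes and splits the profile identifier at its first colon.
fn check_root_attrs<'a>(
    version: &str,
    encoding: &str,
    profile: &'a str,
) -> Result<(&'a str, &'a str), String> {
    if version != "0.1" {
        return Err(format!("unsupported CML version: {}", version));
    }
    if encoding != "utf-8" {
        return Err(format!("unsupported encoding: {}", encoding));
    }
    match profile.split_once(':') {
        // Both halves must be non-empty, e.g. "code:api" -> ("code", "api").
        Some((namespace, name)) if !namespace.is_empty() && !name.is_empty() => {
            Ok((namespace, name))
        }
        _ => Err(format!(
            "profile must look like \"namespace:profile\", got {:?}",
            profile
        )),
    }
}
```

Returning the two halves of the profile lets the caller dispatch on the namespace without splitting the string a second time.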

Header Section

Required metadata about the document:

<header>
  <title>Document Title</title>
  <author role="author">Name</author>
  <date type="created" when="2025-11-07"/>
  <identifier scheme="continuity">unique.document.id</identifier>
  <description>Optional summary</description>
  <meta name="key" value="value"/>
  <link rel="related" href="https://example.com"/>
</header>

Body Section

Profile-specific content. Structure depends on the profile.

Footer Section (Optional)

Signatures, provenance, and annotations:

<footer>
  <signatures>
    <signature>
      <signer>Alice</signer>
      <timestamp>2025-11-07T10:00:00Z</timestamp>
      <algorithm>ed25519</algorithm>
      <value>base64-encoded-sig</value>
    </signature>
  </signatures>
  <provenance>
    <change>
      <timestamp>2025-11-07T10:00:00Z</timestamp>
      <author>Bob</author>
      <description>Initial creation</description>
      <commit>abc123</commit>
    </change>
  </provenance>
  <annotations>
    <annotation author="Carol" target="element-id">
      Note about this element
    </annotation>
  </annotations>
</footer>

Inline Semantic Elements

Available in all profiles:

  • <em> - Emphasis
  • <strong> - Strong importance
  • <ref target="id" type="cross"> - Cross-reference
  • <term> - Defined term
  • <abbr> - Abbreviation
  • <date when="2025-11-07"> - Date/time reference
  • <currency code="USD" value="100.00"> - Currency amount
  • <snip reason="redacted"> - Elided content

Validation

XSD schemas are provided for strict validation:

use sam_cml::{validate_document, CmlParser};

let doc = CmlParser::parse_cml(&xml)?;
validate_document(&doc)?; // Validates against schema

Schemas:

  • schemas/cml-core-0.1.xsd - Core CML structure
  • schemas/profiles/code-api-1.0.xsd - Code profile
  • schemas/profiles/legal-constitution-1.0.xsd - Legal profile
  • schemas/profiles/bookstack-wiki-0.1.xsd - Bookstack profile

Byte Punch Compression

CML integrates with Byte Punch for profile-aware compression:

use byte_punch::{Compressor, Dictionary};

// Load profile dictionary
let dict = Dictionary::from_file("dictionaries/code-api.json")?;
let compressor = Compressor::new(dict);

// Compress
let compressed = compressor.compress(&cml_xml)?;

// Decompress
let decompressed = compressor.decompress(&compressed)?;
assert_eq!(cml_xml, decompressed); // 100% fidelity

Compression Results (size reduction relative to the uncompressed XML):

  • Legal documents: ~65% smaller
  • Code documentation: ~50-60% smaller
  • Wiki content: ~55% smaller
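
Assuming the percentages above denote size reduction (as in the "40-70% size reduction" figure in the Overview), the number for a given document can be computed from the byte lengths before and after compression. The helper name below is hypothetical and the function does not depend on Byte Punch:

```rust
/// Size reduction as a percentage of the original size.
/// Returns None when the original is empty, since the ratio is undefined.
/// A negative result means the "compressed" form is larger than the input.
fn size_reduction_percent(original_len: usize, compressed_len: usize) -> Option<f64> {
    if original_len == 0 {
        return None;
    }
    let ratio = compressed_len as f64 / original_len as f64;
    Some((1.0 - ratio) * 100.0)
}
```

For instance, a 10,000-byte document that compresses to 3,500 bytes has a reduction of about 65%.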

Testing

# Run all tests
cargo test -p sam-cml

# Run with output
cargo test -p sam-cml -- --nocapture

# Run specific test
cargo test -p sam-cml test_code_profile_roundtrip

Test Coverage:

  • 42/42 tests passing ✅
  • Unit tests for parser, generator, schema
  • Integration tests for round-trip fidelity
  • Profile-specific tests for each supported profile
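
The round-trip property (Parse → Generate → Parse yields identical results) can be written as a small generic helper. This is a sketch, not the crate's test code: check_round_trip is a hypothetical name, and the parse and generate steps are passed in as closures so the helper does not depend on any particular API.

```rust
/// Generic round-trip check: parse, generate, parse again, then compare
/// the two parsed values. `parse` and `generate` are supplied by the caller.
fn check_round_trip<D, E>(
    input: &str,
    parse: impl Fn(&str) -> Result<D, E>,
    generate: impl Fn(&D) -> Result<String, E>,
) -> Result<bool, E>
where
    D: PartialEq,
{
    let first = parse(input)?;
    let regenerated = generate(&first)?;
    let second = parse(&regenerated)?;
    // Compare parsed values rather than strings, so differences in
    // whitespace or attribute order in the generated text do not matter.
    Ok(first == second)
}
```

With the crate itself, the closures would presumably wrap CmlParser::parse_cml and the generator's generate_cml; that use assumes the document type implements PartialEq, which the README does not state.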

Development

Project Structure

crates/sam-cml/
├── src/
│   ├── lib.rs          # Public API
│   ├── types.rs        # CML document types
│   ├── parser.rs       # XML → Rust parsing
│   ├── generator.rs    # Rust → XML generation
│   └── schema.rs       # Validation logic
├── tests/
│   ├── integration_test.rs      # Integration tests
│   ├── v01_tests.rs             # CML v0.1 tests
│   └── v01_roundtrip_tests.rs  # Round-trip tests
└── Cargo.toml

Adding a New Profile

  1. Define the profile in types.rs:

pub enum Profile {
    Code,
    Legal,
    Bookstack,
    MyProfile, // Add here
}

pub enum Body {
    Code(CodeBody),
    Legal(LegalBody),
    Bookstack(BookstackBody),
    MyProfile(MyProfileBody), // Add here
}

pub struct MyProfileBody {
    // Your profile structure
}

  2. Create XSD schema:

Create schemas/profiles/my-profile-1.0.xsd following the pattern of existing schemas.

  3. Add parser support:

Update parser.rs to handle your profile's elements.

  4. Add generator support:

Update generator.rs to output your profile's XML.

  5. Add tests:

Create tests in tests/ directory.

  6. Create dictionary:

Add crates/byte-punch/dictionaries/my-profile.json for compression.

  7. Create example:

Add examples/cml/my-profile-example.cml.
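
One way to keep the parser and generator in sync when a variant is added is to route both through a single exhaustive match. The sketch below is illustrative only: the variant names follow the list above, but the methods and the "my:profile" identifier are assumptions and do not come from the crate.

```rust
/// Illustrative only: variant names follow the steps above, but the
/// methods and the "my:profile" identifier are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Profile {
    Code,
    Legal,
    Bookstack,
    MyProfile,
}

impl Profile {
    /// Identifier used in the root element's `profile` attribute.
    fn identifier(self) -> &'static str {
        // No wildcard arm: adding a variant without extending this match
        // is a compile error, which keeps the mapping complete.
        match self {
            Profile::Code => "code:api",
            Profile::Legal => "legal:constitution",
            Profile::Bookstack => "bookstack:wiki",
            Profile::MyProfile => "my:profile", // hypothetical identifier
        }
    }

    /// Reverse lookup; returns None for unknown identifiers.
    fn from_identifier(id: &str) -> Option<Profile> {
        const ALL: [Profile; 4] = [
            Profile::Code,
            Profile::Legal,
            Profile::Bookstack,
            Profile::MyProfile,
        ];
        ALL.iter().copied().find(|p| p.identifier() == id)
    }
}
```

Because from_identifier is derived from identifier, the two directions cannot disagree; only the ALL array needs a manual update when a variant is added.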

Migration from Legacy Format

The old <document> format is deprecated but still supported:

<!-- OLD (deprecated) -->
<document id="..." version="1.0">
  <metadata>
    <title>...</title>
  </metadata>
  <section>...</section>
</document>

<!-- NEW (CML v0.1) -->
<cml version="0.1" encoding="utf-8" profile="code:api">
  <header>
    <title>...</title>
    <identifier scheme="continuity">...</identifier>
  </header>
  <body>
    <!-- Profile-specific content -->
  </body>
</cml>

The parser auto-detects the legacy format and upgrades it internally.
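
The README does not describe how detection works. One plausible approach, sketched here under that assumption, is to look at the name of the root element after skipping the XML declaration and any leading comments. RootKind and detect_root are hypothetical names, and the sketch does not handle a DOCTYPE declaration.

```rust
#[derive(Debug, PartialEq)]
enum RootKind {
    Cml,            // <cml ...> root (CML v0.1)
    LegacyDocument, // <document ...> root (deprecated format)
    Unknown,
}

/// Hypothetical detector: skips XML declarations and comments, then
/// classifies the document by the name of its first element.
fn detect_root(xml: &str) -> RootKind {
    let mut rest = xml;
    loop {
        rest = rest.trim_start();
        if rest.starts_with("<?") {
            // XML declaration or processing instruction.
            match rest.find("?>") {
                Some(end) => rest = &rest[end + 2..],
                None => return RootKind::Unknown,
            }
        } else if rest.starts_with("<!--") {
            // Comment before the root element.
            match rest.find("-->") {
                Some(end) => rest = &rest[end + 3..],
                None => return RootKind::Unknown,
            }
        } else {
            break;
        }
    }
    let after = match rest.strip_prefix('<') {
        Some(after) => after,
        None => return RootKind::Unknown,
    };
    // The element name ends at whitespace, '>' or '/'.
    let name_end = after
        .find(|c: char| c.is_whitespace() || c == '>' || c == '/')
        .unwrap_or(after.len());
    match &after[..name_end] {
        "cml" => RootKind::Cml,
        "document" => RootKind::LegacyDocument,
        _ => RootKind::Unknown,
    }
}
```

A parser could call such a function first and route LegacyDocument input through an upgrade step before the normal CML path.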

Related Projects

  • byte-punch - Profile-aware compression (sister crate)
  • sam-engram - Engram packaging (coming soon)
  • rustdoc-to-cml - Generate CML from Rust docs

License

MIT OR Apache-2.0
