neotoma

Crates.io: neotoma
lib.rs: neotoma
version: 0.1.1
created_at: 2025-08-05 15:59:10.076618+00
updated_at: 2025-08-05 16:31:20.286259+00
description: A flexible, cached parser combinator framework for Rust.
homepage: https://git.sr.ht/~djarb/neotoma
repository: https://git.sr.ht/~djarb/neotoma
max_upload_size:
id: 1782169
size: 468,008
(djarb)

documentation

https://docs.rs/neotoma

README

Neotoma

A flexible, cached parser combinator framework for Rust with built-in memoization and backtracking capabilities.

Features

  • Parser Combinators: Compose complex parsers from simple building blocks
  • Built-in Caching: Automatic memoization prevents redundant parsing work
  • Backtracking Support: Efficient backtracking with automatic position management
  • UTF-8 Aware: Dual-level design supporting both byte-level and character-level parsing
  • Generic Input Sources: Works with any Read + Seek implementation
  • Thread Safe: Uses parking_lot for efficient synchronization primitives
  • Grammar Support: Self-parsing grammar system for complex syntax definitions
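To make the backtracking and Read + Seek points concrete, here is a minimal sketch that uses only the standard library. It is not Neotoma's implementation; `try_literal` and `first_match` are hypothetical helpers written for illustration. The idea shown is that a parser records the stream position before an attempt and seeks back to it on failure, so the next alternative starts from the same place.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper (not part of Neotoma): tries to consume `expected`.
/// On a mismatch or early end of input, the stream is rewound to where the
/// attempt started, so the caller can try something else from that position.
fn try_literal<S: Read + Seek>(src: &mut S, expected: &[u8]) -> io::Result<bool> {
    let start = src.stream_position()?; // remember where the attempt began
    let mut buf = vec![0u8; expected.len()];
    let matched = match src.read_exact(&mut buf) {
        Ok(()) => buf.as_slice() == expected,
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => false,
        Err(e) => return Err(e),
    };
    if !matched {
        src.seek(SeekFrom::Start(start))?; // backtrack
    }
    Ok(matched)
}

/// Ordered choice over literals: index of the first alternative that matches.
fn first_match<S: Read + Seek>(
    src: &mut S,
    alternatives: &[&[u8]],
) -> io::Result<Option<usize>> {
    for (i, alt) in alternatives.iter().enumerate() {
        if try_literal(&mut *src, alt)? {
            return Ok(Some(i));
        }
    }
    Ok(None)
}
```

Given a `std::io::Cursor` over `b"false!"` and the alternatives `b"true"` and `b"false"`, `first_match` rewinds after the failed `true` attempt and then consumes `false`, leaving the stream at offset 5.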

Quick Start

Add Neotoma to your Cargo.toml:

[dependencies]
neotoma = "0.1.0"

Basic Example

use neotoma::prelude::*;
use neotoma::seq;
use std::io::Cursor;

// Parse "hello world" with optional whitespace
let parser = seq![
    Literal::from_str("hello"),
    Optional::new(Utf8Class::whitespace()),
    Literal::from_str("world")
];

let input = Cursor::new(b"hello world");
let mut source = Source::new(input);

match parse(parser, &mut source) {
    Ok(result) => println!("Parsed: {:?}", result),
    Err(e) => println!("Parse error: {:?}", e),
}

Character Classes

use neotoma::prelude::*;

// Parse one or more digits
let numbers = Utf8Class::with_min("0123456789", 1);

// Parse alphabetic characters
let letters = Utf8Class::alpha();

// Parse alphanumeric with bounds
let identifier = Utf8Class::alphanumeric().with_bounds(1, 20);

Composition with Macros

use neotoma::{seq, oneof};
use neotoma::prelude::*;

// Sequential composition
let sequence = seq![
    Literal::from_str("if"),
    Utf8Class::whitespace(),
    Literal::from_str("true")
];

// Alternative composition  
let choice = oneof![
    Literal::from_str("true"),
    Literal::from_str("false"),
    Utf8Class::digits()
];

Repetition and Lists

use neotoma::prelude::*;

// Zero or more digits
let numbers = Repeat::new(Utf8Class::digits());

// Comma-separated list
let csv = Repeat::new(Utf8Class::alpha())
    .with_joint(Literal::from_str(","));

// Bounded repetition
let bounded = Repeat::new(Utf8Class::alpha())
    .with_bounds(2, 5);
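The behaviour of a joined, bounded repetition can be sketched without the crate. The code below is a standard-library illustration only: `alpha` and `repeat_joined` are hypothetical functions, and the semantics (a trailing joint is left unconsumed, parsing stops once the maximum is reached, too few items is a failure) are assumptions about how such a combinator typically behaves, not a statement of what `Repeat` does internally.

```rust
/// Hypothetical helper: matches a leading run of ASCII letters,
/// returning (matched, rest), or None if the input does not start with a letter.
fn alpha(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(input.split_at(end))
    }
}

/// Hypothetical helper: parses between `min` and `max` alphabetic items
/// separated by `joint`. Returns the items and the unconsumed rest,
/// or None if fewer than `min` items were found.
fn repeat_joined<'a>(
    mut input: &'a str,
    joint: &str,
    min: usize,
    max: usize,
) -> Option<(Vec<&'a str>, &'a str)> {
    let mut items = Vec::new();
    while items.len() < max {
        // After the first item, a joint must precede every further item.
        let after_joint = if items.is_empty() {
            input
        } else {
            match input.strip_prefix(joint) {
                Some(rest) => rest,
                None => break,
            }
        };
        match alpha(after_joint) {
            Some((item, rest)) => {
                items.push(item);
                input = rest; // commit the joint and the item together
            }
            None => break, // a trailing joint stays unconsumed
        }
    }
    if items.len() >= min {
        Some((items, input))
    } else {
        None
    }
}
```

Under these assumptions, `repeat_joined("a,bc,d;", ",", 0, usize::MAX)` yields the items `a`, `bc`, `d` with `;` left over, while `repeat_joined("a", ",", 2, 5)` fails because only one item is present.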

Grammar System

Neotoma includes a self-parsing grammar system that allows you to define complex parsers using a declarative syntax. The GrammarParser can parse grammar definitions and produce executable parsers.

Basic Usage

use neotoma::grammar::GrammarParser;
use neotoma::prelude::*;
use std::io::Cursor;

let grammar_text = r#"
    @start expression
    expression = (number "+" number)
    number = digits
"#;

let grammar_parser = GrammarParser::new();
let mut input = Cursor::new(grammar_text.as_bytes());
let mut source = Source::new(&mut input);

let grammar = parse(grammar_parser, &mut source).unwrap();

// Use the grammar to parse expressions
let mut expr_input = Cursor::new(b"42+24");
let mut expr_source = Source::new(&mut expr_input);
let result = parse(grammar, &mut expr_source).unwrap();

Grammar Syntax

Terminals and Literals

  • "text" - matches literal string (use \" for escaped quotes)
  • eof - matches end of file (ensures complete input consumption)

Character Classes

  • digits - matches one or more decimal digits (0-9)
  • alpha - matches one or more alphabetic characters (ASCII)
  • alphanumeric - matches one or more alphanumeric characters (ASCII)
  • whitespace - matches one or more whitespace characters (ASCII)
  • hexdigits - matches one or more hexadecimal digits (0-9, a-f, A-F)
  • udigits, ualpha, ualphanumeric, uwhitespace - UTF-8 equivalents
  • [abc] - matches any character in the set (custom UTF-8 character class)
  • [^abc] - matches any character NOT in the set (negated UTF-8 character class)
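The difference between the ASCII classes and their UTF-8 counterparts can be illustrated with the standard library alone. This sketch assumes the `u`-prefixed classes follow Unicode character properties as exposed by Rust's `char` methods; the README does not spell out their exact definition, and `run_len`, `ascii_alpha`, and `unicode_alpha` are hypothetical names.

```rust
/// Hypothetical helper: byte length of the leading run of characters
/// that satisfy `pred`.
fn run_len(input: &str, pred: impl Fn(char) -> bool) -> usize {
    input
        .char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(input.len(), |(i, _)| i)
}

/// ASCII-only alphabetic run (a-z, A-Z).
fn ascii_alpha(input: &str) -> &str {
    &input[..run_len(input, |c| c.is_ascii_alphabetic())]
}

/// Unicode-aware alphabetic run (assumed analogue of `ualpha`).
fn unicode_alpha(input: &str) -> &str {
    &input[..run_len(input, char::is_alphabetic)]
}
```

For the input `"na\u{ef}ve text"` (the word "naïve"), `ascii_alpha` stops after `na`, while `unicode_alpha` matches the whole word, which is five characters but six bytes. That gap between character count and byte count is why a parser benefits from distinguishing byte-level from character-level matching.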

Composition

  • (A B C) - matches A followed by B followed by C (sequence)
  • (| A B C) - matches either A or B or C (alternatives)

Repetition

  • (* A) - matches zero or more instances of A
  • (+ A) - matches one or more instances of A
  • (? A) - matches zero or one instances of A
  • (* A / B) - matches zero or more A's separated by B
  • (+ A / B) - matches one or more A's separated by B

Rules and References

  • name = rule - defines a named parsing rule
  • name - references a named rule (enables recursion)
  • @start name - sets the starting rule (defaults to last rule if not specified)

Complex Grammar Example

let arithmetic_grammar = r#"
    @start complete_expression
    
    complete_expression = (expression eof)
    expression = additive_expr
    
    additive_expr = (| addition subtraction multiplicative_expr)
    addition = (multiplicative_expr (? whitespace) "+" (? whitespace) additive_expr)
    subtraction = (multiplicative_expr (? whitespace) "-" (? whitespace) additive_expr)
    
    multiplicative_expr = (| multiplication division primary_expr)
    multiplication = (primary_expr (? whitespace) "*" (? whitespace) multiplicative_expr)
    division = (primary_expr (? whitespace) "/" (? whitespace) multiplicative_expr)
    
    primary_expr = (| number variable parenthesized_expr)
    number = digits
    variable = alpha
    parenthesized_expr = ("(" (? whitespace) expression (? whitespace) ")")
"#;

This grammar handles:

  • Proper operator precedence (multiplication/division before addition/subtraction)
  • Recursive expressions with parentheses
  • Variables and numbers
  • Optional whitespace handling
  • Right-recursive binary operations (a chain such as a - b - c nests to the right)
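To see how the rule structure determines precedence and nesting, the grammar can be mirrored by a small hand-written recursive descent evaluator. This is a standard-library sketch, not code generated by or taken from Neotoma: `Eval` and `eval` are hypothetical names, variables are omitted, and values are plain `i64`. Each method corresponds to one rule. Because the right operand of `+` and `-` is a whole additive_expr (and likewise for `*` and `/`), chains of the same operator nest to the right.

```rust
/// Hypothetical evaluator with one method per grammar rule.
struct Eval<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Eval<'a> {
    /// (? whitespace)
    fn skip_ws(&mut self) {
        while self.pos < self.src.len() && self.src[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
    }

    /// Matches a single literal byte at the current position.
    fn lit(&mut self, c: u8) -> bool {
        let hit = self.src.get(self.pos) == Some(&c);
        if hit {
            self.pos += 1;
        }
        hit
    }

    /// (? whitespace) op (? whitespace); restores the position on failure.
    fn op(&mut self, c: u8) -> bool {
        let start = self.pos;
        self.skip_ws();
        if self.lit(c) {
            self.skip_ws();
            true
        } else {
            self.pos = start; // backtrack
            false
        }
    }

    /// additive_expr = (| addition subtraction multiplicative_expr)
    fn additive(&mut self) -> Option<i64> {
        let left = self.multiplicative()?;
        if self.op(b'+') {
            left.checked_add(self.additive()?)
        } else if self.op(b'-') {
            left.checked_sub(self.additive()?) // right operand is a whole additive_expr
        } else {
            Some(left)
        }
    }

    /// multiplicative_expr = (| multiplication division primary_expr)
    fn multiplicative(&mut self) -> Option<i64> {
        let left = self.primary()?;
        if self.op(b'*') {
            left.checked_mul(self.multiplicative()?)
        } else if self.op(b'/') {
            left.checked_div(self.multiplicative()?)
        } else {
            Some(left)
        }
    }

    /// primary_expr = (| number parenthesized_expr); variables are omitted here.
    fn primary(&mut self) -> Option<i64> {
        if self.lit(b'(') {
            self.skip_ws();
            let value = self.additive()?;
            self.skip_ws();
            return if self.lit(b')') { Some(value) } else { None };
        }
        let start = self.pos;
        while self.pos < self.src.len() && self.src[self.pos].is_ascii_digit() {
            self.pos += 1;
        }
        std::str::from_utf8(&self.src[start..self.pos]).ok()?.parse().ok()
    }
}

/// complete_expression = (expression eof)
fn eval(input: &str) -> Option<i64> {
    let mut e = Eval { src: input.as_bytes(), pos: 0 };
    let value = e.additive()?;
    if e.pos == e.src.len() { Some(value) } else { None }
}
```

With this sketch, `eval("1+2*3")` gives `Some(7)` because multiplication binds tighter, `eval("(1 + 2) * 3")` gives `Some(9)`, and `eval("8-3-2")` gives `Some(7)` because the chain is grouped as 8 - (3 - 2). A consumer that needs conventional left-to-right subtraction and division would have to regroup the operands after parsing.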

Architecture

Neotoma's core Parser trait follows the Template Method pattern:

  • read(): Implement your parsing logic here
  • parse(): Public API that automatically handles caching and backtracking
  • id(): Override for parameterized parsers to avoid cache conflicts
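The division of labour between read(), parse(), and id() can be sketched as follows. This is a simplified standard-library model, not Neotoma's actual trait: only the three method names come from the list above, while the signatures, the `Ctx` and `Outcome` types, the string cache keys, and the `Lit`, `Seq`, and `Choice` combinators are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Some(end position) on success, None on failure (hypothetical type).
type Outcome = Option<usize>;

/// Shared state: the input plus a memo table keyed by (parser id, start position).
struct Ctx<'a> {
    input: &'a [u8],
    memo: HashMap<(String, usize), Outcome>,
    reads: usize, // number of uncached `read` calls, kept for demonstration
}

trait Parser {
    /// Cache identity; parameterized parsers include their parameters
    /// so that distinct instances do not share cache entries.
    fn id(&self) -> String;

    /// Parsing logic supplied by each implementor.
    fn read(&self, ctx: &mut Ctx, pos: usize) -> Outcome;

    /// Template method: consult the cache, fall back to `read`, store the outcome.
    fn parse(&self, ctx: &mut Ctx, pos: usize) -> Outcome {
        let key = (self.id(), pos);
        if let Some(&hit) = ctx.memo.get(&key) {
            return hit;
        }
        ctx.reads += 1;
        let outcome = self.read(ctx, pos);
        ctx.memo.insert(key, outcome); // failures are cached as well
        outcome
    }
}

struct Lit(&'static str);
struct Seq(Vec<Box<dyn Parser>>);
struct Choice(Vec<Box<dyn Parser>>);

fn ids(parts: &[Box<dyn Parser>]) -> String {
    parts.iter().map(|p| p.id()).collect::<Vec<_>>().join(",")
}

impl Parser for Lit {
    fn id(&self) -> String {
        format!("Lit({:?})", self.0)
    }
    fn read(&self, ctx: &mut Ctx, pos: usize) -> Outcome {
        let rest = ctx.input.get(pos..)?;
        rest.starts_with(self.0.as_bytes()).then(|| pos + self.0.len())
    }
}

impl Parser for Seq {
    fn id(&self) -> String {
        format!("Seq[{}]", ids(&self.0))
    }
    fn read(&self, ctx: &mut Ctx, pos: usize) -> Outcome {
        let mut at = pos;
        for p in &self.0 {
            at = p.parse(ctx, at)?; // any failure fails the whole sequence
        }
        Some(at)
    }
}

impl Parser for Choice {
    fn id(&self) -> String {
        format!("Choice[{}]", ids(&self.0))
    }
    fn read(&self, ctx: &mut Ctx, pos: usize) -> Outcome {
        // Every alternative starts from the same `pos`, which is the backtracking step.
        self.0.iter().find_map(|p| p.parse(ctx, pos))
    }
}

/// Runs the choice ("ab" "+") | ("ab" "-") against "ab-".
fn demo() -> (Outcome, usize) {
    let grammar = Choice(vec![
        Box::new(Seq(vec![Box::new(Lit("ab")), Box::new(Lit("+"))])),
        Box::new(Seq(vec![Box::new(Lit("ab")), Box::new(Lit("-"))])),
    ]);
    let mut ctx = Ctx { input: b"ab-", memo: HashMap::new(), reads: 0 };
    let outcome = grammar.parse(&mut ctx, 0);
    (outcome, ctx.reads)
}
```

In `demo`, the first alternative fails at the `+`, and the second alternative finds `Lit("ab")` at position 0 already in the memo table, so `read` runs six times instead of seven. The sketch also shows why id() matters: if `Lit("+")` and `Lit("-")` reported the same id, the cached failure of one would be returned for the other.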

Key Design Principles

  • Composability: Orthogonal combinators work together seamlessly
  • Performance: Memoization prevents redundant parsing
  • Type Safety: Generic composition prevents common mistakes
  • Memory Efficiency: Smart pointer integration and optimized data structures

Documentation

Examples

The tests/ directory contains comprehensive examples:

  • Arithmetic Parser: Mathematical expression parsing
  • Lisp Expressions: Recursive data structure parsing
  • Grammar Definitions: Self-parsing grammar systems

Python Bindings

Python bindings are available in the neotoma-py/ directory. See the Python README for details.

License

Licensed under either of:

at your option.

Contributing

Contributions are welcome! Please feel free to submit a patch.
