sipha

version: 0.5.0
created_at: 2025-11-29 13:08:21.462148+00
updated_at: 2025-11-29 13:08:21.462148+00
description: Core parsing infrastructure for Sipha parser library
homepage: https://github.com/nyaleph/sipha
repository: https://github.com/nyaleph/sipha
size: 723,818
owner: Nyaleph (NyalephTheCat)
documentation: https://docs.rs/sipha

README

Sipha

Crates.io docs.rs License: MIT

A flexible, incremental parsing library for Rust with support for multiple parsing algorithms. Sipha is designed from the ground up to support incremental parsing: the ability to efficiently re-parse only the changed portions of your code, making it ideal for interactive applications like IDEs, editors, and language servers.

📚 Read the Book - Comprehensive guide and documentation

Status: 0.5.0 Release

Sipha 0.5.0 provides a stable foundation for incremental parsing with multiple backends. The core API is stable, though some advanced features may continue to evolve. We welcome feedback and contributions!

Key Features

Incremental Parsing (Primary Focus)

Sipha's standout feature is its incremental parsing capability. When you edit code, Sipha can:

  • Reuse unchanged subtrees from previous parses
  • Reparse only affected regions instead of the entire file
  • Maintain parse caches for efficient updates
  • Dramatically improve performance in interactive editing scenarios

This makes Sipha perfect for building language servers, IDEs, and other tools that need to parse code as users type.

Additional Features

  • Multiple parsing backends: Choose from LL(k), LR, and more (via feature flags)
  • Immutable syntax trees: Green/red tree representation for efficient manipulation
  • Error recovery: Configurable error recovery strategies for robust parsing
  • Flexible grammar definition: Builder API for defining your grammar
  • Unicode support: Full Unicode support for identifiers and text (optional)
  • Rich diagnostics: Beautiful error messages with miette integration (optional)
  • Tree traversal: Visitor patterns and query APIs for working with syntax trees

Installation

Add Sipha to your Cargo.toml:

[dependencies]
sipha = "0.5.0"

Or with specific features:

[dependencies]
sipha = { version = "0.5.0", features = ["diagnostics", "unicode", "backend-ll"] }

Quick Start

Let's build a simple arithmetic expression parser step by step. This example will help you understand the core concepts.

Step 1: Define Your Syntax Kinds

First, define the tokens and non-terminals your parser will use. Sipha uses a unified SyntaxKind trait for both terminals (tokens) and non-terminals (grammar rules):

use sipha::syntax::SyntaxKind;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind {
    // Terminals (produced by lexer)
    Number,
    Plus,
    Minus,
    Multiply,
    Divide,
    LParen,
    RParen,
    Whitespace,
    Eof,
    // Non-terminals (produced by parser)
    Expr,
    Term,
    Factor,
}

impl SyntaxKind for ArithSyntaxKind {
    fn is_terminal(self) -> bool {
        !matches!(self, ArithSyntaxKind::Expr | ArithSyntaxKind::Term | ArithSyntaxKind::Factor)
    }
    
    fn is_trivia(self) -> bool {
        matches!(self, ArithSyntaxKind::Whitespace)
    }
}
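
As a quick sanity check of the classification above (a minimal sketch that uses only the enum and trait impl just defined):

assert!(ArithSyntaxKind::Plus.is_terminal());      // tokens are terminals
assert!(!ArithSyntaxKind::Expr.is_terminal());     // grammar rules are non-terminals
assert!(ArithSyntaxKind::Whitespace.is_trivia());  // whitespace is marked as trivia
assert!(!ArithSyntaxKind::Number.is_trivia());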

Step 2: Build a Lexer

Create a lexer to tokenize your input text:

use sipha::lexer::{LexerBuilder, Pattern, CharSet};

let lexer = LexerBuilder::new()
    .token(ArithSyntaxKind::Number, Pattern::Repeat {
        pattern: Box::new(Pattern::CharClass(CharSet::digits())),
        min: 1,
        max: None,
    })
    .token(ArithSyntaxKind::Plus, Pattern::Literal("+".into()))
    .token(ArithSyntaxKind::Minus, Pattern::Literal("-".into()))
    .token(ArithSyntaxKind::Multiply, Pattern::Literal("*".into()))
    .token(ArithSyntaxKind::Divide, Pattern::Literal("/".into()))
    .token(ArithSyntaxKind::LParen, Pattern::Literal("(".into()))
    .token(ArithSyntaxKind::RParen, Pattern::Literal(")".into()))
    .token(ArithSyntaxKind::Whitespace, Pattern::Repeat {
        pattern: Box::new(Pattern::CharClass(CharSet::whitespace())),
        min: 1,
        max: None,
    })
    .trivia(ArithSyntaxKind::Whitespace)
    .build(ArithSyntaxKind::Eof, ArithSyntaxKind::Number)
    .expect("Failed to build lexer");

Step 3: Tokenize Input

let input = "42 + 10";
let tokens = lexer.tokenize(input)
    .expect("Failed to tokenize input");

Step 4: Define Non-Terminals and Build Grammar

use sipha::grammar::{GrammarBuilder, Token, NonTerminal, Expr};

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ArithNonTerminal {
    Expr,
    Term,
    Factor,
}

impl NonTerminal for ArithNonTerminal {
    fn name(&self) -> &str {
        match self {
            ArithNonTerminal::Expr => "Expr",
            ArithNonTerminal::Term => "Term",
            ArithNonTerminal::Factor => "Factor",
        }
    }
}

// Build your grammar rules
let grammar = GrammarBuilder::new()
    .entry_point(ArithNonTerminal::Expr)
    // Add your grammar rules here
    .build()
    .expect("Failed to build grammar");

Step 5: Parse!

use sipha::backend::ll::{LlParser, LlConfig};
use sipha::backend::ParserBackend;

let config = LlConfig::default();
let mut parser = LlParser::new(&grammar, config)
    .expect("Failed to create parser");

let result = parser.parse(&tokens, ArithNonTerminal::Expr);

For a complete working example, see examples/basic_arithmetic.rs.

Incremental Parsing

Incremental parsing is Sipha's core strength. It allows you to efficiently update your parse tree when code changes, rather than re-parsing everything from scratch.

Why Incremental Parsing?

In interactive applications like IDEs, users edit code frequently. Traditional parsers re-parse the entire file on every change, which can be slow for large files. Incremental parsing:

  • Reuses unchanged nodes from previous parses
  • Only re-parses affected regions
  • Maintains parse caches for fast updates
  • Scales to large codebases efficiently

How It Works

use sipha::incremental::{IncrementalParser, TextEdit};
use sipha::syntax::{TextRange, TextSize};

// Initial parse
let result1 = parser.parse_incremental(&tokens, None, &[], entry_point);

// After an edit (e.g., user changed "42" to "100")
let edits = vec![TextEdit {
    range: TextRange::new(
        TextSize::from(0),
        TextSize::from(2)
    ),
    new_text: "100".into(),
}];
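
// Re-tokenize the edited text to produce the token stream consumed by the
// re-parse below ("lexer" is the one built in the Quick Start; the edited
// input string here is only an illustration).
let new_input = "100 + 10";
let new_tokens = lexer.tokenize(new_input)
    .expect("Failed to tokenize edited input");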

// Incremental re-parse - only affected regions are re-parsed
let result2 = parser.parse_incremental(
    &new_tokens,
    Some(&result1.root),
    &edits,
    entry_point
);

The parser automatically (see the reuse-test sketch after this list):

  1. Identifies which parts of the tree are affected by the edit
  2. Reuses unchanged subtrees
  3. Only re-parses the minimal affected region
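
Steps 1 and 2 boil down to an interval test: a previously parsed node is only a candidate for reuse if its span does not overlap any edited range. A plain-Rust sketch of that test (illustrative only, not Sipha's internals):

// Half-open byte spans: (start, end). Two spans intersect iff each one
// starts before the other ends.
fn overlaps(node: (usize, usize), edit: (usize, usize)) -> bool {
    node.0 < edit.1 && edit.0 < node.1
}

// A node untouched by every edit can be considered for reuse.
fn untouched_by_edits(node: (usize, usize), edits: &[(usize, usize)]) -> bool {
    edits.iter().all(|&edit| !overlaps(node, edit))
}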

Current Status

Incremental parsing is fully implemented in version 0.5.0 with complete node reuse and cache management:

  • Node reuse: Unchanged subtrees are automatically identified and reused from previous parses
  • Cache population: Parse results are cached and can be reused for future parses
  • Affected range computation: Only affected regions are re-parsed
  • Smart cache invalidation: Cache entries are invalidated based on edit locations
  • Cross-session caching: Persistent parse cache enables node reuse across multiple parse sessions

The parser automatically integrates reusable nodes from previous parses and queries the persistent cache during parsing, providing significant performance improvements for interactive editing scenarios.

GLR Parsing

Sipha includes a GLR (Generalized LR) parser backend for handling ambiguous grammars. GLR parsing extends LR parsing to handle non-deterministic grammars by maintaining multiple parser stacks and forking on conflicts.

When to Use GLR

Use the GLR backend when:

  • Your grammar has inherent ambiguities (e.g., C++ template syntax)
  • You need to handle multiple valid parse trees
  • You want to disambiguate at parse time using precedence/associativity rules

Basic Usage

use sipha::backend::glr::{GlrParser, GlrConfig};
use sipha::backend::ParserBackend;

// Create GLR parser
let config = GlrConfig::default();
let parser = GlrParser::new(&grammar, config)
    .expect("Failed to create GLR parser");

// Parse with GLR - returns a parse forest for ambiguous results
let result = parser.parse(&tokens, entry_point);

// Handle ambiguity if present
if let Some(forest) = result.forest {
    // Multiple parse trees exist - disambiguate
    let disambiguated = forest.disambiguate(|alternatives| {
        // Custom disambiguation logic
        alternatives.first().cloned()
    });
}

Disambiguation

GLR parsers can produce parse forests when multiple valid parse trees exist. Sipha provides several disambiguation strategies:

  • Precedence-based: Resolve conflicts using operator precedence
  • Associativity-based: Resolve conflicts using operator associativity
  • Custom strategies: Implement your own disambiguation logic

For more details, see the backend::glr module documentation.
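
As a conceptual illustration of the precedence-based strategy (plain Rust over a toy operator type, not Sipha's forest API): when candidate trees differ only in which operator ends up at the root, prefer the tree whose root operator binds loosest, so that tighter operators sit deeper in the tree.

// Toy precedence table: a higher value binds tighter.
fn precedence(op: char) -> u8 {
    match op {
        '*' | '/' => 2,
        '+' | '-' => 1,
        _ => 0,
    }
}

// Given the root operator of each candidate tree, pick the index of the tree
// whose root binds loosest (e.g. for "1 + 2 * 3", choose the tree rooted at
// '+', nesting the '*' underneath it).
fn pick_by_precedence(root_ops: &[char]) -> Option<usize> {
    root_ops
        .iter()
        .enumerate()
        .min_by_key(|&(_, &op)| precedence(op))
        .map(|(index, _)| index)
}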

Architecture Overview

Parsing Backends

Sipha supports multiple parsing algorithms via feature flags (a cfg-gating sketch follows this list):

  • LL(k) Parser (backend-ll): Top-down predictive parsing with configurable lookahead

    • Supports left-recursion elimination
    • Configurable error recovery
    • Incremental parsing support
  • LR Parser (backend-lr): Bottom-up shift-reduce parsing

    • Efficient for many grammar types
    • Good error recovery
  • GLR Parser (backend-glr): Generalized LR parsing for ambiguous grammars

    • Handles non-deterministic and ambiguous grammars
    • Parse forest representation for ambiguity tracking
    • Configurable disambiguation strategies (precedence, associativity)
    • Incremental parsing support
    • Ideal for complex languages like C++ with inherent ambiguities
  • Planned: PEG, Packrat, Earley parsers
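
Because each backend sits behind its own feature flag, code that targets more than one backend usually gates the imports with cfg attributes. A minimal sketch, assuming the feature names listed above and the module paths used elsewhere in this README:

// Each import is compiled only when its Cargo feature is enabled.
#[cfg(feature = "backend-ll")]
use sipha::backend::ll::{LlConfig, LlParser};
#[cfg(feature = "backend-glr")]
use sipha::backend::glr::{GlrConfig, GlrParser};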

Syntax Trees

Sipha uses an immutable green/red tree representation (sketched conceptually after this list):

  • Green trees: Compact, shared representation stored in an arena
  • Red trees: Convenient API for traversing and querying the tree
  • Efficient memory usage: Shared subtrees reduce memory footprint
  • Fast traversal: Optimized for common tree operations
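
A rough conceptual sketch of that split in plain Rust (illustrative only; Sipha's actual GreenNode and SyntaxNode types differ): green nodes hold a kind, a width, and children with no absolute positions, so identical subtrees can be shared, while red nodes are thin cursors that add an absolute offset and a parent link on demand.

use std::rc::Rc;

// Green layer: position-independent and freely shareable.
struct Green {
    kind: u16,
    text_len: usize,
    children: Vec<Rc<Green>>,
}

// Red layer: a cursor over a green node with an absolute offset and parent.
struct Red {
    green: Rc<Green>,
    offset: usize,
    parent: Option<Rc<Red>>,
}

// Red children are materialized lazily by walking the green children and
// accumulating offsets; the green layer itself is never mutated.
fn red_children(parent: &Rc<Red>) -> Vec<Rc<Red>> {
    let mut offset = parent.offset;
    parent
        .green
        .children
        .iter()
        .map(|green| {
            let child = Rc::new(Red {
                green: Rc::clone(green),
                offset,
                parent: Some(Rc::clone(parent)),
            });
            offset += green.text_len;
            child
        })
        .collect()
}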

Error Recovery

Sipha provides configurable error recovery strategies (the first is sketched after this list):

  • Synchronization tokens: Skip to known recovery points
  • Delimited recovery: Skip to matching delimiters
  • Best-effort parsing: Continue parsing despite errors
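
Conceptually, synchronization-token recovery is panic-mode recovery: on an error, skip tokens until a designated synchronization point and resume parsing there. A minimal sketch of the idea over a toy token type (not Sipha's recovery API):

#[derive(Debug, PartialEq)]
enum Tok {
    Semi,
    RBrace,
    Other,
}

// Skip forward from the error position until just past the next
// synchronization token; parsing resumes at the returned index.
fn synchronize(tokens: &[Tok], mut pos: usize) -> usize {
    while pos < tokens.len() {
        if matches!(tokens[pos], Tok::Semi | Tok::RBrace) {
            return pos + 1;
        }
        pos += 1;
    }
    pos
}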

Examples

The repository includes several examples to help you get started.

Run examples with:

cargo run --example basic_arithmetic
cargo run --example simple_calculator

API Overview

Core Modules

  • syntax: Syntax tree types (green/red trees, nodes, tokens)
  • grammar: Grammar definition and validation
  • lexer: Tokenization and lexing
  • parser: Parser traits and interfaces
  • backend: Parser backend implementations (LL, LR, etc.)
  • error: Error types and diagnostics
  • incremental: Incremental parsing support

Key Types

  • SyntaxKind: Trait for syntax node kinds
  • Token: Trait for tokens
  • NonTerminal: Trait for grammar non-terminals
  • GrammarBuilder: Build grammars declaratively
  • LexerBuilder: Build lexers with pattern matching
  • ParserBackend: Unified interface for all parser backends
  • SyntaxNode: Red tree node for traversal
  • GreenNode: Green tree node for storage

Performance

Incremental parsing provides significant performance benefits:

  • Fast updates: Only re-parse changed regions
  • Memory efficient: Shared tree representation
  • Scalable: Handles large files efficiently
  • IDE-ready: Designed for interactive editing scenarios

For batch parsing (non-interactive), Sipha is competitive with other Rust parsing libraries. The incremental parsing capabilities make it particularly well-suited for language servers and editors.

Comparison with Alternatives

Feature               Sipha     pest   nom    lalrpop
Incremental parsing   ✅        ❌     ❌     ❌
Multiple backends     ✅        ❌     ❌     ❌
Syntax trees          ✅        ✅     ❌     ✅
Error recovery        ✅        ✅     ✅     ✅
Grammar DSL           ❌        ✅     ❌     ✅
Zero-copy             Partial   ✅     ✅     ❌

When to use Sipha:

  • Building language servers or IDEs
  • Need incremental parsing for interactive editing
  • Want flexibility in choosing parsing algorithms
  • Need rich syntax tree manipulation
  • Parsing ambiguous grammars (use GLR backend)

When to consider alternatives:

  • Simple one-off parsers (pest or nom might be simpler)
  • Maximum performance for batch parsing (nom might be faster)
  • Prefer declarative grammar DSLs (pest or lalrpop)

Future Features

Sipha continues to evolve. Planned features for future releases include:

Short Term (Post-0.5.0)

  • Enhanced error recovery: More sophisticated recovery strategies
  • Better diagnostics: Improved error messages and suggestions
  • Performance optimizations: Further speed improvements for large files

Medium Term

  • Additional backends: PEG, Packrat, and Earley parsers
  • Grammar visualization: Interactive tools for visualizing and debugging grammars
  • Incremental lexing: Extend incremental capabilities to the lexer for even better performance

Long Term

  • Language server framework: Higher-level abstractions for building language servers
  • Parallel parsing: Parse multiple files in parallel
  • Advanced grammar analysis: Real-time grammar optimization suggestions

Contributing

Contributions are welcome! Sipha 0.5.0 provides a solid foundation, and there are many opportunities to help:

  • Bug reports: Found a bug? Please open an issue!
  • Feature requests: Have an idea? We'd love to hear it!
  • Code contributions: Pull requests are welcome
  • Documentation: Help improve our docs and examples
  • Testing: Help expand our test coverage

Development Setup

# Clone the repository
git clone https://github.com/nyaleph/sipha.git
cd sipha

# Run tests
cargo test

# Run examples
cargo run --example basic_arithmetic

# Build documentation
cargo doc --open

License

This project is licensed under the MIT License - see the LICENSE file for details.

Documentation

Full API documentation is available at https://docs.rs/sipha.

Acknowledgments

Sipha is inspired by:

  • rust-analyzer for incremental parsing ideas
  • rowan for green/red tree design
  • Various parsing libraries in the Rust ecosystem

Note: Sipha 0.5.0 provides a complete incremental parsing solution. The core API is stable, and we continue to add features and improvements based on user feedback.
