tokit

Crates.io: tokit
lib.rs: tokit
version: 0.0.0
created_at: 2025-12-13 10:32:27 UTC
updated_at: 2025-12-13 10:32:27 UTC
description: Blazing fast parser combinators: parse-while-lexing (zero-copy), deterministic LALR-style parsing, no backtracking. Flexible emitters for fail-fast runtime or greedy compiler diagnostics.
homepage: https://github.com/al8n/tokit
repository: https://github.com/al8n/tokit
size: 1,485,183
author: Al Liu (al8n)
documentation: https://docs.rs/tokit

README

WIP: This project is still under active development and not ready for use.

Tokit

Blazing fast parser combinators with parse-while-lexing architecture (zero-copy), deterministic LALR-style parsing, and no hidden backtracking.

Overview

Tokit is a blazing fast parser combinator library for Rust that uniquely combines:

  • Parse-While-Lexing Architecture: Zero-copy streaming - parsers consume tokens directly from the lexer without buffering, eliminating allocation overhead
  • Deterministic LALR-Style Parsing: Explicit lookahead with compile-time buffer capacity, no hidden backtracking
  • Flexible Error Handling: Same parser code adapts for fail-fast runtime or greedy compiler diagnostics via the Emitter trait

Unlike traditional parser combinators that buffer tokens and rely on implicit backtracking, Tokit streams tokens on-demand with predictable, deterministic decisions. This makes it ideal for building high-performance language tooling, DSL parsers, compilers, and REPLs that need both speed and comprehensive error reporting.

Key Features

  • Parse-While-Lexing: Zero-copy streaming architecture - no token buffering, no extra allocations
  • No Hidden Backtracking: Explicit, predictable parsing with lookahead-based decisions instead of implicit backtracking
  • Deterministic + Composable: Combines the flexibility of parser combinators with LALR-style deterministic table parsing
  • Flexible Error Handling Architecture: Designed to support both fail-fast parsing (runtime) and greedy parsing (compiler diagnostics) by swapping the Emitter type - same parser, different behavior
  • Token-Based Parsing: Works directly on token streams from any lexer implementing the Lexer<'inp> trait
  • Composable Combinators: Build complex parsers from simple, reusable building blocks
  • Configurable Emission Strategies: Choose how errors are emitted (Fatal, Silent, Ignored)
  • Rich Error Recovery: Built-in support for error recovery and validation
  • Zero-Cost Abstractions: All configuration resolved at compile time
  • No-std Support: Core functionality works without allocator
  • Multiple Source Types: Support for str, [u8], Bytes, BStr, HipStr
  • Logos Integration: Optional LogosLexer adapter for seamless Logos integration
  • CST Support: Optional Concrete Syntax Tree support via rowan

Installation

Add this to your Cargo.toml:

[dependencies]
tokit = "0.0.0"

Feature Flags

  • std (default) - Enable standard library support
  • alloc - Enable allocator support for no-std environments
  • logos - Enable LogosLexer adapter for Logos integration
  • rowan - Enable CST (Concrete Syntax Tree) support with rowan integration
  • bytes - Support for bytes::Bytes as token source
  • bstr - Support for bstr::BStr as token source
  • hipstr - Support for hipstr::HipStr as token source
  • among - Enable Among<L, M, R> parseable support
  • smallvec - Enable small vector optimization utilities
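
For example, to enable the Logos adapter together with the small-vector utilities (any subset of the flags above combines the same way):

[dependencies]
tokit = { version = "0.0.0", features = ["logos", "smallvec"] }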

Core Components

Lexer Layer

  • Lexer<'inp> Trait

    Core trait for lexers that produce token streams. Implement this to use any lexer with Tokit.

  • Token<'a> Trait

    Defines token types with:

    • Kind: Token kind discriminator
    • Error: Associated error type

  • LogosLexer<'inp, T, L> (feature: logos)

    Ready-to-use adapter for integrating Logos lexers.

Error Handling

Tokit's flexible Emitter system allows the same parser to adapt to different use cases by simply changing the error handling strategy:

  • Emitter Strategies
    • Fatal - Fail-fast parsing: Stop on first error (default) - perfect for runtime parsing and REPLs
    • Greedy emitter (planned) - Collect all errors and continue parsing - perfect for compiler diagnostics and IDEs
    • Silent - Silently ignore errors
    • Ignored - Ignore errors completely

Key Design: Change the Emitter type to switch between fail-fast runtime parsing and greedy compiler diagnostics - same parser code, different behavior. This makes Tokit suitable for both:

  • Runtime/REPL: Fast feedback with Fatal emitter

  • Compiler/IDE: Comprehensive diagnostics with greedy emitter (coming soon)

  • Rich Error Types (in error/ module)

    • Token-level: UnexpectedToken, MissingToken, UnexpectedEot
    • Syntax-level: Unclosed, Unterminated, Malformed, Invalid
    • Escape sequences: HexEscape, UnicodeEscape
    • All errors include span tracking
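
To make "same parser, different behavior" concrete, here is a minimal sketch of the emitter pattern in plain Rust. The names Emit, FailFast, and CollectAll are illustrative stand-ins, not tokit's actual API:

// Illustrative only: a strategy trait decides what happens when an error is reported.
trait Emit {
    /// Returns true if parsing should continue after this error.
    fn emit(&mut self, err: String) -> bool;
}

// Fail-fast: keep the first error and tell the parser to stop.
#[derive(Default)]
struct FailFast(Option<String>);

impl Emit for FailFast {
    fn emit(&mut self, err: String) -> bool {
        self.0.get_or_insert(err);
        false
    }
}

// Greedy: collect every error and keep parsing, for diagnostics.
#[derive(Default)]
struct CollectAll(Vec<String>);

impl Emit for CollectAll {
    fn emit(&mut self, err: String) -> bool {
        self.0.push(err);
        true
    }
}

// The parsing logic is written once, generic over the strategy;
// monomorphization resolves the chosen behavior at compile time.
fn check_numbers<E: Emit>(input: &str, emitter: &mut E) {
    for word in input.split_whitespace() {
        if word.parse::<f64>().is_err() && !emitter.emit(format!("not a number: {word}")) {
            return; // a fail-fast strategy stops at the first error
        }
    }
}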

Utilities

  • Span Tracking

    • Span - Lightweight span representation
    • Spanned<T> - Wrap value with span
    • Located<T> - Wrap value with span and source slice
    • Sliced<T> - Wrap value with source slice
  • Parser Configuration

    • Parser<F, L, O, Error, Context> - Configurable parser
    • ParseContext - Context for emitter and cache
    • Window - Type-level peek buffer capacity for deterministic lookahead
    • Note: Lookahead windows support 1-32 token capacity via typenum::{U1..U32}
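
A rough mental model of the span wrappers (illustrative shapes only, not tokit's exact definitions):

// Hypothetical shapes for the span utilities listed above.
#[derive(Debug, Clone, Copy)]
struct Span {
    start: usize,
    end: usize,
}

struct Spanned<T> {
    value: T,
    span: Span, // where the value came from
}

struct Located<'a, T> {
    value: T,
    span: Span,
    slice: &'a str, // the matching source text
}

struct Sliced<'a, T> {
    value: T,
    slice: &'a str, // source text only, no span
}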

Quick Start

Here's a simple example parsing JSON tokens:

use logos::Logos;
use tokit::{Any, Parse, Token as TokenT};

#[derive(Debug, Logos, Clone)]
#[logos(skip r"[ \t\r\n\f]+")]
enum Token {
    #[token("true", |_| true)]
    #[token("false", |_| false)]
    Bool(bool),

    #[token("null")]
    Null,

    #[regex(r"-?(?:0|[1-9]\d*)(?:\.\d+)?", |lex| lex.slice().parse::<f64>().unwrap())]
    Number(f64),
}

#[derive(Debug, Clone, Copy)]
enum TokenKind {
    Bool,
    Null,
    Number,
}

// `Display` is not derivable with std alone; implement it manually.
impl core::fmt::Display for TokenKind {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(f, "{:?}", self)
    }
}

impl TokenT<'_> for Token {
    type Kind = TokenKind;
    type Error = ();

    fn kind(&self) -> Self::Kind {
        match self {
            Token::Bool(_) => TokenKind::Bool,
            Token::Null => TokenKind::Null,
            Token::Number(_) => TokenKind::Number,
        }
    }
}

type MyLexer<'a> = tokit::LogosLexer<'a, Token, Token>;

fn main() {
    // Parse any token and extract its value
    let parser = Any::parser::<'_, MyLexer<'_>, ()>()
        .map(|tok: Token| match tok {
            Token::Number(n) => Some(n),
            _ => None,
        });

    let result = parser.parse("42.5");
    println!("{:?}", result); // Ok(Some(42.5))
}

More Examples

Check out the examples directory:

# JSON token parsing with map combinators
cargo run --example json

# Note: The calculator examples are being updated for v0.3.0 API

Architecture

Tokit's architecture follows a layered design:

  1. Lexer Layer - Token production and source abstraction
  2. Parser Layer - Composable parser combinators
  3. Error Layer - Rich error types and emission strategies
  4. Utility Layer - Spans, containers, and helpers

This separation enables:

  • Using any lexer that implements Lexer<'inp>
  • Mixing and matching parser combinators
  • Customizing error handling per parser or globally
  • Achieving zero-cost abstractions through compile-time configuration

Design Philosophy

Parse-While-Lexing: Zero-Copy Streaming

Tokit uses a parse-while-lexing architecture where parsers consume tokens directly from the lexer as needed, without intermediate buffering:

Traditional Approach (Two-Phase):

Source → Lexer → [Token Buffer] → Parser
         ↓
    Allocate Vec<Token>  ← Extra allocation!

Tokit Approach (Streaming):

Source → Lexer ←→ Parser
         ↑________↓
    Zero-copy streaming, no buffer

Benefits:

  • Zero Extra Allocations: No token buffer, tokens consumed on-demand
  • Lower Memory Footprint: Only lookahead window buffered on stack, not entire token stream
  • Better Cache Locality: Tokens processed immediately after lexing
  • Predictable Performance: No large allocations, deterministic memory usage
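
The difference is the familiar one between collecting an iterator and consuming it lazily; a plain-Rust analogy (not tokit code):

// Two-phase: lex everything up front, then parse the buffer.
fn two_phase(src: &str) -> usize {
    let tokens: Vec<&str> = src.split_whitespace().collect(); // allocates a token buffer
    tokens.iter().filter(|t| t.parse::<f64>().is_ok()).count()
}

// Parse-while-lexing: each token is consumed as soon as it is produced.
fn streaming(src: &str) -> usize {
    src.split_whitespace().filter(|t| t.parse::<f64>().is_ok()).count() // no buffer
}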

No Hidden Backtracking

Unlike traditional parser combinators that rely on implicit backtracking (trying alternatives until one succeeds), Tokit uses explicit lookahead-based decisions. This design choice provides:

  • Predictable Performance: No hidden exponential backtracking scenarios
  • Explicit Control: Developers decide when and where to peek ahead via peek_then() and peek_then_choice()
  • Deterministic Parsing: LALR-style table-driven decisions using fixed-capacity lookahead windows (Window trait)
  • Better Error Messages: Failed alternatives don't hide earlier, more relevant errors

// Traditional parser combinator (hidden backtracking):
// try_parser1.or(try_parser2).or(try_parser3)  // May backtrack!

// Tokit approach (explicit lookahead, no backtracking):
let parser = any()
    .peek_then::<_, typenum::U2>(|peeked, _| {
        match peeked.get(0) {
            Some(Token::If) => Ok(Action::Continue),  // Deterministic decision
            _ => Ok(Action::Stop),
        }
    });
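
Note how the decision is made by inspecting the peek buffer and returning an action, rather than by speculatively running an alternative and unwinding when it fails.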

Parser Combinators + Deterministic Table Parsing

Tokit uniquely combines:

  • Parser Combinator Flexibility: Compose small parsers into complex grammars
  • LALR-Style Determinism: Fixed lookahead windows with deterministic decisions
  • Type-Level Capacity: Lookahead buffer size known at compile time (Window::CAPACITY)

This hybrid approach gives you composable abstractions without sacrificing performance or predictability.
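
As a small illustration of type-level capacity, typenum encodes the window size as a type whose value is a compile-time constant (the snippet shows only the typenum side, not tokit's Window trait):

use typenum::{Unsigned, U2};

// U2 is a type, not a value; its numeric value is recoverable as a constant.
const LOOKAHEAD: usize = U2::USIZE;

fn main() {
    // Buffers can be sized directly from the type parameter.
    let window = [0u32; U2::USIZE];
    assert_eq!(window.len(), LOOKAHEAD); // 2
}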

Fail-Fast Runtime ↔ Greedy Compiler Diagnostics

Tokit's architecture decouples parsing logic from error handling strategy through the Emitter trait. This means:

Same Parser, Different Contexts:

  • Runtime/REPL Mode: Use Fatal emitter → stop on first error for immediate feedback
  • Compiler/IDE Mode: Use greedy emitter (planned) → collect all errors for comprehensive diagnostics
  • Testing/Fuzzing: Use Ignored emitter → parse through all errors for robustness testing

Benefits:

  • ✅ Write parsers once, deploy everywhere
  • ✅ No separate "error recovery mode" - it's just a different emitter
  • ✅ Custom emitters can implement domain-specific error handling
  • ✅ Zero-cost abstraction - emitter behavior resolved at compile time

Inspirations

Tokit takes inspiration from:

  • winnow - For ergonomic parser API design
  • chumsky - For composable parser combinator patterns
  • logos - For high-performance lexing
  • rowan - For lossless syntax tree representation

Core Priorities

  1. Performance - Parse-while-lexing (zero-copy streaming), zero-cost abstractions, no hidden allocations
  2. Predictability - No hidden backtracking, explicit control flow, deterministic decisions
  3. Composability - Small parsers combine into complex ones
  4. Versatility - Same parser works for runtime (fail-fast) and compiler diagnostics (greedy) via Emitter
  5. Flexibility - Work with any lexer, customize error handling, support both AST and CST
  6. Correctness - Rich error types, span tracking, validation

Who Uses Tokit?

  • smear: Blazing fast, fully spec-compliant, reusable parser combinators for standard GraphQL and GraphQL-like DSLs

License

tokit is dual-licensed; you may choose either license for your purposes.

Copyright (c) 2025 Al Liu.
