tokit

Crates.io: tokit
lib.rs: tokit
version: 0.0.0
created_at: 2025-12-13 10:32:27 UTC
updated_at: 2025-12-13 10:32:27 UTC
description: Blazing fast parser combinators: parse-while-lexing (zero-copy), deterministic LALR-style parsing, no backtracking. Flexible emitters for fail-fast runtime or greedy compiler diagnostics.
homepage: https://github.com/al8n/tokit
repository: https://github.com/al8n/tokit
size: 1,485,183
author: Al Liu (al8n)
documentation: https://docs.rs/tokit

README

WIP: This project is still under active development and not ready for use.

Tokit

Blazing fast parser combinators with parse-while-lexing architecture (zero-copy), deterministic LALR-style parsing, and no hidden backtracking.

Overview

Tokit is a blazing fast parser combinator library for Rust that uniquely combines:

  • Parse-While-Lexing Architecture: Zero-copy streaming - parsers consume tokens directly from the lexer without buffering, eliminating allocation overhead
  • Deterministic LALR-Style Parsing: Explicit lookahead with compile-time buffer capacity, no hidden backtracking
  • Flexible Error Handling: Same parser code adapts for fail-fast runtime or greedy compiler diagnostics via the Emitter trait

Unlike traditional parser combinators that buffer tokens and rely on implicit backtracking, Tokit streams tokens on-demand with predictable, deterministic decisions. This makes it ideal for building high-performance language tooling, DSL parsers, compilers, and REPLs that need both speed and comprehensive error reporting.

Key Features

  • Parse-While-Lexing: Zero-copy streaming architecture - no token buffering, no extra allocations
  • No Hidden Backtracking: Explicit, predictable parsing with lookahead-based decisions instead of implicit backtracking
  • Deterministic + Composable: Combines the flexibility of parser combinators with LALR-style deterministic table parsing
  • Flexible Error Handling Architecture: Designed to support both fail-fast parsing (runtime) and greedy parsing (compiler diagnostics) by swapping the Emitter type - same parser, different behavior
  • Token-Based Parsing: Works directly on token streams from any lexer implementing the Lexer<'inp> trait
  • Composable Combinators: Build complex parsers from simple, reusable building blocks
  • Configurable Emission Strategies: Choose how errors are emitted (Fatal, Silent, Ignored)
  • Rich Error Recovery: Built-in support for error recovery and validation
  • Zero-Cost Abstractions: All configuration resolved at compile time
  • No-std Support: Core functionality works without allocator
  • Multiple Source Types: Support for str, [u8], Bytes, BStr, HipStr
  • Logos Integration: Optional LogosLexer adapter for seamless Logos integration
  • CST Support: Optional Concrete Syntax Tree support via rowan

Installation

Add this to your Cargo.toml:

[dependencies]
tokit = "0.0.0"

Feature Flags

  • std (default) - Enable standard library support
  • alloc - Enable allocator support for no-std environments
  • logos - Enable LogosLexer adapter for Logos integration
  • rowan - Enable CST (Concrete Syntax Tree) support with rowan integration
  • bytes - Support for bytes::Bytes as token source
  • bstr - Support for bstr::BStr as token source
  • hipstr - Support for hipstr::HipStr as token source
  • among - Enable Among<L, M, R> parseable support
  • smallvec - Enable small vector optimization utilities
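
For example, to enable the Logos adapter together with the small-vector utilities (any subset of the flags above combines the same way):

[dependencies]
tokit = { version = "0.0.0", features = ["logos", "smallvec"] }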

Core Components

Lexer Layer

  • Lexer<'inp> Trait

    Core trait for lexers that produce token streams. Implement this to use any lexer with Tokit.

  • Token<'a> Trait

    Defines token types with:

    • Kind: Token kind discriminator
    • Error: Associated error type

  • LogosLexer<'inp, T, L> (feature: logos)

    Ready-to-use adapter for integrating Logos lexers.

Error Handling

Tokit's flexible Emitter system allows the same parser to adapt to different use cases by simply changing the error handling strategy:

  • Emitter Strategies
    • Fatal - Fail-fast parsing: Stop on first error (default) - perfect for runtime parsing and REPLs
    • Greedy emitter (planned) - Collect all errors and continue parsing - perfect for compiler diagnostics and IDEs
    • Silent - Silently ignore errors
    • Ignored - Ignore errors completely

Key Design: Change the Emitter type to switch between fail-fast runtime parsing and greedy compiler diagnostics - same parser code, different behavior. This makes Tokit suitable for both:

  • Runtime/REPL: Fast feedback with Fatal emitter

  • Compiler/IDE: Comprehensive diagnostics with greedy emitter (coming soon)

  • Rich Error Types (in error/ module)

    • Token-level: UnexpectedToken, MissingToken, UnexpectedEot
    • Syntax-level: Unclosed, Unterminated, Malformed, Invalid
    • Escape sequences: HexEscape, UnicodeEscape
    • All errors include span tracking
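
To make "same parser, different behavior" concrete, here is a minimal sketch of the emitter pattern in plain Rust. The names Emit, FailFast, and CollectAll are illustrative stand-ins, not tokit's actual API:

// Illustrative only: a strategy trait decides what happens when an error is reported.
trait Emit {
    /// Returns true if parsing should continue after this error.
    fn emit(&mut self, err: String) -> bool;
}

// Fail-fast: keep the first error and tell the parser to stop.
#[derive(Default)]
struct FailFast(Option<String>);

impl Emit for FailFast {
    fn emit(&mut self, err: String) -> bool {
        self.0.get_or_insert(err);
        false
    }
}

// Greedy: collect every error and keep parsing, for diagnostics.
#[derive(Default)]
struct CollectAll(Vec<String>);

impl Emit for CollectAll {
    fn emit(&mut self, err: String) -> bool {
        self.0.push(err);
        true
    }
}

// The parsing logic is written once, generic over the strategy;
// monomorphization resolves the chosen behavior at compile time.
fn check_numbers<E: Emit>(input: &str, emitter: &mut E) {
    for word in input.split_whitespace() {
        if word.parse::<f64>().is_err() && !emitter.emit(format!("not a number: {word}")) {
            return; // a fail-fast strategy stops at the first error
        }
    }
}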

Utilities

  • Span Tracking

    • Span - Lightweight span representation
    • Spanned<T> - Wrap value with span
    • Located<T> - Wrap value with span and source slice
    • Sliced<T> - Wrap value with source slice
  • Parser Configuration

    • Parser<F, L, O, Error, Context> - Configurable parser
    • ParseContext - Context for emitter and cache
    • Window - Type-level peek buffer capacity for deterministic lookahead
    • Note: Lookahead windows support 1-32 token capacity via typenum::{U1..U32}
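
A rough mental model of the span wrappers (illustrative shapes only, not tokit's exact definitions):

// Hypothetical shapes for the span utilities listed above.
#[derive(Debug, Clone, Copy)]
struct Span {
    start: usize,
    end: usize,
}

struct Spanned<T> {
    value: T,
    span: Span, // where the value came from
}

struct Located<'a, T> {
    value: T,
    span: Span,
    slice: &'a str, // the matching source text
}

struct Sliced<'a, T> {
    value: T,
    slice: &'a str, // source text only, no span
}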

Quick Start

Here's a simple example parsing JSON tokens:

use logos::Logos;
use tokit::{Any, Parse, Token as TokenT};

#[derive(Debug, Logos, Clone)]
#[logos(skip r"[ \t\r\n\f]+")]
enum Token {
    #[token("true", |_| true)]
    #[token("false", |_| false)]
    Bool(bool),

    #[token("null")]
    Null,

    #[regex(r"-?(?:0|[1-9]\d*)(?:\.\d+)?", |lex| lex.slice().parse::<f64>().unwrap())]
    Number(f64),
}

#[derive(Debug, Clone, Copy)]
enum TokenKind {
    Bool,
    Null,
    Number,
}

// `Display` is not derivable with std alone; implement it manually.
impl core::fmt::Display for TokenKind {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(f, "{:?}", self)
    }
}

impl TokenT<'_> for Token {
    type Kind = TokenKind;
    type Error = ();

    fn kind(&self) -> Self::Kind {
        match self {
            Token::Bool(_) => TokenKind::Bool,
            Token::Null => TokenKind::Null,
            Token::Number(_) => TokenKind::Number,
        }
    }
}

type MyLexer<'a> = tokit::LogosLexer<'a, Token, Token>;

fn main() {
    // Parse any token and extract its value
    let parser = Any::parser::<'_, MyLexer<'_>, ()>()
        .map(|tok: Token| match tok {
            Token::Number(n) => Some(n),
            _ => None,
        });

    let result = parser.parse("42.5");
    println!("{:?}", result); // Ok(Some(42.5))
}

More Examples

Check out the examples directory:

# JSON token parsing with map combinators
cargo run --example json

# Note: The calculator examples are being updated for v0.3.0 API

Architecture

Tokit's architecture follows a layered design:

  1. Lexer Layer - Token production and source abstraction
  2. Parser Layer - Composable parser combinators
  3. Error Layer - Rich error types and emission strategies
  4. Utility Layer - Spans, containers, and helpers

This separation enables:

  • Using any lexer that implements Lexer<'inp>
  • Mixing and matching parser combinators
  • Customizing error handling per parser or globally
  • Achieving zero-cost abstractions through compile-time configuration

Design Philosophy

Parse-While-Lexing: Zero-Copy Streaming

Tokit uses a parse-while-lexing architecture where parsers consume tokens directly from the lexer as needed, without intermediate buffering:

Traditional Approach (Two-Phase):

Source → Lexer → [Token Buffer] → Parser
         ↓
    Allocate Vec<Token>  ← Extra allocation!

Tokit Approach (Streaming):

Source → Lexer ←→ Parser
         ↑________↓
    Zero-copy streaming, no buffer

Benefits:

  • Zero Extra Allocations: No token buffer, tokens consumed on-demand
  • Lower Memory Footprint: Only lookahead window buffered on stack, not entire token stream
  • Better Cache Locality: Tokens processed immediately after lexing
  • Predictable Performance: No large allocations, deterministic memory usage
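
The difference is the familiar one between collecting an iterator and consuming it lazily; a plain-Rust analogy (not tokit code):

// Two-phase: lex everything up front, then parse the buffer.
fn two_phase(src: &str) -> usize {
    let tokens: Vec<&str> = src.split_whitespace().collect(); // allocates a token buffer
    tokens.iter().filter(|t| t.parse::<f64>().is_ok()).count()
}

// Parse-while-lexing: each token is consumed as soon as it is produced.
fn streaming(src: &str) -> usize {
    src.split_whitespace().filter(|t| t.parse::<f64>().is_ok()).count() // no buffer
}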

No Hidden Backtracking

Unlike traditional parser combinators that rely on implicit backtracking (trying alternatives until one succeeds), Tokit uses explicit lookahead-based decisions. This design choice provides:

  • Predictable Performance: No hidden exponential backtracking scenarios
  • Explicit Control: Developers decide when and where to peek ahead via peek_then() and peek_then_choice()
  • Deterministic Parsing: LALR-style table-driven decisions using fixed-capacity lookahead windows (Window trait)
  • Better Error Messages: Failed alternatives don't hide earlier, more relevant errors

// Traditional parser combinator (hidden backtracking):
// try_parser1.or(try_parser2).or(try_parser3)  // May backtrack!

// Tokit approach (explicit lookahead, no backtracking):
let parser = any()
    .peek_then::<_, typenum::U2>(|peeked, _| {
        match peeked.get(0) {
            Some(Token::If) => Ok(Action::Continue),  // Deterministic decision
            _ => Ok(Action::Stop),
        }
    });
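
Note how the decision is made by inspecting the peek buffer and returning an action, rather than by speculatively running an alternative and unwinding when it fails.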

Parser Combinators + Deterministic Table Parsing

Tokit uniquely combines:

  • Parser Combinator Flexibility: Compose small parsers into complex grammars
  • LALR-Style Determinism: Fixed lookahead windows with deterministic decisions
  • Type-Level Capacity: Lookahead buffer size known at compile time (Window::CAPACITY)

This hybrid approach gives you composable abstractions without sacrificing performance or predictability.
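
As a small illustration of type-level capacity, typenum encodes the window size as a type whose value is a compile-time constant (the snippet shows only the typenum side, not tokit's Window trait):

use typenum::{Unsigned, U2};

// U2 is a type, not a value; its numeric value is recoverable as a constant.
const LOOKAHEAD: usize = U2::USIZE;

fn main() {
    // Buffers can be sized directly from the type parameter.
    let window = [0u32; U2::USIZE];
    assert_eq!(window.len(), LOOKAHEAD); // 2
}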

Fail-Fast Runtime ↔ Greedy Compiler Diagnostics

Tokit's architecture decouples parsing logic from error handling strategy through the Emitter trait. This means:

Same Parser, Different Contexts:

  • Runtime/REPL Mode: Use Fatal emitter → stop on first error for immediate feedback
  • Compiler/IDE Mode: Use greedy emitter (planned) → collect all errors for comprehensive diagnostics
  • Testing/Fuzzing: Use Ignored emitter → parse through all errors for robustness testing

Benefits:

  • ✅ Write parsers once, deploy everywhere
  • ✅ No separate "error recovery mode" - it's just a different emitter
  • ✅ Custom emitters can implement domain-specific error handling
  • ✅ Zero-cost abstraction - emitter behavior resolved at compile time

Inspirations

Tokit takes inspiration from:

  • winnow - For ergonomic parser API design
  • chumsky - For composable parser combinator patterns
  • logos - For high-performance lexing
  • rowan - For lossless syntax tree representation

Core Priorities

  1. Performance - Parse-while-lexing (zero-copy streaming), zero-cost abstractions, no hidden allocations
  2. Predictability - No hidden backtracking, explicit control flow, deterministic decisions
  3. Composability - Small parsers combine into complex ones
  4. Versatility - Same parser works for runtime (fail-fast) and compiler diagnostics (greedy) via Emitter
  5. Flexibility - Work with any lexer, customize error handling, support both AST and CST
  6. Correctness - Rich error types, span tracking, validation

Who Uses Tokit?

  • smear: Blazing fast, fully spec-compliant, reusable parser combinators for standard GraphQL and GraphQL-like DSLs

License

tokit is dual-licensed; you may choose either license for your purposes.

Copyright (c) 2025 Al Liu.
