minipg - Mini Parser Generator
A blazingly fast, modern parser generator written in Rust with incremental parsing and editor integration. Generate parsers for 9 languages from ANTLR4 grammars, with complete infrastructure to replace Tree-sitter for editor tooling.
✨ Features
⚡ Incremental Parsing (NEW in v0.1.5)
- Position Tracking - Byte offsets and line/column for every AST node
- Edit Tracking - Insert, delete, replace operations with automatic point calculation
- Fast Re-parsing - Foundation for <10ms incremental edits
- Editor Integration - Complete infrastructure for replacing Tree-sitter
- Query Language - Tree-sitter-compatible S-expression queries for pattern matching
- Syntax Highlighting - Pattern-based highlighting with capture groups
🚀 Performance
- faster than ANTLR4 for code generation
- Linear O(n) scaling with grammar complexity
- Sub-millisecond generation for typical grammars
- <100 KB memory usage
- Incremental updates - Target <10ms for typical edits
🌍 Multi-Language Support (9 Languages)
- Rust - Optimized with inline attributes and DFA generation ✅
- Python - Type hints and dataclasses (Python 3.10+) ✅
- JavaScript - Modern ES6+ with error recovery ✅
- TypeScript - Full type safety with interfaces and enums ✅
- Go - Idiomatic Go with interfaces and error handling ✅
- Java - Standalone .java files with proper package structure ✅
- C - Standalone .c/.h files with manual memory management ✅
- C++ - Modern C++17+ with RAII and smart pointers ✅
- Tree-sitter - Grammar.js for editor syntax highlighting (VS Code, Neovim, Atom) ✅
🎯 ANTLR4 Compatible
- Advanced Character Classes - Full support with Unicode escapes (
\u0000-\uFFFF) ✅
- Non-Greedy Quantifiers -
.*?, .+?, .?? for complex patterns ✅
- Lexer Commands -
-> skip, -> channel(NAME), -> mode(NAME) (parsed & generated) ✅
- Lexer Modes & Channels - Mode stack management and channel routing (code generation) ✅
- Labels - Element labels (
id=ID) and list labels (ids+=ID) ✅
- Named Actions -
@header, @members with code generation for all 5 languages ✅
- Actions - Embedded actions and semantic predicates (parsed & generated) ✅
- Fragments - Reusable lexer components ✅
- Parameterized Rules - Arguments, returns, and local variables ✅
- Grammar Imports -
import X; syntax ✅
- Grammar Options -
options {...} blocks ✅
- Real-World Grammars - CompleteJSON.g4 ✅, SQL.g4 ✅, 16 example grammars ✅
- Modular Architecture: Organized into focused crates
- Trait-Based Design: Extensible and testable
- Rich Diagnostics: Detailed error messages with location information
- AST with Visitor Pattern: Flexible tree traversal
- Semantic Analysis:
- Undefined rule detection
- Duplicate rule detection
- Left recursion detection
- Reachability analysis
- Empty alternative warnings
- Code Generation:
- Generates optimized standalone parsers
- Visitor pattern generation
- Listener pattern generation
- Configurable output
- CLI Tool: Easy-to-use command-line interface
- Error Recovery: Robust error handling and recovery strategies
- Comprehensive Documentation: User guide, API docs, and syntax reference
- Snapshot Testing: Comprehensive tests using insta for regression prevention
- Complex Grammar Examples: JSON, SQL, Java, Python, and more
Architecture
minipg is organized as a single crate with modular structure:
- core: Core types, traits, and error handling
- ast: Abstract Syntax Tree definitions and visitor patterns
- parser: Grammar file parser (lexer + parser)
- analysis: Semantic analysis and validation
- codegen: Code generation for target languages (Rust, Python, JS, TS)
- CLI: Command-line interface with binary
See ARCHITECTURE.md for detailed design documentation.
Installation
From crates.io
cargo install minipg
From Source
git clone https://github.com/yingkitw/minipg
cd minipg
cargo install --path .
Usage
Generate a Parser
# Generate Rust parser
minipg generate grammar.g4 -o output/ -l rust
# Generate Python parser
minipg generate grammar.g4 -o output/ -l python
# Generate JavaScript parser
minipg generate grammar.g4 -o output/ -l javascript
# Generate TypeScript parser
minipg generate grammar.g4 -o output/ -l typescript
# Generate Go parser
minipg generate grammar.g4 -o output/ -l go
# Generate Tree-sitter grammar
minipg generate grammar.g4 -o output/ -l treesitter
Validate a Grammar
minipg validate grammar.g4
Show Grammar Information
minipg info grammar.g4
Grammar Syntax
minipg supports ANTLR4-compatible syntax with advanced features:
grammar Calculator;
// Parser rules
expr: term (('+' | '-') term)*;
term: factor (('*' | '/') factor)*;
factor: NUMBER | '(' expr ')';
// Lexer rules with character classes
NUMBER: [0-9]+;
IDENTIFIER: [a-zA-Z_][a-zA-Z0-9_]*;
// Non-greedy quantifiers for comments
BLOCK_COMMENT: '/*' .*? '*/' -> skip;
LINE_COMMENT: '//' .*? '\n' -> skip;
// Unicode escapes in character classes
STRING: '"' (ESC | ~["\\\u0000-\u001F])* '"';
fragment ESC: '\\' ["\\/bfnrt];
// Lexer commands
WS: [ \t\r\n]+ -> skip;
Comparisons
minipg vs ANTLR4 vs Pest
A comprehensive comparison of three parser generator tools:
| Feature |
minipg |
ANTLR4 |
Pest |
| Language |
Rust |
Java |
Rust |
| Runtime Dependency |
None (standalone) |
Requires runtime library |
Requires runtime library |
| Grammar Syntax |
ANTLR4 (industry standard) |
ANTLR4 (native) |
PEG (Parsing Expression Grammar) |
| Grammar Compatibility |
100% ANTLR4 compatible |
Native |
Pest-specific |
| Grammar Ecosystem |
Compatible with 1000+ ANTLR4 grammars |
Native ecosystem |
Pest-specific grammars |
| Target Languages |
Rust, Python, JS, TS, Go, Java, C, C++, Tree-sitter |
Java, Python, JS, C#, C++, Go, Swift |
Rust only |
| Code Generation |
Standalone parsers (no runtime) |
Runtime-based parsers |
Macro-based (requires runtime) |
| Generation Speed |
Sub-millisecond |
Seconds |
Compile-time |
| Memory Usage |
<100 KB |
Higher (JVM overhead) |
Low (Rust native) |
| AST Patterns |
Auto-generated visitor/listener |
Auto-generated visitor/listener |
Manual tree walking |
| Error Recovery |
Built-in, continues after errors |
Built-in, continues after errors |
Stops at first error |
| Test Coverage |
186+ tests, 100% pass rate |
Comprehensive |
Good |
| Grammar Test Suite |
✅ All tests pass |
✅ Comprehensive |
✅ Good |
| Real-World Grammars |
✅ grammars-v4 compatible |
✅ Native support |
Limited ecosystem |
| Standalone Output |
✅ Yes (no dependencies) |
❌ Requires runtime |
❌ Requires runtime |
| Multi-Language |
✅ 8 languages |
✅ 7+ languages |
❌ Rust only |
| Modern Implementation |
✅ Rust 2024 |
Java-based |
✅ Rust macros |
Key Advantages of minipg:
- ⚡ Fast code generation - sub-millisecond for typical grammars
- 🚀 No runtime dependencies - generates standalone parsers
- 🦀 Modern Rust implementation with safety guarantees
- 📦 Smaller footprint - <100 KB memory usage
- 🔧 Easy integration - no Java runtime required
- ✅ Comprehensive testing - 147 tests with 100% pass rate
- ✅ Grammar compatibility - works with existing ANTLR4 grammars
- ✅ Multi-language - generate parsers for 9 different languages
- ✅ Editor integration - Tree-sitter support for VS Code, Neovim, Atom
Choose minipg if you need:
- Multi-language parser generation
- ANTLR4 grammar compatibility
- Standalone, portable parsers with no runtime dependencies
- Automatic visitor/listener patterns
- Fast code generation
- Comprehensive test coverage
Choose ANTLR4 if you need:
- Mature, battle-tested tooling
- Extensive documentation and community
- Java ecosystem integration
- Runtime-based parsing with advanced features
Choose Pest if you need:
- Rust-only parsing
- PEG parsing semantics
- Compile-time grammar validation
- Tight Rust macro integration
- Zero-cost abstractions at compile time
See docs/archive/COMPARISON_WITH_ANTLR4RUST.md and docs/archive/COMPARISON_WITH_PEST.md for detailed comparisons.
Documentation
- User Guide - Complete guide to using minipg
- Grammar Syntax Reference - Detailed syntax specification
- API Documentation - API reference for library usage
- Architecture - Design and architecture overview
- ANTLR4 Compatibility - Full ANTLR4 grammar support
- Multi-Language Plan - Target language support roadmap
- Runtime Decision - Why standalone generation
- Comparison with ANTLR4 - Performance and feature comparison
- Comparison with Pest - Rust parser generator comparison
- Examples - 16 example grammars (beginner to advanced)
- Simple: calculator, JSON
- Intermediate: Expression, Config, YAML
- Advanced: GraphQL, Query, CSS, Markdown, Protocol, SQL, Java, Python
- Examples Guide - Comprehensive examples documentation
- Archive - Historical session reports and release notes
Development
Building
cargo build
Running Tests
cargo test --all
✅ All Tests Passing!
minipg has comprehensive test coverage with 186+ tests passing at 100% success rate:
- 106 unit tests - Core functionality and parsing
- 19 integration tests - Full pipeline (parse → analyze → generate)
- 21 analysis tests - Semantic analysis, ambiguity detection, reachability
- 21 codegen tests - Multi-language code generation
- 19 compatibility tests - ANTLR4 feature compatibility
- 13 feature tests - Advanced grammar features
- 9 example tests - Real-world grammar examples
Grammar Test Suite: minipg can successfully parse and generate code from a wide variety of ANTLR4 grammars, including:
- ✅ All example grammars in the repository
- ✅ Real-world grammars from the grammars-v4 repository
- ✅ Complex grammars with advanced features (modes, channels, actions)
- ✅ Multi-language code generation validation
All tests pass successfully, demonstrating robust grammar parsing and code generation capabilities.
Running with Logging
RUST_LOG=info cargo run -- generate grammar.g4
Project Status
- Current Version: 0.1.5 (Production Ready)
- Status: Editor Integration Foundation Complete ✅
- Test Suite: 147 tests with 100% pass rate
- ✅ All grammar parsing tests pass
- ✅ All code generation tests pass
- ✅ All integration tests pass
- ✅ All compatibility tests pass
- ✅ Incremental parsing tests pass (18 tests)
- ✅ Query language tests pass (16 tests)
- ✅ Comprehensive coverage of ANTLR4 features
- Target Languages: 9 languages (Rust, Python, JavaScript, TypeScript, Go, Java, C, C++, Tree-sitter)
- Package: Single consolidated crate for easy installation
- Grammar Support:
- ✅ CompleteJSON.g4 - Full JSON grammar
- ✅ SQL.g4 - SQL grammar subset
- ✅ 19+ example grammars
- ✅ Real-world grammars from grammars-v4 repository
- E2E Coverage: Full pipeline testing from grammar to working parser
- ANTLR4 Compatibility: High - supports most common features with comprehensive test coverage
- Latest Features (v0.1.5):
- ✅ Incremental Parsing - Position tracking, edit tracking, incremental re-parsing (NEW)
- ✅ Query Language - Tree-sitter-compatible S-expression queries for pattern matching (NEW)
- ✅ Tree-sitter Generator - Generate grammar.js for editor integration
- ✅ Editor Foundation - Complete infrastructure for replacing Tree-sitter (NEW)
- ✅ Go code generator (idiomatic, production-ready)
- ✅ Rule arguments:
rule[Type name]
- ✅ Return values:
returns [Type name]
- ✅ Local variables:
locals [Type name]
- ✅ List labels (
ids+=ID)
- ✅ Named actions with code generation
See TODO.md for current tasks and docs/archive/ROADMAP.md for the complete roadmap.
License
Apache-2.0