xml_simple_parser

version: 0.1.0
created_at: 2025-11-11 20:12:39.693252+00
updated_at: 2025-11-11 20:12:39.693252+00
description: A simple XML parser implemented in Rust using the Pest parser generator.
repository: https://github.com/ZhekaOst/xml_simple_parser
id: 1928130
size: 24,883
(ZhekaOst)


XML Simple Parser

Brief Description

This project implements a simplified XML parser in Rust using the pest library. It handles core XML structures: tags, attributes, and nested content.


🚀 Usage (Command-Line)

This project includes a Makefile to simplify common development commands.

Using Makefile (Recommended)

  • Check all (format, lint with clippy, and run tests):

    make check
    
  • Run parser (uses test.xml):

    make run
    
  • Run tests:

    make test
    
  • Format code:

    make fmt
    
  • Lint code:

    make clippy
    
  • Show all commands:

    make help
    

Using Cargo (Directly)

You can also run the program directly.

  • Parse a file:

    cargo run -- parse --file <path/to/your-file.xml>
    

    (Note: The parser will look for the file in the project root and in the src/ folder.)

  • Show author credits:

    cargo run -- credits
    
  • Show built-in CLI help:

    cargo run -- --help
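The note above only states where the input file is looked up. The following std-only sketch shows one way to implement that lookup order. It assumes the path is tried first as given, relative to the current working directory (the project root when cargo run is invoked from there), and then under src/. The functions candidate_paths and resolve_input are hypothetical names, not the crate's actual API.

```rust
use std::path::{Path, PathBuf};

/// Candidate locations for a user-supplied file, in lookup order (hypothetical helper):
/// 1. the path as given, relative to the current working directory;
/// 2. the same path inside the src/ folder.
fn candidate_paths(file: &str) -> Vec<PathBuf> {
    vec![PathBuf::from(file), Path::new("src").join(file)]
}

/// Returns the first candidate that exists on disk, or None if neither does.
fn resolve_input(file: &str) -> Option<PathBuf> {
    candidate_paths(file).into_iter().find(|p| p.exists())
}

fn main() {
    // Mirrors the `make run` default input mentioned in this README.
    match resolve_input("test.xml") {
        Some(path) => println!("parsing {}", path.display()),
        None => eprintln!("test.xml not found in the project root or src/"),
    }
}
```

Returning the first existing candidate keeps the precedence explicit: a file in the project root shadows one with the same name in src/.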
    

Technical Description

What is Being Parsed?

The parser uses a Parsing Expression Grammar (PEG), the grammar formalism pest is built on, to validate and structure the XML input. It recognizes the following elements via grammar rules:

  • name: Identifiers for tags and attributes.
  • attribute: Key-value pairs (key="value").
  • text_content: Literal data between tags.
  • element: Recursive structure for tags, children, and attributes.
  • xml: The complete document (ensuring a single root element).
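To make these rules concrete without pulling in pest, here is a hand-written recursive-descent recognizer that mirrors grammar.pest as listed in this README, one method per rule. It also reproduces the implicit WHITESPACE skipping that pest performs between tokens of non-atomic rules. This is an illustrative sketch, not the crate's code: Recognizer and recognize are hypothetical names, and errors are reported only as a byte offset.

```rust
/// Hand-written recognizer mirroring grammar.pest rule by rule (illustrative only).
struct Recognizer<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Recognizer<'a> {
    fn peek(&self) -> Option<u8> {
        self.src.get(self.pos).copied()
    }

    /// Matches a literal string or fails at the current offset.
    fn expect(&mut self, lit: &str) -> Result<(), usize> {
        if self.src[self.pos..].starts_with(lit.as_bytes()) {
            self.pos += lit.len();
            Ok(())
        } else {
            Err(self.pos)
        }
    }

    /// Consumes bytes while `pred` holds.
    fn take_while(&mut self, pred: impl Fn(u8) -> bool) {
        while self.peek().map_or(false, |c| pred(c)) {
            self.pos += 1;
        }
    }

    /// WHITESPACE = _{ " " | "\t" | "\r" | "\n" }, skipped implicitly between tokens.
    fn ws(&mut self) {
        self.take_while(|c| matches!(c, b' ' | b'\t' | b'\r' | b'\n'));
    }

    /// name = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
    fn name(&mut self) -> Result<(), usize> {
        if !self.peek().map_or(false, |c| c.is_ascii_alphabetic()) {
            return Err(self.pos);
        }
        self.take_while(|c| c.is_ascii_alphanumeric() || c == b'_');
        Ok(())
    }

    /// attribute = { name ~ "=" ~ attribute_value }, where attribute_value is a
    /// double quote, any run of non-quote characters, and a closing quote.
    fn attribute(&mut self) -> Result<(), usize> {
        self.name()?;
        self.ws();
        self.expect("=")?;
        self.ws();
        self.expect("\"")?;
        self.take_while(|c| c != b'"');
        self.expect("\"")
    }

    /// element = { opening_tag ~ (element | text_content)* ~ closing_tag }
    fn element(&mut self) -> Result<(), usize> {
        // opening_tag = { "<" ~ name ~ attribute* ~ ">" }
        self.expect("<")?;
        self.ws();
        self.name()?;
        loop {
            self.ws();
            if !self.peek().map_or(false, |c| c.is_ascii_alphabetic()) {
                break;
            }
            self.attribute()?;
        }
        self.expect(">")?;
        // Children: nested elements, or text_content = @{ (!("<") ~ ANY)+ }.
        loop {
            self.ws();
            match self.peek() {
                None => break,
                Some(b'<') if self.src[self.pos..].starts_with(b"</") => break,
                Some(b'<') => self.element()?,
                Some(_) => self.take_while(|c| c != b'<'),
            }
        }
        // closing_tag = { "</" ~ name ~ ">" }: the name is NOT compared to the opening tag.
        self.expect("</")?;
        self.ws();
        self.name()?;
        self.ws();
        self.expect(">")
    }
}

/// xml = { SOI ~ element ~ EOI }: exactly one root element and nothing after it.
fn recognize(input: &str) -> Result<(), usize> {
    let mut r = Recognizer { src: input.as_bytes(), pos: 0 };
    r.ws();
    r.element()?;
    r.ws();
    if r.pos == r.src.len() { Ok(()) } else { Err(r.pos) }
}

fn main() {
    for input in ["<note lang=\"en\">Hello</note>", "<note>Hello", "<a></a><b></b>"] {
        match recognize(input) {
            Ok(()) => println!("{input}: valid"),
            Err(pos) => println!("{input}: error at byte {pos}"),
        }
    }
}
```

Like the pest rules it mirrors, this recognizer accepts a mismatched pair such as <a></b>, because closing_tag only requires some valid name.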

How Are the Parsing Results Used?

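The output is a tree of matched rules (Pairs) from the pest library. Each pair records which rule matched, the span of input it covered, and its nested inner pairs.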

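  1. Validation: The primary use is syntax validation. A successful parse confirms that the input matches this simplified XML grammar. It is not full XML validation: for example, the grammar does not check that a closing tag's name matches its opening tag.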
  2. AST Generation: The generated Pairs are used to build an Abstract Syntax Tree (AST) (native Rust structs) for subsequent data manipulation or interpretation.
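The README does not show the crate's own AST types. The following is a hypothetical sketch of what "native Rust structs" for this grammar could look like, with a few tree operations as examples of subsequent data manipulation. Node, Element, make_element, attr, text and count_elements are illustrative names, not the crate's API. The step that converts pest Pairs into these structs is omitted so the block compiles without the pest dependency.

```rust
/// A child of an element: either a nested element or a run of text_content.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Element(Element),
    Text(String),
}

/// Native counterpart of the `element` rule (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Element {
    name: String,
    /// Attributes in document order, as (key, value) pairs.
    attributes: Vec<(String, String)>,
    children: Vec<Node>,
}

impl Element {
    /// Looks up an attribute value by key.
    fn attr(&self, key: &str) -> Option<&str> {
        self.attributes
            .iter()
            .find(|(k, _)| k.as_str() == key)
            .map(|(_, v)| v.as_str())
    }

    /// Concatenates all text in this subtree, depth first.
    fn text(&self) -> String {
        let mut out = String::new();
        for child in &self.children {
            match child {
                Node::Text(t) => out.push_str(t),
                Node::Element(e) => out.push_str(&e.text()),
            }
        }
        out
    }

    /// Counts this element plus every nested element.
    fn count_elements(&self) -> usize {
        1 + self
            .children
            .iter()
            .map(|c| match c {
                Node::Element(e) => e.count_elements(),
                Node::Text(_) => 0,
            })
            .sum::<usize>()
    }
}

/// Convenience constructor (hypothetical helper).
fn make_element(name: &str, attrs: &[(&str, &str)], children: Vec<Node>) -> Element {
    Element {
        name: name.to_string(),
        attributes: attrs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
        children,
    }
}

fn main() {
    // Hand-built tree for: <greeting lang="en">Hello, <b>world</b>!</greeting>
    let greeting = make_element(
        "greeting",
        &[("lang", "en")],
        vec![
            Node::Text("Hello, ".into()),
            Node::Element(make_element("b", &[], vec![Node::Text("world".into())])),
            Node::Text("!".into()),
        ],
    );
    println!("<{}> lang={:?}", greeting.name, greeting.attr("lang"));
    println!("text: {}", greeting.text());
    println!("elements: {}", greeting.count_elements());
}
```

Attributes are kept in a Vec rather than a HashMap so that document order is preserved. Lookups are linear, which is usually fine for the handful of attributes a tag carries.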

Grammar Definition (grammar.pest)

This is the core logic of the parser. It defines the exact syntax that the parser will recognize.

    WHITESPACE = _{ " " | "\t" | "\r" | "\n" }

    name = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }

    attribute_value = { "\"" ~ (!("\"") ~ ANY)* ~ "\"" }
    attribute = { name ~ "=" ~ attribute_value }

    opening_tag = { "<" ~ name ~ attribute* ~ ">" }

    closing_tag = { "</" ~ name ~ ">" }
    text_content = @{ (!("<") ~ ANY)+ }

    element = { opening_tag ~ (element | text_content)* ~ closing_tag }

    xml = { SOI ~ element ~ EOI }
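Because closing_tag accepts any valid name, the grammar as written also accepts mismatched input such as <a></b>. One way to catch this after a successful parse is a tag-name stack. The std-only sketch below is illustrative: tags_balanced is a hypothetical function, not part of the crate. It assumes its input has already passed the grammar, so text never contains "<" and only quoted attribute values need to be skipped. As an alternative, pest's built-in PUSH/POP stack operations could enforce matching inside the grammar itself. That is a suggestion only; the grammar above does not do it.

```rust
/// Checks that every closing tag names the innermost open element (hypothetical helper).
/// Assumes `input` already matches grammar.pest, so every '<' outside a quoted
/// attribute value starts a tag.
fn tags_balanced(input: &str) -> Result<(), String> {
    let bytes = input.as_bytes();
    let mut stack: Vec<&str> = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'<' {
            i += 1;
            continue;
        }
        let closing = bytes.get(i + 1) == Some(&b'/');
        let mut j = if closing { i + 2 } else { i + 1 };
        // Skip the whitespace the grammar implicitly allows before the name.
        while j < bytes.len() && bytes[j].is_ascii_whitespace() {
            j += 1;
        }
        let start = j;
        while j < bytes.len() && (bytes[j].is_ascii_alphanumeric() || bytes[j] == b'_') {
            j += 1;
        }
        let name = &input[start..j];
        // Advance to the tag's '>', ignoring any '>' inside a quoted attribute value.
        let mut in_quotes = false;
        while j < bytes.len() && (in_quotes || bytes[j] != b'>') {
            if bytes[j] == b'"' {
                in_quotes = !in_quotes;
            }
            j += 1;
        }
        if closing {
            match stack.pop() {
                Some(open) if open == name => {}
                Some(open) => return Err(format!("</{}> closes <{}>", name, open)),
                None => return Err(format!("unexpected </{}>", name)),
            }
        } else {
            stack.push(name);
        }
        i = j + 1;
    }
    match stack.last() {
        Some(open) => Err(format!("<{}> is never closed", open)),
        None => Ok(()),
    }
}

fn main() {
    // Both inputs match grammar.pest as written; only the first has matching tag names.
    for input in ["<a><b>text</b></a>", "<a><b>text</a></b>"] {
        println!("{input}: {:?}", tags_balanced(input));
    }
}
```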