# Email Pest Parser

*Link:* https://crates.io/crates/email_pest_parser

*Docs:* https://docs.rs/email_pest_parser/latest/email_pest_parser/

A Rust-based email parser using the `pest` parsing library, designed to extract and validate various components such as headers, email addresses, and the body content of an email.

## Parsing Logic

The grammar is defined using `pest` and covers the following components:

- **Headers**: Key-value pairs that represent the metadata of an email, such as "From," "To," etc.
- **Email Addresses**: Extracted from specific headers like "From" and "To." Supports standard formats for usernames and domains.
- **Body**: The main content of the email, supporting multiple lines and any type of character.

### How It Works

The parser processes emails by breaking them into components using the predefined grammar rules. Then, it encapsulates the results in a structured `ParsedEmail` object for easy access to headers, email addresses, and body content.

## Example

```text
From: sender@example.com
To: recipient@example.com
Subject: Meeting Update

Hello,

This is a reminder for our meeting scheduled tomorrow at 10 AM.
Please let us know if you have any questions.

Best regards,
Sender
```

## Result

```text
ParsedEmail {
    headers: [
        (
            "From",
            "sender@example.com",
        ),
        (
            "To",
            "recipient@example.com",
        ),
        (
            "Subject",
            "Meeting Update",
        ),
    ],
    body: "Hello,\r\n\r\nThis is a reminder for our meeting scheduled tomorrow at 10 AM.\r\nPlease let us know if you have any questions.\r\n\r\nBest regards,\r\nSender",
    email_addresses: [
        "sender@example.com",
        "recipient@example.com",
    ],
}
```

## Grammar

### email

```
email = { headers ~ NEWLINE ~ body }
```

The `email` rule consists of two main parts: the `headers` and the `body`. These are separated by a `NEWLINE`. The `headers` are a series of header lines, and the `body` contains the actual message content.
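
To make the shape of the `email` rule concrete, here is a minimal sketch in plain Rust with no dependencies. It is not the crate's `pest`-based implementation: `SketchEmail` and `parse_sketch` are hypothetical names, only CRLF and LF line endings are handled, header names are not validated, and an input with zero headers is not covered. It only shows the idea of splitting at the blank line and reading each header line as a `name: value` pair.

```rust
/// Hypothetical stand-in for the crate's `ParsedEmail`; only two fields are modelled.
#[derive(Debug, PartialEq)]
struct SketchEmail {
    headers: Vec<(String, String)>,
    body: String,
}

/// Splits `raw` at the first blank line, then reads each header line as "name: value".
/// Returns `None` if there is no blank line or a header line lacks the ": " separator.
fn parse_sketch(raw: &str) -> Option<SketchEmail> {
    // Find the earliest blank line, accepting CRLF or LF line endings.
    let (idx, len) = ["\r\n\r\n", "\n\n"]
        .iter()
        .filter_map(|sep| raw.find(*sep).map(|i| (i, sep.len())))
        .min_by_key(|&(i, _)| i)?;
    let (head, body) = (&raw[..idx], &raw[idx + len..]);

    let mut headers = Vec::new();
    for line in head.lines() {
        // Split at the first ": " only, so values may themselves contain ": ".
        let (name, value) = line.split_once(": ")?;
        headers.push((name.to_string(), value.to_string()));
    }
    Some(SketchEmail {
        headers,
        body: body.to_string(),
    })
}

fn main() {
    let raw = "From: sender@example.com\r\nSubject: Meeting Update\r\n\r\nHello,\r\nSender";
    let parsed = parse_sketch(raw).expect("input has a blank line and well-formed headers");
    println!("{:#?}", parsed);
}
```

Because every header line already ends with its own line break, the extra `NEWLINE` in the `email` rule shows up in the input as an empty line, which is what the sketch searches for.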
### headers

```
headers = { (header_line ~ NEWLINE)* }
```

The `headers` rule consists of zero or more `header_line` rules, each followed by a `NEWLINE`.

### header_line

```
header_line = { field_name ~ ": " ~ field_value }
```

A `header_line` consists of a `field_name`, followed by a colon and a space (`": "`), and then a `field_value`. The `field_name` represents the name of the header (e.g., `Subject`, `From`), and the `field_value` represents the value of the header.

### field_name

```
field_name = { (ASCII_ALPHANUMERIC | "-" | "_")+ }
```

A `field_name` consists of one or more characters, each of which is an ASCII alphanumeric character (letter or digit), a hyphen (`-`), or an underscore (`_`).

### field_value

```
field_value = { (!NEWLINE ~ ANY)+ }
```

A `field_value` consists of one or more characters, none of which starts a `NEWLINE`. The `ANY` rule matches any single character, so the value runs up to, but does not include, the end of the line.

### body

```
body = { (!EOI ~ ANY)* }
```

The `body` rule matches any character (`ANY`), zero or more times, until the end of input (`EOI`) is reached. This rule defines the actual content of the email after the headers section, and it may be empty.

### email_address

```
email_address = { username ~ "@" ~ domain }
```

An `email_address` consists of a `username`, followed by the `@` symbol, and then a `domain`. The domain is further broken down into subdomains.

### username

```
username = { (ASCII_ALPHANUMERIC | "_" | "." | "-")+ }
```

A `username` consists of one or more characters, each of which is an ASCII alphanumeric character or one of the special characters `_`, `.`, and `-`.

### domain

```
domain = { subdomain ~ ("." ~ subdomain)+ }
```

A `domain` consists of two or more `subdomain` rules separated by periods (`"."`), so at least one period is required: `example.com` matches, while `localhost` does not. Each part is defined by the `subdomain` rule.

### subdomain

```
subdomain = { ASCII_ALPHANUMERIC+ }
```

A `subdomain` consists of one or more ASCII alphanumeric characters.
In this grammar, a subdomain is any dot-separated label of the domain name (e.g., both `gmail` and `com` in `gmail.com`).
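
The `email_address`, `username`, `domain`, and `subdomain` rules can be checked by hand with a few small predicates. The sketch below uses only the standard library and is an illustration, not the crate's API: the function names are hypothetical, and each function tests a whole string rather than matching a prefix as a `pest` rule would.

```rust
/// Mirrors `subdomain = { ASCII_ALPHANUMERIC+ }`.
fn is_subdomain(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric())
}

/// Mirrors `username = { (ASCII_ALPHANUMERIC | "_" | "." | "-")+ }`.
fn is_username(s: &str) -> bool {
    !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-'))
}

/// Mirrors `domain = { subdomain ~ ("." ~ subdomain)+ }`: at least two labels.
fn is_domain(s: &str) -> bool {
    let labels: Vec<&str> = s.split('.').collect();
    labels.len() >= 2 && labels.iter().all(|label| is_subdomain(label))
}

/// Mirrors `email_address = { username ~ "@" ~ domain }` for a whole string.
fn is_email_address(s: &str) -> bool {
    match s.split_once('@') {
        Some((user, domain)) => is_username(user) && is_domain(domain),
        None => false,
    }
}

fn main() {
    for candidate in ["sender@example.com", "user@localhost", "user@my-host.com"] {
        println!("{}: {}", candidate, is_email_address(candidate));
    }
}
```

One consequence of the rules as written is that a hyphen is accepted in the `username` but not in a `subdomain`, so an address such as `user@my-host.com` is rejected by this sketch.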