Crates.io | email_pest_parser |
lib.rs | email_pest_parser |
version | 0.1.2 |
source | src |
created_at | 2024-11-12 20:57:41.935689 |
updated_at | 2024-11-16 11:47:22.200709 |
description | An email parser that parses an entire email and validates email addresses. |
homepage | |
repository | |
max_upload_size | |
id | 1445538 |
size | 24,053 |
Link: https://crates.io/crates/email_pest_parser
Docs: https://docs.rs/email_pest_parser/latest/email_pest_parser/
A Rust-based email parser using the pest parsing library, designed to extract and validate various components of an email, such as headers, email addresses, and the body content.
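The grammar is defined using pest and covers the following components: the header section (header lines made up of a field name and a field value), the message body, and email addresses (a username and a domain built from subdomains). Each is described by a grammar rule below.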
The parser breaks an email into these components using the grammar rules, then collects the results in a structured ParsedEmail object that gives easy access to the headers, email addresses, and body content. For example, the email below is parsed into the ParsedEmail value that follows it.
From: sender@example.com
To: recipient@example.com
Subject: Meeting Update
Hello,
This is a reminder for our meeting scheduled tomorrow at 10 AM.
Please let us know if you have any questions.
Best regards,
Sender
ParsedEmail {
headers: [
(
"From",
"sender@example.com",
),
(
"To",
"recipient@example.com",
),
(
"Subject",
"Meeting Update",
),
],
body: "Hello,\r\n\r\nThis is a reminder for our meeting scheduled tomorrow at 10 AM.\r\nPlease let us know if you have any questions.\r\n\r\nBest regards,\r\nSender",
email_addresses: [
"sender@example.com",
"recipient@example.com",
],
}
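The debug output above suggests the shape of ParsedEmail. As a minimal std-only sketch, the struct below reproduces that shape; the field types are assumptions inferred from the output, and the header method is a hypothetical convenience helper, not part of the crate's documented API.

```rust
/// Hypothetical mirror of the ParsedEmail value shown above. Field types are
/// inferred from the debug output and are assumptions, not the crate's definition.
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedEmail {
    /// (field_name, field_value) pairs, in the order they appear in the email.
    pub headers: Vec<(String, String)>,
    /// Everything after the blank line that ends the header section.
    pub body: String,
    /// Email addresses collected while parsing.
    pub email_addresses: Vec<String>,
}

impl ParsedEmail {
    /// Hypothetical helper: value of the first header named `name`.
    /// Names are compared ASCII case-insensitively (a choice made for this sketch).
    pub fn header(&self, name: &str) -> Option<&str> {
        self.headers
            .iter()
            .find(|(field, _)| field.eq_ignore_ascii_case(name))
            .map(|(_, value)| value.as_str())
    }
}
```

Check the crate's documentation for how it actually exposes these fields before relying on this shape.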
email = { headers ~ NEWLINE ~ body }
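The email rule consists of two main parts, the headers and the body, separated by a NEWLINE. Because every header line already consumes its own trailing NEWLINE, this extra NEWLINE is the blank line that ends the header section. The headers are a series of header lines, and the body contains the actual message content.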
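The headers ~ NEWLINE ~ body sequence can be approximated without pest. The hypothetical split_email function below (not the crate's API) treats each header line as ending in its own NEWLINE and the first empty line as the separating NEWLINE. As a simplification it does not validate header lines the way header_line does; newline_len accepts the same three line endings as pest's built-in NEWLINE ("\r\n", "\n", "\r").

```rust
/// Length of a pest-style NEWLINE ("\r\n", "\n" or "\r") at the start of `s`.
fn newline_len(s: &str) -> Option<usize> {
    if s.starts_with("\r\n") {
        Some(2)
    } else if s.starts_with('\n') || s.starts_with('\r') {
        Some(1)
    } else {
        None
    }
}

/// Hypothetical sketch of `email = { headers ~ NEWLINE ~ body }`.
/// Returns (header block including its line endings, body), or None if no
/// blank line separates headers from body. Header lines are NOT validated.
pub fn split_email(input: &str) -> Option<(&str, &str)> {
    let mut pos = 0;
    loop {
        let rest = &input[pos..];
        // An empty line: this NEWLINE is the separator required by `email`.
        if let Some(n) = newline_len(rest) {
            return Some((&input[..pos], &rest[n..]));
        }
        // Otherwise skip one header line, including its own NEWLINE.
        let end = rest.find(|c: char| c == '\n' || c == '\r')?;
        let n = newline_len(&rest[end..])?;
        pos += end + n;
    }
}
```

An input that starts with a blank line yields an empty header block, and an input with no blank line at all yields None, since the separating NEWLINE is mandatory.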
headers = { (header_line ~ NEWLINE)* }
The headers rule consists of zero or more header_line rules, each followed by a NEWLINE. Because the repetition uses *, the header section may be empty.
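Because headers uses *, it never fails on its own: it repeats header_line ~ NEWLINE while the next line matches and stops at the first line that does not, leaving that input for the rest of the email rule. A std-only sketch of that loop follows; parse_headers and its helpers are hypothetical names, and the field_name and field_value checks are written out by hand rather than generated by pest.

```rust
/// Length of a pest-style NEWLINE ("\r\n", "\n" or "\r") at the start of `s`.
fn newline_len(s: &str) -> Option<usize> {
    if s.starts_with("\r\n") {
        Some(2)
    } else if s.starts_with('\n') || s.starts_with('\r') {
        Some(1)
    } else {
        None
    }
}

/// Tries `header_line ~ NEWLINE` at the start of `s`.
/// Returns (field_name, field_value, bytes consumed including the NEWLINE).
fn header_line_with_newline(s: &str) -> Option<(&str, &str, usize)> {
    // field_name: longest run of ASCII alphanumerics, '-' or '_'.
    let name_len = s
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-' || c == '_'))
        .unwrap_or(s.len());
    if name_len == 0 || !s[name_len..].starts_with(": ") {
        return None;
    }
    // field_value: everything up to the next line break, at least one char.
    let start = name_len + 2;
    let value_len = s[start..]
        .find(|c: char| c == '\n' || c == '\r')
        .unwrap_or(s.len() - start);
    if value_len == 0 {
        return None;
    }
    let end = start + value_len;
    // The header line must be followed by a NEWLINE.
    let nl = newline_len(&s[end..])?;
    Some((&s[..name_len], &s[start..end], end + nl))
}

/// Hypothetical sketch of `headers = { (header_line ~ NEWLINE)* }`: repeat while
/// the next line matches; zero matches still succeeds. Returns headers and rest.
pub fn parse_headers(input: &str) -> (Vec<(&str, &str)>, &str) {
    let mut headers = Vec::new();
    let mut rest = input;
    while let Some((name, value, used)) = header_line_with_newline(rest) {
        headers.push((name, value));
        rest = &rest[used..];
    }
    (headers, rest)
}
```

For example, given "From: a@b.com\r\nSubject: Hi\r\n\r\nBody", parse_headers returns the two headers and leaves "\r\nBody", the NEWLINE ~ body remainder that email still has to match.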
header_line = { field_name ~ ": " ~ field_value }
A header_line consists of a field_name, followed by a colon and a space (": "), and then a field_value. The field_name is the name of the header (e.g., Subject, From), and the field_value is the header's value.
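For a single line whose line ending has already been removed, header_line can be checked with a short std-only function. parse_header_line is a hypothetical helper for illustration; because field_name cannot contain a colon, splitting at the first ": " gives the same result as the grammar.

```rust
/// Hypothetical helper mirroring `header_line = { field_name ~ ": " ~ field_value }`
/// for one line whose line ending has already been stripped.
/// Returns (field_name, field_value) if the whole line matches.
pub fn parse_header_line(line: &str) -> Option<(&str, &str)> {
    // field_name cannot contain ':', so the first ": " is the separator.
    let (name, value) = line.split_once(": ")?;
    let name_ok = !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    // field_value: one or more characters, none of them a line break.
    let value_ok = !value.is_empty() && !value.contains(|c: char| c == '\n' || c == '\r');
    if name_ok && value_ok {
        Some((name, value))
    } else {
        None
    }
}
```

Since ": " is a literal, a header written without the space, such as Subject:Hi, does not match, while a value that itself contains ": " (as in X-Mailer: test: 1) is kept whole.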
field_name = { (ASCII_ALPHANUMERIC | "-" | "_")+ }
A field_name consists of one or more characters, each of which is an ASCII alphanumeric character (a letter or digit), a hyphen ("-"), or an underscore ("_").
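As a whole-string check, field_name reduces to the predicate below (is_field_name is a hypothetical name, written with std only).

```rust
/// Hypothetical whole-string check mirroring
/// `field_name = { (ASCII_ALPHANUMERIC | "-" | "_")+ }`.
pub fn is_field_name(s: &str) -> bool {
    // `+` requires at least one character.
    !s.is_empty()
        // Each character must be an ASCII letter/digit, '-' or '_'.
        && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```

This is stricter than RFC 5322, which allows a field name to use any printable US-ASCII character except the colon, so a name such as Reply.To is rejected here.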
field_value = { (!NEWLINE ~ ANY)+ }
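A field_value consists of one or more characters, none of which may begin a NEWLINE. The negative lookahead !NEWLINE checks, without consuming input, that no line break starts at the current position, and ANY then consumes a single character.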
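The !NEWLINE ~ ANY pattern is a negative-lookahead loop: before each character is consumed, the parser checks that no line break starts there. A std-only sketch of the same loop (match_field_value is a hypothetical name) returns the matched prefix, or None when the + repetition would match nothing.

```rust
/// Hypothetical sketch of `field_value = { (!NEWLINE ~ ANY)+ }`, matched at the
/// start of `input`. Returns the matched value, or None if it would be empty.
pub fn match_field_value(input: &str) -> Option<&str> {
    let mut end = 0;
    for (i, c) in input.char_indices() {
        // !NEWLINE: stop (without consuming) if a line break starts here.
        // Every pest NEWLINE ("\n", "\r\n", "\r") begins with '\n' or '\r'.
        if c == '\n' || c == '\r' {
            break;
        }
        // ANY: consume exactly one character (which may be multi-byte).
        end = i + c.len_utf8();
    }
    // `+` needs at least one character.
    if end == 0 {
        None
    } else {
        Some(&input[..end])
    }
}
```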
body = { (!EOI ~ ANY)* }
The body rule consumes one character at a time (ANY) until it reaches the end of input (EOI), repeated zero or more times, so an empty body is allowed. It captures the actual content of the email after the header section.
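The (!EOI ~ ANY)* loop can be written out explicitly to show how it behaves. In this hypothetical std-only sketch each iteration is one !EOI ~ ANY step; the result is always the entire remaining input, line endings included, and the * repetition lets an empty remainder succeed as an empty body.

```rust
/// Hypothetical sketch of `body = { (!EOI ~ ANY)* }`, matched at the start of
/// `input`. Written as a loop to mirror the rule; it always returns all of `input`.
pub fn match_body(input: &str) -> &str {
    let mut consumed = 0;
    // Each iteration is one `!EOI ~ ANY` step: the lookahead succeeds while a
    // character remains, and ANY consumes exactly that one character.
    for c in input.chars() {
        consumed += c.len_utf8();
    }
    // `*` allows zero repetitions, so an empty body is a successful match.
    &input[..consumed]
}
```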
email_address = { username ~ "@" ~ domain }
An email_address consists of a username, followed by the "@" symbol, and then a domain, which is further broken down into subdomains.
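Assuming the rule is applied to a whole string (nothing may follow the address), email_address can be checked with plain Rust. is_email_address and its helpers are hypothetical names; the character classes are copied from the username, domain, and subdomain rules.

```rust
/// Characters allowed by `username = { (ASCII_ALPHANUMERIC | "_" | "." | "-")+ }`.
fn is_username(s: &str) -> bool {
    !s.is_empty()
        && s.chars().all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-'))
}

/// `domain = { subdomain ~ ("." ~ subdomain)+ }` with
/// `subdomain = { ASCII_ALPHANUMERIC+ }`: two or more non-empty labels.
fn is_domain(s: &str) -> bool {
    let labels: Vec<&str> = s.split('.').collect();
    labels.len() >= 2
        && labels
            .iter()
            .all(|label| !label.is_empty() && label.chars().all(|c| c.is_ascii_alphanumeric()))
}

/// Hypothetical whole-string check for `email_address = { username ~ "@" ~ domain }`.
pub fn is_email_address(s: &str) -> bool {
    // username cannot contain '@', so the first '@' is the separator.
    match s.split_once('@') {
        Some((user, domain)) => is_username(user) && is_domain(domain),
        None => false,
    }
}
```

Under these rules, common real-world forms such as user+tag@example.com (a "+" in the username) or user@localhost (a single-label domain) are rejected.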
username = { (ASCII_ALPHANUMERIC | "_" | "." | "-")+ }
A username consists of one or more characters, each of which is an ASCII alphanumeric character or one of the special characters "_", ".", and "-".
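Here is the username rule as a standalone std-only predicate (is_username is a hypothetical name), assuming it must match the entire string:

```rust
/// Hypothetical whole-string check mirroring
/// `username = { (ASCII_ALPHANUMERIC | "_" | "." | "-")+ }`.
pub fn is_username(s: &str) -> bool {
    // `+`: at least one character, each one from the allowed set.
    !s.is_empty()
        && s.chars().all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-'))
}
```

Because the rule only lists allowed characters, it also accepts a leading, trailing, or doubled dot (for example .user or a..b), which the RFC 5322 dot-atom form does not allow.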
domain = { subdomain ~ ("." ~ subdomain)+ }
A domain consists of two or more subdomain rules separated by periods ("."): a first subdomain followed by one or more "." ~ subdomain pairs. A single-label name such as localhost therefore does not match.
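The "at least two labels" consequence is easiest to see in a sketch. is_domain is a hypothetical std-only whole-string predicate that splits on "." and requires every label to be a non-empty run of ASCII alphanumerics:

```rust
/// Hypothetical whole-string check mirroring
/// `domain = { subdomain ~ ("." ~ subdomain)+ }`.
pub fn is_domain(s: &str) -> bool {
    let mut count = 0;
    for label in s.split('.') {
        // Each label must satisfy `subdomain = { ASCII_ALPHANUMERIC+ }`;
        // an empty label means a leading, trailing, or doubled '.'.
        if label.is_empty() || !label.chars().all(|c| c.is_ascii_alphanumeric()) {
            return false;
        }
        count += 1;
    }
    // One subdomain plus at least one more "." ~ subdomain.
    count >= 2
}
```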
subdomain = { ASCII_ALPHANUMERIC+ }
A subdomain consists of one or more ASCII alphanumeric characters. Subdomains are the period-separated labels of a domain name (e.g., gmail in gmail.com).
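For completeness, the subdomain rule as a std-only predicate (is_subdomain is a hypothetical name):

```rust
/// Hypothetical whole-string check mirroring `subdomain = { ASCII_ALPHANUMERIC+ }`.
pub fn is_subdomain(s: &str) -> bool {
    // At least one character, and only ASCII letters or digits.
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric())
}
```

One consequence is that hyphenated labels, which are valid in DNS host names (as in my-company.com), do not match this rule.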