Crates.io | url_parser |
lib.rs | url_parser |
version | 0.1.1 |
source | src |
created_at | 2024-11-11 20:33:26.815202 |
updated_at | 2024-11-13 22:10:20.293781 |
description | URL Parser is a Rust parser developed to parse URLs into structured components such as scheme, domain, path, query and fragment. |
homepage | |
repository | |
max_upload_size | |
id | 1444187 |
size | 30,248 |
Link: https://crates.io/crates/url_parser
Docs: https://docs.rs/url_parser/0.1.0/url_parser/
URL Parser is a Rust parser that splits URLs into structured components such as scheme, domain, path, query, and fragment.
Given a URL string, URL Parser extracts five components: the scheme, the domain, the path, the query string, and the fragment. The path, query, and fragment are optional.
These components are useful to applications that need to analyze, validate, or manipulate URLs.
url = { scheme ~ "://" ~ domain ~ path? ~ query? ~ fragment? }
Purpose: Defines the overall structure of the URL.
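Explanation: The url rule is the top-level rule for parsing a full URL. It expects a scheme, the literal "://", and a domain, followed by an optional path, query, and fragment, in that order.
The '?' after path, query, and fragment (path?, query?, fragment?) marks these components as optional.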
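The url rule can be mirrored in plain Rust to show how the required and optional components compose. This is a minimal std-only sketch, not the crate's API: the grammar above is written in pest syntax, while `UrlParts`, `parse_url`, `optional`, and the `match_*` helpers below are hypothetical names. It assumes the grammar defines no WHITESPACE rule, so no implicit whitespace is skipped between components.

```rust
/// Hypothetical container for the components matched by the `url` rule.
#[derive(Debug, PartialEq)]
struct UrlParts<'a> {
    scheme: &'a str,
    domain: &'a str,
    path: Option<&'a str>,
    query: Option<&'a str>,
    fragment: Option<&'a str>,
}

// Each helper mimics one grammar rule: on success it returns
// (matched text, remaining input); on failure it returns None.

// scheme = @{ ASCII_ALPHANUMERIC+ }
fn match_scheme(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(input.len());
    if end == 0 { None } else { Some(input.split_at(end)) }
}

// domain = @{ (!("/" | "?") ~ ANY)+ }
fn match_domain(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| c == '/' || c == '?')
        .unwrap_or(input.len());
    if end == 0 { None } else { Some(input.split_at(end)) }
}

// path = { "/" ~ (!("?" | "#") ~ ANY)* }
fn match_path(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('/')?;
    let end = body
        .find(|c: char| c == '?' || c == '#')
        .map_or(input.len(), |i| i + 1); // +1 for the leading '/'
    Some(input.split_at(end))
}

// query = { "?" ~ (!"#" ~ ANY)* }
fn match_query(input: &str) -> Option<(&str, &str)> {
    if !input.starts_with('?') {
        return None;
    }
    let end = input.find('#').unwrap_or(input.len());
    Some(input.split_at(end))
}

// fragment = { "#" ~ ANY* }
fn match_fragment(input: &str) -> Option<(&str, &str)> {
    if input.starts_with('#') { Some((input, "")) } else { None }
}

// `rule?`: try the rule; on failure, consume nothing and continue.
fn optional<'a>(
    rule: fn(&'a str) -> Option<(&'a str, &'a str)>,
    input: &'a str,
) -> (Option<&'a str>, &'a str) {
    match rule(input) {
        Some((matched, rest)) => (Some(matched), rest),
        None => (None, input),
    }
}

// url = { scheme ~ "://" ~ domain ~ path? ~ query? ~ fragment? }
fn parse_url(input: &str) -> Option<UrlParts<'_>> {
    let (scheme, rest) = match_scheme(input)?;
    let rest = rest.strip_prefix("://")?;
    let (domain, rest) = match_domain(rest)?;
    let (path, rest) = optional(match_path, rest);
    let (query, rest) = optional(match_query, rest);
    let (fragment, _) = optional(match_fragment, rest);
    Some(UrlParts { scheme, domain, path, query, fragment })
}
```

For "https://example.com/products/electronics?query=1#reviews" this sketch yields scheme "https", domain "example.com", path "/products/electronics", query "?query=1", and fragment "#reviews". The path, query, and fragment keep their leading delimiter because each grammar rule's match includes it.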
scheme = @{ ASCII_ALPHANUMERIC+ }
Purpose: Identifies the URL scheme or protocol (for example, "http", "https", "ftp").
Explanation: The scheme rule matches one or more ASCII alphanumeric characters (letters and digits). The '@' marks the rule as atomic: pest does not insert implicit WHITESPACE or COMMENT matching between its elements, and rules called inside it produce no inner tokens, so the scheme is reported as a single token.
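A std-only sketch of what the scheme rule accepts. `match_scheme` is a hypothetical helper, not part of the crate; it returns the matched scheme together with the unconsumed input, the way a PEG rule consumes a prefix.

```rust
// Mirrors `scheme = @{ ASCII_ALPHANUMERIC+ }`: consume one or more
// ASCII letters or digits and return (scheme, remaining input).
fn match_scheme(input: &str) -> Option<(&str, &str)> {
    // Byte index of the first character that is not ASCII alphanumeric.
    let end = input
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(input.len());
    // `+` requires at least one character.
    if end == 0 {
        None
    } else {
        Some(input.split_at(end))
    }
}
```

Because only letters and digits are accepted, a scheme such as "svn+ssh" stops matching at '+', so the "://" that the url rule expects next is not found and the whole URL fails to parse.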
domain = @{ (!("/" | "?") ~ ANY)+ }
Purpose: Defines the domain part of the URL (for example, "example.com", "localhost", "192.168.0.1").
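Explanation: The domain rule matches one or more characters that are not '/' or '?'. The built-in ANY rule matches any single character, including a newline. This lets the domain hold subdomains, IP addresses, or hostnames, as long as they contain no '/' or '?'. Because '#' is not excluded, a URL with a fragment but no path or query (for example, "http://example.com#top") has "#top" absorbed into the domain.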
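The domain rule can be sketched in plain Rust as follows. `match_domain` is a hypothetical helper, not the crate's API; it stops only at '/' or '?', exactly as the negative lookahead in the rule does.

```rust
// Mirrors `domain = @{ (!("/" | "?") ~ ANY)+ }`: consume characters up to
// (but not including) the first '/' or '?', requiring at least one.
fn match_domain(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| c == '/' || c == '?')
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(input.split_at(end))
    }
}
```

With this sketch, `match_domain("192.168.0.1/index.html")` returns `Some(("192.168.0.1", "/index.html"))`, while `match_domain("example.com#top")` returns `Some(("example.com#top", ""))` because '#' is not among the excluded characters.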
path = { "/" ~ (!("?" | "#") ~ ANY)* }
Purpose: Defines the path component of a URL, which gives the location of a resource on the server (for example, "/products/electronics").
Explanation: The path rule begins with '/' followed by zero or more characters other than '?' or '#', so the path ends before any query string or fragment.
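A minimal std-only sketch of the path rule. `match_path` is a hypothetical helper, not part of the crate; note that the `*` repetition lets a lone "/" count as a complete path.

```rust
// Mirrors `path = { "/" ~ (!("?" | "#") ~ ANY)* }`: a leading '/' followed
// by any characters up to the first '?' or '#'.
fn match_path(input: &str) -> Option<(&str, &str)> {
    // The rule must start with a literal '/'.
    let body = input.strip_prefix('/')?;
    // `*` allows an empty body, so "/" alone is a valid path.
    let end = body
        .find(|c: char| c == '?' || c == '#')
        .map_or(input.len(), |i| i + 1); // +1 accounts for the '/'
    Some(input.split_at(end))
}
```

For example, `match_path("/products/electronics?query=1")` returns `Some(("/products/electronics", "?query=1"))`, leaving the query for the next rule.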
query = { "?" ~ (!"#" ~ ANY)* }
Purpose: Defines a query string in the URL that is typically used to pass parameters (for example, "?query=1").
Explanation: The query rule starts with '?' followed by zero or more characters other than '#', so the query string stops before the '#' that begins the fragment.
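The query rule reduces to a prefix check and a search for '#'. This is a sketch only; `match_query` is a hypothetical helper, not the crate's API.

```rust
// Mirrors `query = { "?" ~ (!"#" ~ ANY)* }`: a leading '?' followed by
// any characters up to the first '#'.
fn match_query(input: &str) -> Option<(&str, &str)> {
    if !input.starts_with('?') {
        return None;
    }
    // Since input starts with '?', any '#' found is at index 1 or later.
    let end = input.find('#').unwrap_or(input.len());
    Some(input.split_at(end))
}
```

Here `match_query("?query=1#reviews")` returns `Some(("?query=1", "#reviews"))`, handing the fragment to the next rule.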
fragment = { "#" ~ ANY* }
Purpose: Defines a section identifier that points to a specific section within the resource (for example, "#reviews").
Explanation: The fragment rule starts with '#' and then matches any number of characters, so everything from the '#' to the end of the input, including any further '/', '?', or '#', belongs to the fragment.
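Because `ANY*` has no stop condition, the fragment rule is the simplest to mirror. A sketch under the same assumptions as the other rules; `match_fragment` is a hypothetical helper, not part of the crate.

```rust
// Mirrors `fragment = { "#" ~ ANY* }`: a leading '#' followed by
// every remaining character, whatever it is.
fn match_fragment(input: &str) -> Option<(&str, &str)> {
    if input.starts_with('#') {
        // ANY* consumes the rest of the input, leaving nothing behind.
        Some((input, ""))
    } else {
        None
    }
}
```

So `match_fragment("#a?b/c#d")` returns `Some(("#a?b/c#d", ""))`: characters that end other components are ordinary fragment text here.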