| Crates.io | rust-pickaxe |
| lib.rs | rust-pickaxe |
| version | 0.5.5 |
| created_at | 2025-05-20 04:41:07.785417+00 |
| updated_at | 2025-06-25 00:39:41.887388+00 |
| description | HTML data extraction library |
| homepage | |
| repository | https://gitlab.com/emergentmethods/pickaxe |
| max_upload_size | |
| id | 1680746 |
| size | 17,886,375 |
Pickaxe is a Python package for structured data extraction from HTML documents. It provides a simple, intuitive API for parsing HTML documents and automatically extracting structured data from them.
It uses the html5ever and selectors crates for browser-grade HTML parsing and CSS selector matching.
Install the Python package with pip:
pip install python-pickaxe
from pickaxe import HtmlDocument
# Parse an HTML document
document = HtmlDocument.from_str("<html><body><h1>Hello, World!</h1></body></html>")
# Access elements using CSS selectors or XPath expressions
heading = document.find("h1")
print(heading.inner_text) # Output: Hello, World!
heading = document.find_xpath("//h1")
print(heading.inner_text) # Output: Hello, World!
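To make the lookup above concrete without installing the package, here is a minimal standard-library sketch of "find the first element with a given tag and return its text". It is not Pickaxe's implementation or API: the `_TextCollector` class and `first_inner_text` helper are hypothetical names invented for illustration, and the sketch matches only by tag name (no CSS selectors or XPath).

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects text inside the first element named `tag`."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # how many target tags are currently open
        self.found = False  # True once the first matching start tag is seen
        self.done = False   # True once that first match has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if not self.done and tag == self.tag:
            self.depth += 1
            self.found = True

    def handle_endtag(self, tag):
        if not self.done and tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # ignore any later elements with the same tag

    def handle_data(self, data):
        # Keep text only while inside the first matching element.
        if not self.done and self.depth > 0:
            self.parts.append(data)


def first_inner_text(html, tag):
    """Return the text of the first `tag` element in `html`, or None if absent."""
    parser = _TextCollector(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.found else None


print(first_inner_text("<html><body><h1>Hello, World!</h1></body></html>", "h1"))
```

A library built on a full HTML5 parser also handles malformed markup, attribute and combinator selectors, and XPath, none of which this sketch attempts.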
Install the Rust crate with Cargo:
cargo add rust-pickaxe
use pickaxe::HtmlDocument;
fn main() {
    // Parse an HTML document
    let document = HtmlDocument::from_str("<html><body><h1>Hello, World!</h1></body></html>").unwrap();

    // Access elements using CSS selectors or XPath expressions
    let heading = document.find("h1").unwrap();
    println!("{}", heading.inner_text()); // Output: Hello, World!

    let heading = document.find_xpath("//h1").unwrap();
    println!("{}", heading.inner_text()); // Output: Hello, World!
}
This project is licensed under the MIT License.
If you encounter any issues or have feedback, please open an issue. We'd love to hear from you!
Made with ❤️ by Emergent Methods