Crates.io | pagegraph |
lib.rs | pagegraph |
version | 0.1.3 |
source | src |
created_at | 2023-07-31 23:36:41.680112 |
updated_at | 2023-08-07 23:42:06.732548 |
description | Rust library for analyzing PageGraph files |
id | 931283 |
size | 164,968 |
This crate provides utilities for analyzing PageGraph outputs.
pagegraph provides a core library for interacting directly with pagegraph files and building custom extraction tools.
pagegraph-cli provides a more convenient, no-code wrapper around common operations, supplying outputs in easily-parseable formats.
The following example reads from a PageGraph file and produces all deleted div elements from the corresponding webpage.
    use pagegraph::from_xml::read_from_file;
    use pagegraph::types::{ NodeType, EdgeType };

    fn main() {
        // Load the graph from a PageGraph GraphML file.
        let graph = read_from_file("/path/to/any/pagegraph.graphml");

        // Keep only HTML element nodes that are deleted and whose tag is "div".
        let deleted_divs = graph.filter_nodes(|node| {
            match node {
                NodeType::HtmlElement { is_deleted: true, tag_name, .. } if tag_name == "div" => true,
                _ => false,
            }
        });
    }
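
The filter in the example above relies on a struct pattern with a match guard. The sketch below shows the same pattern without the crate or a PageGraph file, so it can be run on its own. Everything in it is a stand-in for illustration: this `NodeType` enum, the free function `filter_nodes`, and the helpers `is_deleted_div` and `sample_nodes` are hypothetical and are not the definitions from `pagegraph::types`, which may have different fields and signatures.

```rust
// Stand-in for illustration only; NOT the real pagegraph::types::NodeType.
#[allow(dead_code)]
#[derive(Debug)]
enum NodeType {
    HtmlElement { tag_name: String, is_deleted: bool },
    TextNode { text: String },
}

/// Returns true only for deleted `div` elements.
/// The pattern checks `is_deleted: true`; the guard checks the tag name.
fn is_deleted_div(node: &NodeType) -> bool {
    matches!(
        node,
        NodeType::HtmlElement { is_deleted: true, tag_name, .. } if tag_name == "div"
    )
}

/// Hypothetical helper: returns references to the nodes accepted by `pred`.
fn filter_nodes<F>(nodes: &[NodeType], pred: F) -> Vec<&NodeType>
where
    F: Fn(&NodeType) -> bool,
{
    nodes.iter().filter(|n| pred(*n)).collect()
}

/// Small hand-written node list used in place of a parsed graph.
fn sample_nodes() -> Vec<NodeType> {
    vec![
        NodeType::HtmlElement { tag_name: "div".to_string(), is_deleted: true },
        NodeType::HtmlElement { tag_name: "div".to_string(), is_deleted: false },
        NodeType::HtmlElement { tag_name: "span".to_string(), is_deleted: true },
        NodeType::TextNode { text: "hello".to_string() },
    ]
}

fn main() {
    let nodes = sample_nodes();
    let deleted_divs = filter_nodes(&nodes, is_deleted_div);
    println!("{} deleted div(s): {:?}", deleted_divs.len(), deleted_divs);
}
```

Writing the predicate as a named function rather than an inline closure makes it easy to test in isolation and to reuse across several filters.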