pagegraph

Crates.io: pagegraph
lib.rs: pagegraph
version: 0.1.3
source: src
created_at: 2023-07-31 23:36:41.680112
updated_at: 2023-08-07 23:42:06.732548
description: Rust library for analyzing PageGraph files
id: 931283
size: 164,968
Anton Lazarev (antonok-edm)

pagegraph

This crate provides utilities for analyzing PageGraph outputs.

Workspace organization

pagegraph provides a core library for interacting directly with PageGraph files and building custom extraction tools.

pagegraph-cli provides a more convenient, no-code wrapper around common operations, supplying outputs in easily-parseable formats.

Example

The following example reads from a PageGraph file and produces all deleted div elements from the corresponding webpage.

use pagegraph::from_xml::read_from_file;
use pagegraph::types::NodeType;

fn main() {
    // Parse the GraphML file into an in-memory graph.
    let graph = read_from_file("/path/to/any/pagegraph.graphml");

    // Keep only HTML element nodes that are marked as deleted and whose tag is `div`.
    let deleted_divs = graph.filter_nodes(|node| {
        matches!(
            node,
            NodeType::HtmlElement { is_deleted: true, tag_name, .. } if tag_name == "div"
        )
    });
}
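
The filter in the example above relies on a struct pattern with a match guard. The following is a minimal, self-contained sketch of that pattern only. The `NodeType` enum here is a simplified stand-in written for illustration (it is not the crate's definition, which has more variants and fields), and `is_deleted_div`, `count_deleted_divs`, and `sample_nodes` are hypothetical helper names.

```rust
// Stand-in type for illustration only; the real `NodeType` lives in
// `pagegraph::types` and is assumed to carry more variants and fields.
enum NodeType {
    HtmlElement { tag_name: String, is_deleted: bool },
    TextNode,
}

/// Returns true only for HTML elements that are deleted and tagged `div`.
fn is_deleted_div(node: &NodeType) -> bool {
    matches!(
        node,
        // `is_deleted: true` matches the field against a literal,
        // `..` ignores any remaining fields, and the guard checks the tag name.
        NodeType::HtmlElement { is_deleted: true, tag_name, .. } if tag_name == "div"
    )
}

/// Counts the nodes in a slice that satisfy the predicate above.
fn count_deleted_divs(nodes: &[NodeType]) -> usize {
    nodes.iter().filter(|node| is_deleted_div(node)).count()
}

/// Hypothetical sample data: only the first node is a deleted `div`.
fn sample_nodes() -> Vec<NodeType> {
    vec![
        NodeType::HtmlElement { tag_name: "div".to_string(), is_deleted: true },
        NodeType::HtmlElement { tag_name: "div".to_string(), is_deleted: false },
        NodeType::HtmlElement { tag_name: "span".to_string(), is_deleted: true },
        NodeType::TextNode,
    ]
}

fn main() {
    let nodes = sample_nodes();
    // Prints "deleted divs: 1".
    println!("deleted divs: {}", count_deleted_divs(&nodes));
}
```

Keeping the predicate in its own function makes it easy to test without a graph file; the same pattern body could then be passed to a filtering method such as the `filter_nodes` call shown in the example.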