# html5ever-stream

[![Travis CI Status](https://travis-ci.org/rossdylan/html5ever-stream.svg?branch=master)](https://travis-ci.org/rossdylan/html5ever-stream)
[![MIT licensed](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
[![crates.io](https://img.shields.io/crates/v/html5ever-stream.svg)](https://crates.io/crates/html5ever-stream)
[![Released API docs](https://docs.rs/html5ever-stream/badge.svg)](https://docs.rs/html5ever-stream)

Adapters to easily stream data into an [html5ever](https://crates.io/crates/html5ever) parser.

## Overview

This crate provides shims that make it relatively painless to parse HTML from a stream of data. The data can be fed in through the standard IO `Read`/`Write` traits, or via a [Stream](https://docs.rs/futures/0.1.2/futures/stream/trait.Stream.html) from the [futures](https://docs.rs/futures/0.1.21/futures/) crate.

* Support for any `Stream` that emits an item implementing `AsRef<[u8]>`
* Support for hyper and unstable reqwest types automatically
* Support for [reqwest's copy_to](https://docs.rs/reqwest/0.8.6/reqwest/struct.Response.html#method.copy_to) method
* Helper wrappers for `RcDom` to make it easier to work with

## Examples

### Using Hyper 0.11

```rust
extern crate futures;
extern crate html5ever;
extern crate html5ever_stream;
extern crate hyper;
extern crate hyper_tls;
extern crate tokio_core;
extern crate num_cpus;

use html5ever::rcdom;
use futures::{Future, Stream};
use hyper::Client;
use hyper_tls::HttpsConnector;
use tokio_core::reactor::Core;
use html5ever_stream::{ParserFuture, NodeStream};

fn main() {
    let mut core = Core::new().unwrap();
    let handle = core.handle();
    let client = Client::configure()
        .connector(HttpsConnector::new(num_cpus::get(), &handle).unwrap())
        .build(&handle);

    // NOTE: We throw away errors here in two places. You are better off converting them
    // into your own custom error type in order to propagate them.
    let req_fut = client.get("https://github.com".parse().unwrap()).map_err(|_| ());

    // Feed the response body into the parser as chunks arrive.
    let parser_fut = req_fut.and_then(|res| {
        ParserFuture::new(res.body().map_err(|_| ()), rcdom::RcDom::default())
    });

    // Walk the finished DOM and collect every node.
    let nodes = parser_fut.and_then(|dom| {
        NodeStream::new(&dom).collect()
    });

    let print_fut = nodes.and_then(|vn| {
        println!("found {} elements", vn.len());
        Ok(())
    });
    core.run(print_fut).unwrap();
}
```

### Using Unstable Async Reqwest 0.8.6

```rust
extern crate futures;
extern crate html5ever;
extern crate html5ever_stream;
extern crate reqwest;
extern crate tokio_core;

use html5ever::rcdom;
use futures::{Future, Stream};
use reqwest::unstable::async as async_reqwest;
use tokio_core::reactor::Core;
use html5ever_stream::{ParserFuture, NodeStream};

fn main() {
    let mut core = Core::new().unwrap();
    let client = async_reqwest::Client::new(&core.handle());

    // NOTE: We throw away errors here in two places. You are better off converting them
    // into your own custom error type in order to propagate them.
    let req_fut = client.get("https://github.com").send().map_err(|_| ());

    // Feed the response body into the parser as chunks arrive.
    let parser_fut = req_fut.and_then(|res| {
        ParserFuture::new(res.into_body().map_err(|_| ()), rcdom::RcDom::default())
    });

    // Walk the finished DOM and collect every node.
    let nodes = parser_fut.and_then(|dom| {
        NodeStream::new(&dom).collect()
    });

    let print_fut = nodes.and_then(|vn| {
        println!("found {} elements", vn.len());
        Ok(())
    });
    core.run(print_fut).unwrap();
}
```

### Using Stable Reqwest 0.8.6

```rust
extern crate html5ever;
extern crate html5ever_stream;
extern crate reqwest;

use html5ever::rcdom;
use html5ever_stream::{ParserSink, NodeIter};

fn main() {
    let mut resp = reqwest::get("https://github.com").unwrap();
    let mut parser = ParserSink::new(rcdom::RcDom::default());

    // Write the whole response body into the parser sink.
    resp.copy_to(&mut parser).unwrap();
    let document = parser.finish();

    // The element type is inferred from NodeIter's item type.
    let nodes: Vec<_> = NodeIter::new(&document).collect();
    println!("found {} elements", nodes.len());
}
```

## License

Licensed under the [MIT License](http://opensource.org/licenses/MIT).