# Querying the HackerNews API with `trustfall` This is a demo showing how to plug in a real-world API into `trustfall` and execute queries against it. The key idea demonstrated in this demo is **composition in schemas and queries**. Even without using `trustfall` capabilities, a motivated programmer could write a purpose-built tool that directly uses the HackerNews API to perform the operations examined here. However, each such tool would need to be written, maintained, and optimized separately. Tools that do related-but-different operations, especially if implemented by different people or at different points in time, are unlikely to share code. For tools performing queries of significant complexity, the large state space means the tool is also unlikely to be thoroughly tested, and may be buggy and difficult to extend. In contrast, querying using the `trustfall` engine allows all functionality to be decomposed into smaller components (vertices, edges, vertex properties), each able to be implemented and tested independently of others. Each component represents one conceptual operation, for example the "get a `User` vertex's `name`" property operation, or the "for a `Comment` vertex, find the `User` that authored that comment" edge operation. Each such component can be added to the schema one at a time, and implemented and tested individually. It's much easier to implement and test multiple small components, than to build and test all possible compositions. If each component is correctly implemented, the `trustfall` query engine guarantees that the composition of such components is also going to be correct. This makes it possible to confidently expose larger schemas and execute more complex queries than previously possible, without worrying about bugs. Similarly, the composition-based implementation approach allows individual operations to be optimized as necessary, making it easier to win performance gains above and beyond the very good performance already provided by the iterator-style execution model. ## Table of Contents - [Components](#components) - [Installing Rust](#installing-rust) - [Running queries](#running-queries) - [Query syntax primer](#query-syntax-primer) - [Examples](#examples) - [Example: Front page stories with links](#example-front-page-stories-with-links) - [Example: Jobs in the top 50 items](#example-jobs-in-the-top-50-items) - [Example: Latest links submitted by high-karma users](#example-latest-links-submitted-by-high-karma-users) - [Example: Latest links with high-karma commenters](#example-latest-links-with-high-karma-commenters) - [Writing your own queries](#writing-your-own-queries) ## Components The project consists of the following components: - `vertex.rs` defines the `Vertex` enum which `trustfall` uses to represent vertices in the query graph. - `adapter.rs` defines the `HackerNewsAdapter` struct, which implements the `trustfall::provider::BasicAdapter` trait and connects the query engine to the HackerNews API. - The `resolve_starting_vertices` method is what produces the initial iterator of `Vertex` vertices corresponding to the root edge at which querying starts (e.g. `FrontPage`). - The `resolve_property` method is used to get property values for each `Vertex` in an iterator. - The `resolve_neighbors` method is used to get the neighboring vertices (`Vertex`s) across a particular edge, for each `Vertex` in an iterator. - The `resolve_coercion` method is kind of like the Python `isinstance()` function: for each `Vertex` in an iterable, it checks whether that `Vertex`'s type can be narrowed to a more derived type than it previously represented. For example, if the `Vertex` originally represented `interface Animal`, `resolve_coercion` may be used to check whether the `Vertex` is actually of `type Dog implements Animal`. - `main.rs` is a simple CLI app that can execute query files in `ron` format. ## Installing Rust This demo requires the Rust toolchain. To install Rust, follow the [official instructions](https://www.rust-lang.org/tools/install). On UNIX-like operating systems, that's usually as simple as `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`. To confirm everything is set up correctly, `cd` into this directory and run `cargo check`. After compiling for a minute or two, it should find no errors. ## Running queries This demo contains several example query files in the `example_queries` directory. Each file represents a single query (conforming to the schema in `hackernews.graphql`) together with any arguments necessary to run the query. To execute a query, run `cargo run --example hackernews query path/to/query/file.ron`. The execution is lazy and incremental (iterator-style), so you'll see results stream onto the screen continuously as they are received. **Reminder**: While the schema and query syntax here is able to be parsed with GraphQL, you are not actually using a GraphQL API, and the query semantics are *not* the same as GraphQL. GraphQL's capabilities are a strict subset of the abilities of the query capabilities of `trustfall`. ### Query syntax primer Here are the most notable aspects in which the `trustfall` query language differs from GraphQL: - `@output` is used to mark fields for output; in SQL, this would correspond to a term in the `SELECT` clause. - `@filter` denotes that a field must match a predicate; in SQL, this would correspond to a term in the `WHERE` clause. - Expanding an edge by default has semantics equivalent to SQL `INNER JOIN`. Directives like `@optional / @recurse / @fold` may be applied to edges to change their behavior. - When an edge points to an `interface` type, it is possible to select only a specific subtype (`interface` or `type`) of that `interface` using a type coercion: `... on Foo` to select only `Foo`-typed vertices and discard all others. - `@optional` denotes that an edge is optional and is allowed to not exist. This is semantically equivalent to a SQL `LEFT JOIN`. - `@recurse` denotes that an edge is to be traversed recursively between 0 and the number of times specified in the directive's `depth` field. - `@fold` requests that the data output on the other side of the edge be "folded" into lists for each output field. In PostgreSQL terminology, this is like a `GROUP BY` with `array_agg()` applied to all folded outputs. ## Examples Let's describe and explain the queries in the `example_queries` directory. ### Example: Front page stories with links `cargo run --example hackernews query example_queries/front_page_stories_with_links.ron` gets the HackerNews items on the front page that are stories with links (as opposed to job links, or submissions like "Show HN" that contain a message instead of a link). For each match, the query outputs its title, link, current score, the name of its submitter and their current karma. This is what the query looks like: ```graphql { FrontPage { ... on Story { title @output url @filter(op: "is_not_null") @output score @output byUser { submitter: id @output submitter_karma: karma @output } } } } ``` Here's what running it looks like: ``` $ cargo run --example hackernews query example_queries/front_page_stories_with_links.ron Finished dev [unoptimized + debuginfo] target(s) in 0.15s Running `/.../hackernews query example_queries/front_page_stories_with_links.ron` { "submitter_karma": 13731, "submitter": "0xedb", "title": "New Year, New CEO", "url": "https://signal.org/blog/new-year-new-ceo/", "score": 474 } { "submitter": "danso", "score": 71, "title": "In first, US surgeons transplant pig heart into human patient", "submitter_karma": 142723, "url": "https://apnews.com/article/pig-heart-transplant-6651614cb9d73bada8eea2ecb6449aef" } <... many more results ...> ``` ### Example: Jobs in the top 50 items `cargo run query example_queries/jobs_in_top_50.ron` gets the jobs that are currently shown in the top 50 items on HackerNews. For each match, it returns the posting's title, url, and current score. This is the query: ```graphql { Top(max: 50) { ... on Job { title @output url @filter(op: "is_not_null") @output score @output } } } ``` Here's its output: ``` $ cargo run --example hackernews query example_queries/jobs_in_top_50.ron Finished dev [unoptimized + debuginfo] target(s) in 0.14s Running `/.../hackernews query example_queries/jobs_in_top_50.ron` { "title": "Flow Club (YC S21) is hiring our first marketer", "score": 1, "url": "https://flowclub.notion.site/Work-at-Flow-Club-1e6cc84bfc0d4463ab333ee9bc02c46a" } ``` ### Example: Latest links submitted by high-karma users `cargo run --example hackernews query example_queries/latest_links_by_high_karma_users.ron` gets the latest links (i.e. links on the "new" tab) that were submitted by users with karma of 10,000 or more. For each match, it returns the submission's title, URL, current score, and the submitter's username and current karma. This is the query: ```graphql { LatestStory(max: 100) { title @output url @filter(op: "is_not_null") @output score @output byUser { submitter: id @output submitter_karma: karma @filter(op: ">=", value: ["$min_karma"]) @output } } } ``` It is executed with the following arguments, shown here in RON serialization format: ``` { "min_karma": Uint64(10000), } ``` Here's what running it looks like: ``` $ cargo run --example hackernews query example_queries/latest_links_by_high_karma_users.ron Finished dev [unoptimized + debuginfo] target(s) in 0.15s Running `/.../hackernews query example_queries/latest_links_by_high_karma_users.ron` { "submitter_karma": 23927, "score": 2, "url": "https://github.com/snapview/sunrise", "title": "Sunrise: Spreadsheet-like dataflow programming in TypeScript", "submitter": "wslh" } { "title": "The case for Rust as the future of JavaScript infrastructure", "submitter_karma": 30051, "score": 1, "submitter": "feross", "url": "https://thenewstack.io/the-case-for-rust-as-the-future-of-javascript-infrastructure/" } <... many more results ...> ``` ### Example: Latest links with high-karma commenters `cargo run --example hackernews query example_queries/links_with_high_karma_commenters.ron` looks at the latest 100 story submissions (i.e. HN's "new" tab items), and selects those that have links and also have comments (looking up to 5 reply levels deep) made by users with at least 10,000 karma. For each match, it outputs the submission's title, current score, URL, as well as the matching comment's content, author, and the author's current karma. This is the query: ```graphql { LatestStory(max: 100) { title @output url @filter(op: "is_not_null") @output score @output comment { reply @recurse(depth: 5) { comment: text @output byUser { commenter: id @output commenter_karma: karma @filter(op: ">=", value: ["$min_karma"]) @output } } } } } ``` It is executed with the following arguments, shown here in RON serialization format: ``` { "min_karma": Uint64(10000), } ``` Here's what running it looks like: ``` $ cargo run --example hackernews example_queries/links_with_high_karma_commenters.ron Finished dev [unoptimized + debuginfo] target(s) in 0.17s Running `/.../hackernews query example_queries/links_with_high_karma_commenters.ron` { "commenter_karma": 22774, "url": "https://www.phoronix.com/scan.php?page=news_item&px=Intel-New-CCG-Leader", "comment": ">Holthaus replaces EVP Gregory Bryant (“GB”), who will leave the company at the end of January for a new opportunity.
This is strange because Gregory Bryant was still presenting at CES [1] .
[1] https://www.anandtech.com/show/17171/intel-keynote-and-svp-g...", "title": "Intel Announces New Leader of Client Computing Group", "commenter": "ksec", "score": 1 } { "commenter": "scrollaway", "commenter_karma": 25466, "comment": "4 petabytes eh. Bonus points for the first article in several years to use CD-ROMs as a unit of comparison.
> "drawn from crime reports, hacked from encrypted phone services and sampled from asylum seekers never involved in any crime"",
"url": "https://www.theguardian.com/world/2022/jan/10/a-data-black-hole-europol-ordered-to-delete-vast-store-of-personal-data",
"title": "A data ‘black hole’: Europol ordered to delete vast store of personal data",
"score": 31
}
<... many more results ...>
```
## Writing your own queries
The easiest way to write and run your own query is to:
- copy the content of one of the example queries,
- edit the query string and/or arguments as necessary,
- save it to a new file,
- then run it with `cargo run --example hackernews query