| Crates.io | type-sitter |
| lib.rs | type-sitter |
| version | 0.8.0 |
| created_at | 2024-10-18 15:53:28.460405+00 |
| updated_at | 2025-08-12 22:26:32.533523+00 |
| description | generate typed wrappers for tree-sitter grammars from node-types.json and queries |
| homepage | |
| repository | https://github.com/Jakobeha/type-sitter/ |
| max_upload_size | |
| id | 1414416 |
| size | 59,017 |
Note: type-sitter is in alpha, so the API is subject to change.
Type-sitter currently depends on **tree-sitter v0.25**.
type-sitter generates type-safe wrappers for tree-sitter nodes and queries in a specific language. Nodes are generated
from node-types.json, and queries
from query s-expressions.
"Type-safe" here means that:
- Instead of a single `tree_sitter::Node` type for everything, each node type has its own data-type which wraps `tree_sitter::Node`.
- Nodes with subtypes are `enum`s, so you can pattern-match their subtypes with compile-time exhaustiveness checking.
- Node data-types implement `type_sitter::Node`. You can use generics and convert to/from `type_sitter::UntypedNode` to write methods that take or return arbitrary-typed nodes.
- Instead of accessing fields by `field("field_name")`, you access by specific methods like `field_name()`.
- Instead of accessing query captures by `capture("capture_name")`, you access them by capture-specific methods, and query methods return typed nodes.
type-sitter has other useful features:
- Helper methods for `Option<NodeResult<'_>>` values: `.unwrap2()`, `.expect2()`, and `.flatten()`.
Lastly, there's an optional feature, yak-sitter, which re-exports the tree-sitter API with a few small changes, most
notably nodes being able to access their text and filepath directly. The yak-sitter library is
a drop-in replacement for tree-sitter and can be used by itself without type-sitter (and yak-sitter is optional in
type-sitter).
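To make the idea of typed wrappers concrete, here is a minimal, self-contained sketch that uses neither tree-sitter nor type-sitter. `RawNode`, `UseDeclaration`, `FunctionItem`, and `Declaration` are hypothetical stand-ins chosen for illustration; the code type-sitter actually generates is much larger and differs in detail.

```rust
// Hypothetical stand-in for `tree_sitter::Node`: only a kind and some text.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct RawNode<'a> {
    pub kind: &'a str,
    pub text: &'a str,
}

// One wrapper type per node kind (illustrative names).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct UseDeclaration<'a>(RawNode<'a>);
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FunctionItem<'a>(RawNode<'a>);

// A node with subtypes becomes an enum over those subtypes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Declaration<'a> {
    Use(UseDeclaration<'a>),
    Function(FunctionItem<'a>),
}

impl<'a> Declaration<'a> {
    // Checked conversion from the untyped node; fails on an unexpected kind.
    pub fn try_from_raw(raw: RawNode<'a>) -> Result<Self, String> {
        match raw.kind {
            "use_declaration" => Ok(Declaration::Use(UseDeclaration(raw))),
            "function_item" => Ok(Declaration::Function(FunctionItem(raw))),
            other => Err(format!("unexpected node kind: {}", other)),
        }
    }

    // Conversion back to the untyped representation.
    pub fn into_raw(self) -> RawNode<'a> {
        match self {
            Declaration::Use(UseDeclaration(raw))
            | Declaration::Function(FunctionItem(raw)) => raw,
        }
    }
}

// No wildcard arm: adding a variant to `Declaration` makes this function
// fail to compile until the new case is handled.
pub fn describe(decl: Declaration<'_>) -> &'static str {
    match decl {
        Declaration::Use(_) => "import",
        Declaration::Function(_) => "function",
    }
}
```

A function such as `describe` can only be called with a declaration, so passing some other node kind is a compile-time error instead of a runtime string comparison.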
There are three ways to use type-sitter: procedural macros, a build script, or the CLI tool. Procedural macros are the
easiest. A build script is recommended because it's much faster (it only runs when the grammar changes) and lets you
see the generated code. The CLI tool is the most flexible, as it lets you edit the generated code, but it requires you
to re-generate the code manually.
Every method except the node-types-only build script requires you to vendor the tree-sitter grammar you want to
generate bindings for. You cannot just include it as a dependency in `Cargo.toml`, because the node generator needs a
hard-coded (relative) path to the grammar's `node-types.json`, and the query generator needs a hard-coded path to the
grammar's root folder (containing `src/node-types.json`), which must also contain a built shared object (at
`build/tree_sitter_foobar_binding.dylib` or `build/tree_sitter_foobar_binding.so`).
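The layout requirement above can be checked before running a generator. The following is a rough sketch using only the standard library; `missing_grammar_parts` is a hypothetical helper, not part of type-sitter, and it accepts any `.so` or `.dylib` file under `build/` rather than checking the grammar-specific file name.

```rust
use std::fs;
use std::path::Path;

/// Lists the pieces of a vendored grammar folder that appear to be missing.
/// `root` is the grammar's root folder. Hypothetical helper for illustration.
pub fn missing_grammar_parts(root: &Path) -> Vec<String> {
    let mut missing = Vec::new();

    // The node generator reads `src/node-types.json`.
    if !root.join("src").join("node-types.json").is_file() {
        missing.push("src/node-types.json".to_string());
    }

    // The query generator additionally needs a built shared object in `build/`.
    let has_shared_object = fs::read_dir(root.join("build"))
        .map(|entries| {
            entries.filter_map(Result::ok).any(|entry| {
                matches!(
                    entry.path().extension().and_then(|ext| ext.to_str()),
                    Some("so") | Some("dylib")
                )
            })
        })
        .unwrap_or(false);
    if !has_shared_object {
        missing.push("build/*.so or build/*.dylib".to_string());
    }

    missing
}
```

An empty result only means the expected files exist; it says nothing about whether the shared object was built from the same grammar version.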
To use procedural macros, run these commands or add the dependencies manually:
```sh
cargo add type-sitter # Or add to Cargo.toml manually
cargo add tree-sitter-foobar-lang # Replace `foobar-lang` with the name of your language
```
To generate typed nodes:
```rust
// Assume this code is in `src/foobar_nodes.rs`
use type_sitter_proc::generate_nodes;

generate_nodes! {
    // Replace this with the path to the node-types.json file
    "vendor/path/to/tree-sitter-foobar-lang/src/node-types.json"
}
```
To generate typed queries:
```rust
// Assume this code is in `src/foobar_queries.rs`
use type_sitter_proc::generate_queries;

generate_queries! {
    // Replace this with the path to the queries folder
    "vendor/path/to/tree-sitter-foobar-lang/queries",
    // Replace this with the path to the grammar's root (the folder containing `src/node-types.json`)
    "vendor/path/to/tree-sitter-foobar-lang",
    // Replace with a different path if the nodes don't exist in a sibling module named `foobar_nodes`.
    super::foobar_nodes,
}
```
To use a build script, run these commands or add the dependencies manually:
```sh
cargo add type-sitter --no-default-features # Or add to Cargo.toml manually
cargo add --build type-sitter-gen # Notice `cargo add --build`
cargo add tree-sitter-foobar-lang # Replace `foobar-lang` with the name of your language
```
Then, in build.rs:
```rust
use std::path::{PathBuf, Path};
use std::{env, fs};
use type_sitter_gen::{generate_nodes, generate_queries, super_nodes};

fn main() {
    // Common setup
    let out_dir = PathBuf::from(env::var_os("OUT_DIR").unwrap());
    println!("cargo::rerun-if-changed=build.rs");
    // Obligatory: in this and future lines, replace `vendor/path/to/tree-sitter-foobar-lang`
    // with the path to your grammar's folder, relative to the folder containing `Cargo.toml`
    println!("cargo::rerun-if-changed=vendor/path/to/tree-sitter-foobar-lang");

    // To generate nodes
    let path = Path::new("vendor/path/to/tree-sitter-foobar-lang/src/node-types.json");
    fs::write(
        out_dir.join("nodes.rs"),
        generate_nodes(path).unwrap().into_string()
    ).unwrap();

    // To generate queries
    fs::write(
        out_dir.join("queries.rs"),
        generate_queries(
            "vendor/path/to/tree-sitter-foobar-lang/queries",
            "vendor/path/to/tree-sitter-foobar-lang",
            // Replace with a different `syn::Path` if the nodes don't exist in a sibling to `dest_path` named `nodes`
            &super_nodes(),
            // Replace with `true` if you are using the `yak-sitter` feature (by default, no)
            false
        ).unwrap().into_string()
    ).unwrap();
}
```
Then make sure to include the generated code somewhere:
```rust
mod nodes {
    include!(concat!(env!("OUT_DIR"), "/nodes.rs"));
}
mod queries {
    include!(concat!(env!("OUT_DIR"), "/queries.rs"));
}
```
To generate custom supertypes, follow the same steps as above, but modify the build script to something like:
```rust
use type_sitter_gen::{NodeTypeMap, NodeName, NodeTypeKind};

fn main() {
    // ...

    // To generate nodes (THIS SECTION IS DIFFERENT)
    let path = Path::new("vendor/path/to/tree-sitter-foobar-lang/src/node-types.json");
    // `mut` because custom supertypes are added to the map below
    let mut node_type_map = NodeTypeMap::try_from(path).unwrap();
    let named: Vec<NodeName> = node_type_map
        .values()
        .map(|node| node.name.clone())
        .filter(|name| name.is_named)
        .collect();
    node_type_map
        .add_custom_supertype("_all_named", named)
        .expect("this mustn't already exist");

    // To give an explicit name to a hidden node that is not a supertype in the grammar.
    // (e.g. make `Class::members` return `ClassMember` instead of `anon_unions::...`, assuming the
    // original `grammar.js` contains:
    // ```
    // class: $ => seq(
    //   ...
    //   field('members', $._class_members),
    //   ...
    // ),
    // _class_members: $ => choice(...),
    // ```
    // and `_class_members` is not in `supertypes`)
    let class_member_variants = node_type_map["class"]["members"].types.clone();
    node_type_map
        .add_custom_supertype("_class_member", class_member_variants)
        .expect("this mustn't already exist");

    fs::write(
        out_dir.join("nodes.rs"),
        generate_nodes(node_type_map).unwrap().into_string()
    ).unwrap();

    // ...
}
```
Run these commands or add the dependencies manually:
```sh
cargo add type-sitter --no-default-features # Or add to Cargo.toml manually
cargo add --build type-sitter-gen # Notice `cargo add --build`
cargo add tree-sitter-foobar-lang # Replace `foobar-lang` with the name of your language
# Since the grammar isn't vendored, you must also include your language's tree-sitter grammar as a build-dependency.
cargo add --build tree-sitter-foobar-lang # Replace `foobar-lang` with the name of your language
```
Then, in build.rs:
```rust
use std::path::PathBuf;
use std::{env, fs};
use type_sitter_gen::generate_nodes;

fn main() {
    // Common setup. Same as before
    let out_dir = PathBuf::from(env::var_os("OUT_DIR").unwrap());
    println!("cargo::rerun-if-changed=build.rs");
    // There is no vendored folder to watch here: the node types come from the
    // `tree-sitter-foobar-lang` build-dependency

    // To generate nodes
    fs::write(
        out_dir.join("nodes.rs"),
        generate_nodes(tree_sitter_foobar_lang::NODE_TYPES).unwrap().into_string()
    ).unwrap();
}
```
Then make sure to include the generated code somewhere:
```rust
mod nodes {
    include!(concat!(env!("OUT_DIR"), "/nodes.rs"));
}
```
Currently you can't generate queries without vendoring the grammar.
To use the CLI tool, run these commands:
```sh
cargo add type-sitter --no-default-features # Or add to Cargo.toml manually
cargo install type-sitter-cli # Notice `cargo install`
cargo add tree-sitter-foobar-lang # Replace `foobar-lang` with the name of your language
```
Then, manually generate typed nodes and queries with the CLI tool:
```sh
# Replace `vendor/path/to/tree-sitter-foobar-lang` and `src/parent/of/generated/module` with the path to the grammar's
# root folder (containing `src/node-types.json` and `queries`) and the directory where you want the generated module's
# source files to be placed, respectively.
cargo run -p type-sitter-cli vendor/path/to/tree-sitter-foobar-lang -o src/parent/of/generated/module
```
Additionally, you must pass --use-yak-sitter if the yak-sitter feature is enabled. If you skip -o, it defaults to
src/type_sitter.
Alternatively, instead of the path to the grammar's root folder, if you specify the path to the node-types.json
directly, the CLI tool will only generate node types; or if you specify the path to the queries directory, it will
only generate queries.
A downside with the CLI approach is that you need to manually re-generate the nodes if the grammar changes. An upside is that, if you know the grammar won't change and you won't have to manually re-generate, you can edit the generated code and the edits will persist.
Another downside is that the CLI can only be used on systems that have run cargo install type-sitter-cli.
See https://github.com/rust-lang/cargo/issues/2267 for why the CLI method can't easily be made portable; if you want
portability, use procedural macros or a build script.
```rust
pub fn get_import_paths_untyped<'a>(source: &'a str, tree: &tree_sitter::Tree) -> Vec<&'a str> {
    // BAD: what if we spell the field names wrong? What if a new variant is added with the same field name?
    tree.root_node().children(&mut tree.walk())
        .filter(|n| n.kind() == "use_declaration")
        .filter_map(|n| n.child_by_field_name("argument"))
        .filter_map(|n| n.child_by_field_name("path"))
        .map(|n| n.utf8_text(source.as_bytes()).unwrap())
        .collect()
}

pub fn get_import_paths_typed<'a>(source: &'a str, tree: &type_sitter::Tree<rust::SourceFile<'static>>) -> Vec<&'a str> {
    // GOOD: fields are type-safe, variant selectors are explicit, and we get IDE inference
    tree.root_node().unwrap().children(&mut tree.walk())
        .filter_map(|n| n.as_use_declaration())
        .filter_map(|n| n.argument().map(|r| r.unwrap()))
        .filter_map(|n| n.as_scoped_identifier())
        .filter_map(|n| n.path().map(|r| r.unwrap()))
        .map(|n| n.utf8_text(source.as_bytes()).unwrap())
        .collect()
}

// We can also define methods which only take nodes of certain types
pub fn process_declaration(decl: rust::DeclarationStatement<'_>) {
    // ...
}
```
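The typed example unwraps an `Option` of a `Result` with `.map(|r| r.unwrap())`. Helpers on `Option<Result<..>>` can shorten that pattern. The sketch below shows the general technique with an extension trait; `OptionResultExt`, `unwrap_both`, and `ok_only` are made-up names for illustration and are not claimed to match the behavior of type-sitter's own `.unwrap2()`, `.expect2()`, or `.flatten()`.

```rust
use std::fmt::Debug;

/// Hypothetical extension trait for `Option<Result<T, E>>` values.
pub trait OptionResultExt<T> {
    /// Unwraps both layers, panicking on `None` or `Err`.
    fn unwrap_both(self) -> T;
    /// Keeps only values that are both present and successful.
    fn ok_only(self) -> Option<T>;
}

impl<T, E: Debug> OptionResultExt<T> for Option<Result<T, E>> {
    fn unwrap_both(self) -> T {
        self.expect("value was None").expect("value was Err")
    }

    fn ok_only(self) -> Option<T> {
        self.and_then(Result::ok)
    }
}
```

With such a trait in scope, a `filter_map(|n| n.argument().map(|r| r.unwrap()))` style call could be written as `filter_map(|n| n.argument().ok_only())` when errors should be skipped instead of panicking.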
Be aware that the generated wrapper code is very large: the generated node wrappers for
tree-sitter-rust are >30000 LOC,
and queries are >6000 LOC. I don't know how that impacts compilation or
analysis speed.
type-sitter-proc is particularly slow because it must re-generate this code every build. type-sitter-gen or
type-sitter-cli can be configured to only re-generate when the tree-sitter grammar changes.
type-sitter generates data-types based on the names of the nodes in the grammar. However, these nodes are in
snake-case and contain punctuation which is illegal in Rust, so we convert them to camel-case and perform the following
illegal-character substitutions:
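- `&` ⇒ `And`
- `|` ⇒ `Or`
- `!` ⇒ `Not`
- `=` ⇒ `Eq`
- `<` ⇒ `Lt`
- `>` ⇒ `Gt`
- `+` ⇒ `Add`
- `-` ⇒ `Sub`
- `*` ⇒ `Mul`
- `/` ⇒ `Div`
- `~` ⇒ `BitNot`
- `%` ⇒ `Mod`
- `^` ⇒ `BitXor`
- `?` ⇒ `Question`
- `:` ⇒ `Colon`
- `.` ⇒ `Dot`
- `,` ⇒ `Comma`
- `;` ⇒ `Semicolon`
- `(` ⇒ `LParen`
- `)` ⇒ `RParen`
- `[` ⇒ `LBracket`
- `]` ⇒ `RBracket`
- `{` ⇒ `LBrace`
- `}` ⇒ `RBrace`
- `\` ⇒ `Backslash`
- `'` ⇒ `Quote`
- `"` ⇒ `DoubleQuote`
- `#` ⇒ `Hash`
- `@` ⇒ `At`
- `$` ⇒ `Dollar`
- `` ` `` ⇒ `Backtick`
- (space) ⇒ `Space`
- `\t` ⇒ `Tab`
- `\n` ⇒ `Newline`
- `\r` ⇒ `CarriageReturn`
- Any other illegal character ⇒ `U` + the character's Unicode codepoint in upper-hex.
For method names (variant selectors), we simply convert back to snake case.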
Additionally, if a node is implicit (starts with _), we remove the prepended _.
Next, if a type or method name would start with a digit, type-sitter prepends a _. If the type or method name would
be _, type-sitter uses __. If the type or method name would be a reserved identifier that can be raw,
type-sitter prepends r#. And, if the type or method name would be a reserved identifier that can't be raw (Self,
self, super, crate), type-sitter appends _.
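The rules so far can be approximated in a few lines. This is a simplified sketch, not the code in type-sitter-gen: it covers only a subset of the substitutions and of Rust's reserved words, skips the `_` ⇒ `__` edge case, and all function and constant names are hypothetical.

```rust
// Subset of the punctuation substitutions (the full table is longer).
const SUBSTITUTIONS: &[(char, &str)] = &[
    ('&', "And"), ('|', "Or"), ('!', "Not"), ('=', "Eq"), ('<', "Lt"), ('>', "Gt"),
    ('+', "Add"), ('-', "Sub"), ('*', "Mul"), ('/', "Div"), ('%', "Mod"), ('.', "Dot"),
];
// Reserved identifiers that cannot be written as raw identifiers.
const NOT_RAW: &[&str] = &["Self", "self", "super", "crate"];
// Subset of the reserved identifiers that can be raw.
const RAW_OK: &[&str] = &["fn", "mod", "true", "false", "type", "use", "match", "struct"];

// Camel-cases a node name: drops one leading `_`, treats `_` as a word
// break, and replaces punctuation with words.
fn camel_case(node_name: &str) -> String {
    let trimmed = node_name.strip_prefix('_').unwrap_or(node_name);
    let mut out = String::new();
    let mut capitalize_next = true;
    for c in trimmed.chars() {
        let word = SUBSTITUTIONS.iter().find(|(from, _)| *from == c).map(|(_, to)| *to);
        if c == '_' {
            capitalize_next = true;
        } else if let Some(word) = word {
            out.push_str(word);
            capitalize_next = true;
        } else if c.is_ascii_alphanumeric() {
            out.push(if capitalize_next { c.to_ascii_uppercase() } else { c });
            capitalize_next = false;
        } else {
            // Fallback: `U` plus the code point in upper-case hex.
            out.push_str(&format!("U{:X}", c as u32));
            capitalize_next = true;
        }
    }
    out
}

// Converts a camel-case name back to snake case.
fn to_snake_case(camel: &str) -> String {
    let mut out = String::new();
    for (i, c) in camel.chars().enumerate() {
        if c.is_ascii_uppercase() && i > 0 {
            out.push('_');
        }
        out.push(c.to_ascii_lowercase());
    }
    out
}

// Applies the leading-digit and reserved-identifier rules.
fn fix_identifier(mut name: String) -> String {
    if name.starts_with(|c: char| c.is_ascii_digit()) {
        name.insert(0, '_');
    }
    if NOT_RAW.contains(&name.as_str()) {
        name.push('_');
    } else if RAW_OK.contains(&name.as_str()) {
        name.insert_str(0, "r#");
    }
    name
}

pub fn type_name(node_name: &str) -> String {
    fix_identifier(camel_case(node_name))
}

pub fn selector_name(node_name: &str) -> String {
    fix_identifier(to_snake_case(&camel_case(node_name)))
}
```

Under these assumptions, `type_name("self")` yields `Self_` because `Self` cannot be a raw identifier, while `selector_name("true")` yields `r#true`.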
Lastly, if there are ever multiple types with the same name in the same module, or methods or variants with the same
name in the same type, type-sitter appends _ to the later one until it's unique. For example, if there are two unnamed
nodes Fn and fn, one of them will have type Fn, and the other will have type Fn_. You can see which node is
which by looking at the documentation, which contains the original tree-sitter name. The disambiguation is guaranteed to
be deterministic.
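The disambiguation step can be sketched as follows. This is an illustration of the "append `_` until unique" rule only; `disambiguate` is a hypothetical function, and it assumes the names arrive in a fixed order, which is what makes the result deterministic here.

```rust
use std::collections::HashSet;

/// Appends `_` to later duplicates until every name is unique.
/// The output has the same order as the input.
pub fn disambiguate(names: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    names
        .iter()
        .map(|name| {
            let mut candidate = name.to_string();
            // `insert` returns false when the name is already taken.
            while !seen.insert(candidate.clone()) {
                candidate.push('_');
            }
            candidate
        })
        .collect()
}
```

For the `Fn` / `fn` case described above, both nodes map to the type name `Fn`, so the input `["Fn", "Fn"]` comes out as `Fn` and `Fn_`.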
Naming rules also determine the module. Unnamed nodes and symbols are in modules specifically to reduce naming conflicts without having to disambiguate the nodes as described above.
- Symbols are in `symbols::`.
- Other unnamed nodes are in `unnamed::`.
The source for all this is
`type-sitter-gen/src/node_types/rust_names.rs`.
Examples:
- `_declaration_statement` ⇒ `DeclarationStatement`
- `use_declaration` ⇒ `UseDeclaration`
- `self` ⇒ `unnamed::Self_`
- `%` ⇒ `symbols::Mod`
- `mod` ⇒ `unnamed::Mod`
- `true` selector ⇒ `r#true` (`true` ⇒ `unnamed::True`)
Query capture naming rules are exactly the same as node rules, except that in captures, `.` is interpreted as `_` when
converting to camel-case (e.g. `method.definition` ⇒ `MethodDefinition` and `method_definition`).
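The capture rule can be sketched in isolation. `capture_names` is a hypothetical function that handles only the `.` ⇒ `_` step and plain camel-casing; punctuation substitution and reserved-word handling are left out.

```rust
/// Returns the (camel-case, snake-case) names for a query capture.
pub fn capture_names(capture: &str) -> (String, String) {
    // In captures, `.` separates words just like `_` does.
    let snake = capture.replace('.', "_");
    let camel = snake
        .split('_')
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect::<String>();
    (camel, snake)
}
```

Calling `capture_names("method.definition")` gives the pair `MethodDefinition` and `method_definition`, matching the example above.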
rust-sitter is the primary alternative which also provides convenience over tree-sitter's Rust API. However, rust-sitter takes a much different approach by fully generating the tree-sitter grammar from a Rust file.
Advantages of type-sitter:
- […] `tree-sitter` nodes
- […] `yak-sitter` feature it only provides typed wrappers for nodes (and even `yak-sitter` isn't much different)
Advantages of rust-sitter:
Feel free to submit an issue or pull request if you want a new feature or anything is missing, and don't hesitate to submit an issue if you encounter any bugs or have any questions.
The code is licensed under MIT or Apache 2.0 (you choose), which is the norm for Rust packages.