# Inkjet
A batteries-included syntax highlighting library for Rust, based on `tree-sitter`.
## Features
- Language grammars are linked into the executable as C functions, so there's no need to load anything at runtime!
- Pluggable formatters. Inkjet includes a formatter for HTML, and writing your own is easy.
- Support for [Helix editor themes](https://docs.helix-editor.com/themes.html#modifiers), including a large collection of vendored themes to get you started.
- Highlight into a new `String` or a `std::io::Write`/`std::fmt::Write`, depending on your use case.
- Specify languages explicitly (from an `enum`) or look them up using a token like `"rs"` or `"rust"`.
- ~~Extremely cursed `build.rs`~~
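To make the formatter and output bullets above more concrete, here is a minimal, std-only sketch of the general pattern: a formatter trait that writes highlight events into any `std::fmt::Write`, plus thin wrappers for `String` and `std::io::Write` output. The `Event`, `Formatter`, `Html`, `render_fmt`, `render_to_string` and `render_io` names are hypothetical stand-ins and do not reproduce Inkjet's actual formatter API; check the crate documentation for the real trait before implementing your own.

```rust
use std::fmt;
use std::io;

/// Hypothetical stand-in for a stream of highlight events (not Inkjet's real type).
enum Event<'a> {
    Start(&'a str),  // a highlight begins, e.g. "keyword"
    Source(&'a str), // raw source text
    End,             // the innermost highlight ends
}

/// Hypothetical pluggable formatter: turns events into output on any `fmt::Write`.
trait Formatter {
    fn write<W: fmt::Write>(&self, out: &mut W, event: &Event<'_>) -> fmt::Result;
}

/// Minimal HTML formatter: wraps highlights in `<span>` and escapes source text.
struct Html;

impl Formatter for Html {
    fn write<W: fmt::Write>(&self, out: &mut W, event: &Event<'_>) -> fmt::Result {
        match event {
            Event::Start(name) => write!(out, "<span class=\"{}\">", name),
            Event::Source(text) => {
                for c in text.chars() {
                    match c {
                        '<' => out.write_str("&lt;")?,
                        '>' => out.write_str("&gt;")?,
                        '&' => out.write_str("&amp;")?,
                        '"' => out.write_str("&quot;")?,
                        _ => out.write_char(c)?,
                    }
                }
                Ok(())
            }
            Event::End => out.write_str("</span>"),
        }
    }
}

/// Drive any formatter into any `fmt::Write` sink.
fn render_fmt<F: Formatter, W: fmt::Write>(f: &F, events: &[Event<'_>], out: &mut W) -> fmt::Result {
    for event in events {
        f.write(out, event)?;
    }
    Ok(())
}

/// Convenience wrapper that allocates a new `String`.
fn render_to_string<F: Formatter>(f: &F, events: &[Event<'_>]) -> String {
    let mut s = String::new();
    render_fmt(f, events, &mut s).expect("writing to a String is infallible");
    s
}

/// Simplest `io::Write` variant: buffer into a `String`, then write the bytes.
fn render_io<F: Formatter, W: io::Write>(f: &F, events: &[Event<'_>], out: &mut W) -> io::Result<()> {
    out.write_all(render_to_string(f, events).as_bytes())
}
```

Keeping the formatter generic over `fmt::Write` means one implementation serves every output target; the `String` and `io::Write` entry points are just adapters around it.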
## Included Languages
Inkjet comes bundled with support for over seventy languages, and it's easy to add more (see the FAQ section).
| Name | Recognized Tokens |
| ---- | ------- |
| Ada | `ada` |
| Assembly (generic) | `asm` |
| Awk | `awk` |
| Bash | `bash`, `sh`, `shell` |
| BibTeX | `bibtex`, `bib` |
| Bicep | `bicep` |
| Blueprint | `blueprint`, `blp` |
| C | `c`, `h` |
| Cap'n Proto | `capnp` |
| Clojure | `clojure`, `clj`, `cljc` |
| C# | `c_sharp`, `c#`, `csharp`, `cs` |
| C++ | `c++`, `cpp`, `hpp`, `h++`, `cc`, `hh` |
| CSS | `css` |
| Cue | `cue` |
| D | `d`, `dlang` |
| Dart | `dart` |
| Diff | `diff` |
| Dockerfile | `dockerfile`, `docker` |
| EEx | `eex` |
| Emacs Lisp | `elisp`, `emacs-lisp`, `el` |
| Elixir | `ex`, `exs`, `leex` |
| Elm | `elm` |
| Erlang | `erl`, `hrl`, `es`, `escript` |
| Forth | `forth`, `fth` |
| Fortran | `fortran`, `for` |
| Fish | `fish` |
| GDScript | `gdscript`, `gd` |
| Gleam | `gleam` |
| GLSL | `glsl` |
| Go | `go`, `golang` |
| Haskell | `haskell`, `hs` |
| HCL | `hcl`, `terraform` |
| HEEx | `heex` |
| HTML | `html`, `htm` |
| INI | `ini` |
| JavaScript | `javascript`, `js` |
| JSON | `json` |
| JSX | `jsx` |
| Julia | `julia`, `jl` |
| Kotlin | `kotlin`, `kt`, `kts` |
| LaTeX | `latex`, `tex` |
| LLVM | `llvm` |
| Lua | `lua` |
| GNU Make | `make`, `makefile`, `mk` |
| MATLAB | `matlab`, `m` |
| Meson | `meson` |
| Nix | `nix` |
| Objective-C | `objective_c`, `objc` |
| OCaml | `ocaml`, `ml` |
| OCaml Interface | `ocaml_interface`, `mli` |
| OpenSCAD | `openscad`, `scad` |
| Pascal | `pascal` |
| PHP | `php` |
| ProtoBuf | `protobuf`, `proto` |
| Python | `python`, `py` |
| R | `r` |
| Racket | `racket`, `rkt` |
| Regex | `regex` |
| Ruby | `ruby`, `rb` |
| Rust | `rust`, `rs` |
| Scala | `scala` |
| Scheme | `scheme`, `scm`, `ss` |
| SCSS | `scss` |
| SQL (Generic) | `sql` |
| Swift | `swift` |
| TOML | `toml` |
| TypeScript | `typescript`, `ts` |
| TSX | `tsx` |
| Vimscript | `vimscript`, `vim` |
| WAST (WebAssembly Script) | `wast` |
| WAT (WebAssembly Text) | `wat`, `wasm` |
| x86 Assembly | `x86asm`, `x86` |
| WGSL | `wgsl` |
| YAML | `yaml` |
| Zig | `zig` |
In addition to these languages, Inkjet also offers the [`Runtime`](https://docs.rs/inkjet/latest/inkjet/enum.Language.html#variant.Runtime) and [`Plaintext`](https://docs.rs/inkjet/latest/inkjet/enum.Language.html#variant.Plaintext) languages.
- `Runtime` wraps a `fn() -> &'static HighlightConfiguration` pointer, which is used to resolve the language at (you guessed it) runtime.
- `Plaintext` enables cheap no-op highlighting. It loads the `diff` grammar under the hood, but provides no highlighting queries. It's aliased to `none` and `nolang`.
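The token lookup and the two special languages can be illustrated with a small, self-contained sketch. The `Language` enum, `HighlightConfig` struct and `my_grammar` function below are hypothetical stand-ins (only a few tokens from the table are mapped); the real `Language` enum is documented at the links above, and the real configuration type is tree-sitter's `HighlightConfiguration`. The sketch shows one common way to satisfy a `fn() -> &'static T` signature: a `OnceLock` static that builds the configuration once and returns the same reference on every call.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for tree-sitter's `HighlightConfiguration`.
#[derive(Debug)]
struct HighlightConfig {
    name: &'static str,
}

/// Hypothetical mirror of the language enum described above (not Inkjet's definition).
#[derive(Clone, Copy, Debug)]
enum Language {
    Rust,
    Python,
    Plaintext,
    /// Resolves the configuration lazily through a function pointer.
    Runtime(fn() -> &'static HighlightConfig),
}

impl Language {
    /// Look up a language from a token such as "rs" or "rust".
    fn from_token(token: &str) -> Option<Language> {
        match token {
            "rust" | "rs" => Some(Language::Rust),
            "python" | "py" => Some(Language::Python),
            "none" | "nolang" => Some(Language::Plaintext),
            _ => None,
        }
    }
}

/// An application-supplied grammar: the config is built on first call and the
/// same `&'static` reference is returned afterward.
fn my_grammar() -> &'static HighlightConfig {
    static CONFIG: OnceLock<HighlightConfig> = OnceLock::new();
    CONFIG.get_or_init(|| HighlightConfig { name: "my-lang" })
}
```

Mapping several tokens to one variant in a single `match` arm mirrors how the table lists multiple recognized tokens per language, and `Language::Runtime(my_grammar)` passes the function itself, not its result.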
## Cargo Features
- (Default) `html` - enables the bundled HTML formatter, which depends on `v_htmlescape`.
- (Default) `theme` - enables the theme API, which depends on `ahash`, `toml` and `serde`.
- (Default) `all-languages` - enables all languages.
- `language-{name}` - enables the specified language.
  - To enable only a subset of the included languages, set `default-features = false` and re-add each language you want individually, along with any other default features (such as `html` or `theme`) you still need.
- `terminal` - enables the `termcolor`-based terminal formatter, which depends on the `theme` feature.
## FAQ
### *"Why is Inkjet so large?"*
Parser sources generated by `tree-sitter` can grow quite big, with some being dozens of megabytes in size. Inkjet has to bundle these sources for all the languages it supports, so it adds up. (According to `loc`, there are over 23 *million* lines of C code!)
If you need to minimize your binary size, consider disabling languages that you don't need. Link-time optimization can also shave off a few megabytes.
### *"Why is Inkjet taking so long to build?"*
Because it has to compile and link dozens of C/C++ source files (the parsers and scanners for every language Inkjet bundles).
However, after the first build, these artifacts are cached, so subsequent builds should be much faster.
### *"Why does highlighting require a mutable reference to the highlighter?"*
Under the hood, Inkjet creates a `tree-sitter` highlighter/parser object, which in turn dynamically allocates a chunk of working memory. Using the same highlighter for multiple simultaneous jobs would therefore cause all sorts of nasty UB; requiring `&mut` lets the borrow checker rule that out at compile time.
If you want to highlight in parallel, you'll have to create a separate highlighter for each thread. I recommend [`thread_local!`](https://doc.rust-lang.org/std/macro.thread_local.html) and `RefCell` if you need a quick and easy solution.
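A minimal sketch of the `thread_local!` + `RefCell` approach, assuming a highlighter that needs `&mut self`. The `Highlighter` type here is a hypothetical stand-in whose `highlight` method just reuses an internal buffer; with Inkjet you would construct its real highlighter instead and call the actual highlighting method from the crate docs.

```rust
use std::cell::RefCell;
use std::thread;

/// Hypothetical stand-in for a highlighter that owns scratch memory and
/// therefore needs `&mut self` (this is not Inkjet's type).
struct Highlighter {
    scratch: Vec<u8>,
}

impl Highlighter {
    fn new() -> Self {
        Highlighter { scratch: Vec::new() }
    }

    /// Reuses the internal buffer, which is why exclusive access is required.
    fn highlight(&mut self, source: &str) -> String {
        self.scratch.clear();
        self.scratch.extend_from_slice(source.as_bytes());
        format!("[{}]", String::from_utf8_lossy(&self.scratch))
    }
}

thread_local! {
    // Each thread lazily gets its own highlighter on first use.
    static HIGHLIGHTER: RefCell<Highlighter> = RefCell::new(Highlighter::new());
}

/// Highlight with the current thread's highlighter; nothing is shared across threads.
fn highlight_here(source: &str) -> String {
    HIGHLIGHTER.with(|h| h.borrow_mut().highlight(source))
}

/// Spawn one worker per input; each worker uses its own thread-local highlighter.
fn highlight_in_parallel(sources: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = sources
        .into_iter()
        .map(|src| thread::spawn(move || highlight_here(&src)))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

Because each thread creates its own instance, no highlighter is ever shared, and `RefCell::borrow_mut` enforces exclusive access at runtime (it panics on a reentrant borrow rather than allowing aliasing).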
### *"A language I want to highlight isn't bundled with Inkjet!"*
Assuming that you or someone else has implemented a highlighting-ready `tree-sitter` grammar for the language you want, adding it to Inkjet is easy! Just open an issue asking for it to be added, linking to the grammar repository for the language.
Alternatively, you can use [`Language::Runtime`](https://docs.rs/inkjet/latest/inkjet/enum.Language.html#variant.Runtime), which will allow you to use grammars not bundled with Inkjet.
Other notes:
- Inkjet currently only supports grammar repositories that check in the parser generated by `tree-sitter` (in order to avoid a build-time dependency on `node`/`npm`.)
- Inkjet requires that the grammar include (at minimum) a `highlights.scm` query targeted at the base `tree-sitter` library. Extended queries (such as those from `nvim-treesitter`) will not work.
- I will not support blockchain/smart contract languages like Solidity. Please take your scam enablers elsewhere.
## Building
For normal use, Inkjet will compile automatically just like any other crate.
However, if you have forked the repository and want to update the bundled languages, you'll need to use GNU Make with the included `Makefile`:
- `make redownload` will wipe the `languages/` directory and redownload everything from scratch.
  - Currently, this only works on Unix-like systems. You will need `git`, `sed` and `wget` installed. (`git` clones the grammar repositories, while `sed` and `wget` are used in small setup scripts for some languages.)
- `make regenerate` will wipe `src/languages.rs` and regenerate it from scratch.
- `make features` will generate a file called `features` in the crate root, containing all the individual language features (ready to be pasted into `Cargo.toml`.)
- `make themes` will regenerate the `mod.rs` file in `src/theme/vendored` using the contents of the `data/` directory.
If you don't have GNU Make available, you can perform these actions manually by setting the appropriate environment variables and Cargo flags:
- `INKJET_REDOWNLOAD_LANGS=true` for `make redownload`.
- `INKJET_REBUILD_LANGS_MODULE=true` for `make regenerate`.
- `INKJET_REBUILD_FEATURES=true` for `make features`.
- `INKJET_REBUILD_THEMES=true` for `make themes`.
Run `cargo build --all-features` with these set. (The development portions of the build script are feature-gated by default.)
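For readers curious how variables like these typically gate a build script, here is a hedged sketch. It is not Inkjet's actual `build.rs`: the `DEV_STEPS` table, `is_enabled` and `requested_steps` are hypothetical, and the rule that only the exact value `true` counts is an assumption. The `cargo:rerun-if-env-changed` directive is standard Cargo and makes the build script re-run when a listed variable changes.

```rust
use std::env;

/// The environment variables listed above, paired with the make target each replaces.
static DEV_STEPS: [(&str, &str); 4] = [
    ("INKJET_REDOWNLOAD_LANGS", "redownload"),
    ("INKJET_REBUILD_LANGS_MODULE", "regenerate"),
    ("INKJET_REBUILD_FEATURES", "features"),
    ("INKJET_REBUILD_THEMES", "themes"),
];

/// Hypothetical rule: a step runs only when its variable is exactly "true".
fn is_enabled(value: Option<&str>) -> bool {
    value == Some("true")
}

/// Returns the steps requested by the current environment, in a fixed order.
fn requested_steps() -> Vec<&'static str> {
    let mut steps = Vec::new();
    for &(var, step) in DEV_STEPS.iter() {
        // Ask Cargo to re-run the build script whenever this variable changes.
        println!("cargo:rerun-if-env-changed={var}");
        if is_enabled(env::var(var).ok().as_deref()) {
            steps.push(step);
        }
    }
    steps
}
```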
## Acknowledgements
- Inkjet would not be possible without `tree-sitter` and the ecosystem of grammars surrounding it.
- Many languages are only supported thanks to the highlighting queries created by the [Helix](https://github.com/helix-editor/helix) project.