# Natural Language Syntax Highlighting
Natural-Syntax-LS is a language server that highlights different parts of
speech (POS) in plain text.
## Installation
1. Download `libtorch` v2.1 as per
[Rust-BERT's documentation][download-torch].
Tips.
You can figure out the URL to download `libtorch` [in tch-rs' build
script](https://github.com/LaurentMazare/tch-rs/blob/5480d6fd4be12e748e0d87555db54a5f6e74edf2/torch-sys/build.rs#L311).
The `LIBTORCH` variable should be the `torch/` directory.
Why automatic installation does not work.
Rust-BERT has an "automatic installation" option that
uses tch-rs' build script to download `libtorch`.
However,
the binary produced this way does not run because that `libtorch` is not on
`LD_LIBRARY_PATH`.
Alternatively, you could statically link `libtorch`,
but that would
[require you to download `libtorch` yourself][tch-static-linking] anyway.
1. Install the `natural_syntax_ls` package with Cargo or friends to
get the `natural-syntax-ls` binary:
```sh
cargo install natural_syntax_ls --default-features=false
```
Setting the `default-features` to
`false` disables downloading `libtorch` (automatic installation).
Why automatic installation is the default.
Because otherwise it would be a pain to run the continuous integration.
## Editor setup
### ✅ NeoVim setup with LSPConfig
Please paste the below `natural_syntax_ls_setup` function in
your Nvim configuration and call it with your client's `capabilities`.
[Please see my config for an
example](https://github.com/SichangHe/.config/blob/b0961205a060d3588f56e97fd066a35424fe64a9/nvim/lua/plugins/lsp.lua#L301).
The natural_syntax_ls_setup
function.
```lua
local function natural_syntax_ls_setup(capabilities)
local lspconfig = require('lspconfig')
require('lspconfig.configs')['natural_syntax_ls'] = {
default_config = {
cmd = { 'natural-syntax-ls' },
filetypes = { 'text' },
single_file_support = true,
},
docs = {
description = [[The Natural Syntax Language Server for highlighting parts of speech.]],
},
}
lspconfig['natural_syntax_ls'].setup {
capabilities,
init_options = {
token_map_update = {
-- Customize your POS-token mapping here. E.g.:
--[[
-- Disable coordinating conjunctions highlighting.
CC = vim.NIL, -- `nil` does not work because it gets ignored.
-- Highlight wh-determiners as enum members without any modifiers.
WDT = { type = "enumMember" },
-- Highlight determiners as read-only classes.
DT = { type = "class", modifiers = { "readonly" } },
]]
},
},
}
end
```
Customizations:
- I only set the `filetypes` field to `text`,
but you can enable natural-syntax-ls for any other file types as well.
Note that, though,
the language server's semantic tokens supersede Tree-sitter highlighting by
default.
- By specifying the `token_map_update` field in `init_options`,
you can customize the mapping between parts of speech and semantic tokens.
- The default mapping is in the `pos2token_bits` function in
[`semantic_tokens.rs`][semantic_tokens.rs].
- Part of speech tags are the variants of the `PartOfSpeech` enum in
[`lib.rs`](https://github.com/SichangHe/natural_syntax/blob/main/src/lib.rs).
- Token types and modifiers are variants of `TokenType` and
`TokenModifier` in [`semantic_tokens.rs`][semantic_tokens.rs],
all in camelCase.
### ❓ Visual Studio Code and other editor setup
No official support, but community plugins are welcome.
I do not currently use VSCode and these other editors,
so I do not wish to maintain plugins for them.
However,
it should be straightforward to implement plugins for them since
Natural-Syntax-LS implements the Language Server Protocol (LSP).
So,
please feel free to make a plugin yourself and create an issue for me to
link it here.
## Selected specification
### Prediction Scheduling
For a single document, only one prediction is scheduled at a time.
When a prediction is ongoing,
new updates are queued and
the latest update replaces any previous updates queued.
## Debugging
We use `tracing-subscriber` with the `env-filter` feature to
emit logs[^tracing-env-filter].
Please configure the log level by setting the `RUST_LOG` environment variable.
On macOS, you may need to set `DYLD_LIBRARY_PATH` to run the tests.
## Future work
- [ ] Customizing the mapping between part of speech and semantic token.
- [ ] Support languages other than English. This simply requires a new model.
- [ ] Incremental updates and semantic token ranges.
- [ ] Do not overwrite Markdown/LaTeX syntax highlighting.
[^tracing-env-filter]:
[download-torch]: https://docs.rs/rust-bert/0.22.0/rust_bert/#manual-installation-recommended
[semantic_tokens.rs]: https://github.com/SichangHe/natural_syntax/blob/main/natural_syntax_ls/src/semantic_tokens.rs
[tch-static-linking]: https://github.com/LaurentMazare/tch-rs/tree/v2.1?tab=readme-ov-file#static-linking