Tree-sitter adventures: Language injections
Posted on Mon 18 December 2023 in Neovim
I'm currently working on a personal fun project that tries to make this code valid Rust:
#[sql]
fn get_subjects() -> Vec<i32> {
    "select id from subjects"
}
In case you don't know Rust: #[sql] is a macro that rewrites the function to
make it legal code. My goal is to be able to write inline SQL queries as
conveniently and with as little boilerplate as possible. But the point of this
article is that the above code snippet looks like this in my editor:
So the SQL code inside the Rust string is highlighted properly. That's cool, but it surprised me even more that it still worked while writing this article in Markdown:
SQL inside Rust inside Markdown and still everything is properly highlighted.
How does this work?
This is possible due to the Tree-sitter integration in Neovim. More or less
everything you have to know was already explained in one of TJ's videos. Well,
not everything: The Tree-sitter integration is rather new. Some features
required plugins in the past but will be built into 0.10, and APIs have
changed. Long story short: I'll walk you through the steps required to make
this work in Neovim v0.10.0-dev-1836+g36552adb3.
Tree-sitter queries
Tree-sitter allows you to write queries to pick parts of the code in your
editor. Just open a source file in Neovim and execute :InspectTree (remember,
you need 0.10 for all of this to work). You get Tree-sitter's parse tree. Move
the cursor around in either window and observe what is highlighted in the
other one. If you have basic knowledge of how language parsing and ASTs work,
you should get the idea.
The query I came up with for the above example is:
;extends
(
  (attribute_item
    (attribute
      (identifier) @ident (#any-of? @ident "sql")
    )
  )
  .
  (function_item
    body: (block (string_literal) @injection.content)
  )
  (#offset! @injection.content 0 1 0 -1)
  (#set! injection.language "sql")
)
The part before the dot looks for the attribute (#[sql]). #any-of? checks a
capture against a list of strings, which I did in the beginning, but then
ended up with just "sql". The @ident (starting with @) is a "capture". The
name ident is arbitrary and could also be xyz. But the second capture,
@injection.content, is a built-in name expected by Neovim: it marks the
content to be highlighted. In the same way, injection.language (set via #set!
in this query) needs to hold the name of the language to be used for
highlighting.
The dot means that the next pattern must match directly after the first one. Otherwise it would match any later sibling. Then I match the string literal inside the body of the function, which is self-explanatory once you get the idea.
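To make the effect of the dot a bit more tangible, here is a toy model in Rust. It is not how Tree-sitter matches internally, just a sketch under the assumption that the siblings are a flat list of node kinds; the function name match_pairs and the second, unrelated function in the list are made up for illustration.

```rust
/// Toy model, NOT Tree-sitter's real matcher: returns index pairs
/// (attribute, function) found in a flat list of sibling node kinds.
/// `anchored` plays the role of the dot between the two patterns.
fn match_pairs(siblings: &[&str], anchored: bool) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (i, kind) in siblings.iter().enumerate() {
        if *kind != "attribute_item" {
            continue;
        }
        for (j, later) in siblings.iter().enumerate().skip(i + 1) {
            // With the anchor, only the directly following sibling counts.
            if *later == "function_item" && (!anchored || j == i + 1) {
                pairs.push((i, j));
            }
        }
    }
    pairs
}

fn main() {
    // Models: #[sql] fn get_subjects() {...} followed by a hypothetical
    // second function that has no attribute.
    let siblings = ["attribute_item", "function_item", "function_item"];
    println!("with dot:    {:?}", match_pairs(&siblings, true));
    println!("without dot: {:?}", match_pairs(&siblings, false));
}
```

In this model the anchored variant only pairs the attribute with the function right after it, while the unanchored one also picks up the later function, which is exactly the kind of over-match the dot is there to prevent.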
Now for a tricky detail: The pattern matches the full string, including the
double quotes. If you highlighted that node, it would still be a string. The
highlighting needs to be applied to the content of the string without the
double quotes. The #offset! does just that: it moves the start of the capture
one column to the right and the end one column to the left, which removes the
first and last characters. Finally I use #set! to set the language used for
highlighting.
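The four numbers after #offset! are deltas for start row, start column, end row and end column. A minimal Rust sketch of that arithmetic, assuming a single-line literal and an exclusive end column; NodeRange and apply_offset are hypothetical names for this illustration, not part of any Tree-sitter or Neovim API:

```rust
/// Position of a node in the buffer; the end column is exclusive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeRange {
    start_row: usize,
    start_col: usize,
    end_row: usize,
    end_col: usize,
}

/// Hypothetical helper: shifts a range by the four deltas that `#offset!`
/// takes (start row, start column, end row, end column).
/// Sketch only: it does not guard against deltas that would go below zero.
fn apply_offset(r: NodeRange, d: [isize; 4]) -> NodeRange {
    let shift = |v: usize, delta: isize| (v as isize + delta) as usize;
    NodeRange {
        start_row: shift(r.start_row, d[0]),
        start_col: shift(r.start_col, d[1]),
        end_row: shift(r.end_row, d[2]),
        end_col: shift(r.end_col, d[3]),
    }
}

fn main() {
    // One source line containing the string literal from the example.
    let line = "    \"select id from subjects\"";

    // Range of the whole literal, quotes included.
    let start = line.find('"').unwrap();
    let end = line.rfind('"').unwrap() + 1;
    let literal = NodeRange { start_row: 0, start_col: start, end_row: 0, end_col: end };

    // Same deltas as in the query: #offset! @injection.content 0 1 0 -1
    let content = apply_offset(literal, [0, 1, 0, -1]);

    println!("{}", &line[literal.start_col..literal.end_col]);
    println!("{}", &line[content.start_col..content.end_col]);
}
```

The first line printed is the literal with its quotes, the second one is the bare SQL that ends up being handed to the SQL parser.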
injections.scm
To give it a try, put the query into a file called injections.scm in
after/queries/rust in one of your config paths. Copy the initial Rust snippet
into a *.rs file, open it and see what happens.
Highlighting
Motivated by my success I wondered whether it would be possible to highlight the SQL a bit more. And of course it is. You just need a slightly modified query:
; extends
(
  (attribute_item
    (attribute
      (identifier) @x (#any-of? @x "sql")
    )
  )
  .
  (function_item
    body: (block (string_literal) @inlinesql)
  )
)
It is more or less the same, mainly with different capture names. The @x does
not really matter besides checking the name of the identifier. But if you copy
this query into queries/rust/highlights.scm (without after, don't ask me why),
the @inlinesql capture automatically defines a highlighting group in Neovim.
You could now execute :hi @inlinesql guibg=#502040 and the SQL code will get a
different background color.
Conceal
Neovim has the option to conceal certain characters, so I wondered whether I could hide the double quotes around the SQL code. Long story short: It should be possible, but it does not work.
You can write a query to match only the double quotes and would then be able to conceal them. I had a long conversation in the Neovim-treesitter Matrix channel and very nice and helpful people figured out that something is not fully working yet. So it will be possible in the future, but does not yet work with my 0.10 version.
Feedback
I don't like writing. I hate it. But I like good discussions about technology
and code. I consider such blog posts an invitation to exchange ideas. If you
like it or have improvements, questions, ... please let me know. You can find
me on Mastodon as @achim@social.saarland. My email is firstname@lastname.de.
Credits
This post would not exist without input from various people: Clason in the Matrix was insanely patient and helpful in figuring out problems with my setup. TJ and ThePrimeagen provide great and encouraging videos, which I would not even know about without Sascha pointing me to them.