Tree-sitter

Building a language around a Tree-sitter grammar.

The Tree-sitter library is used to create parsers which can provide rich syntax trees for languages both simple and complex.

Tree-sitter is the preferred method for adding new languages in Nova 10 and later. If you would like to support older versions of Nova, use a Regex grammar.

The Tree-sitter project itself provides four core pieces:

A JavaScript library for writing grammars
A tool for converting grammars into parsers
A library for applications to use those parsers to parse text into syntax trees
A Query language for introspecting the contents of those syntax trees

Of these four, Nova handles the third (parsing) itself using parsers provided by extensions, and facilitates the fourth (introspection) using queries provided by extensions.

Getting Started

These steps are the same whether you have never used Tree-sitter before or are intimately familiar with it. This documentation provides a brief overview of the process, then covers each step in depth.

To integrate a new language based on Tree-sitter into Nova, the following steps must be taken:

Write a grammar (or adopt an existing one)
Generate a parser from your grammar with Tree-sitter’s CLI tool (tree-sitter generate)
Take your generated parser files and compile them into a native library
Include your compiled library in a Nova extension
Write queries to connect your parser up to syntax highlighting, symbolication, and other editor features

Each time you make changes to your grammar, you will need to follow these steps to generate, compile, and include those changes in your extension, as well as tweak any queries to account for those changes.

Requirements

To develop a grammar and parser, you will need:

Node.js
The Tree-sitter CLI
A C compiler toolchain

For a C compiler toolchain, we recommend installing Apple’s Xcode, as it contains everything you might need in an easy-to-install package. You can install the application bundle, or just the command line tools by using xcode-select --install. You can also install Clang and LLVM directly via Homebrew.

Writing a Grammar

Tree-sitter grammars are created by combining sets of parsing rules written in JavaScript.

If you are not writing your own grammar (or making modifications to an existing one), you can likely skip this step.

The Tree-sitter documentation is the best source for learning how a grammar is structured and what library features you can use to get it just right. In short, you will have a grammar.js file in your project folder which will contain all the necessary rules for parsing text in your language.

We recommend looking at some existing grammars for a better understanding of how they are constructed and maintained. Good examples include any of the first-party grammars which are part of the Tree-sitter project.

Generating a Parser

Once you have a grammar, you can use it to generate a parser using the Tree-sitter CLI.

Every generated parser will have a name. By default, the Tree-sitter CLI will use the name of the folder in which the project lives if it has the format tree-sitter-(name) (e.g. if your project directory is named tree-sitter-rust/, it will generate a parser with the name rust). This name is important for several reasons which come up in later steps, and using this format for your project folder is a common convention with Tree-sitter.

From the top-level folder of your parser project, run:

tree-sitter generate

As long as there are no errors in your grammar, this should create the file src/parser.c (and likely others alongside).

The parser.c file will contain an exported function named tree_sitter_(name)(), where (name) is the parser’s name.

Some grammar projects may use C++ instead of C for their generated parser. This is fine, so long as you end up with a parser.cpp file in the same way as a C parser.

Parsers may also utilize an external scanner. This adds a scanner.c (or scanner.cpp) file alongside the generated parser, which will need to be included when compiling the parser.

For more information, see the Tree-sitter documentation.

Compiling a Parser

Once a parser has been generated it must be compiled into a native dynamic library for use in Nova.

Most Tree-sitter parsers can be set up to use Make or CMake for compiling native binaries. The first-party parsers from the Tree-sitter project each contain a Makefile.

If your project is not set up with a build tool such as Make, we provide a simple Parser Build Script to get you started. It includes a build script and an example Makefile, which should work with most parsers out of the box if yours does not already include one. The Makefile can be placed in the top level of your project folder (alongside your grammar.js). To use this build tool, run: ./compile_parser.sh path/to/your/project/ path/to/Nova.app.

The following requirements must be satisfied when compiling a native dynamic library for use in Nova:

Contains slices supporting both the Apple Silicon (arm64) and Intel (x86_64) architectures.
- This is to ensure that your parser is compatible with all Macs that Nova users might have.
- You can do this in your own build scripts with the compiler flags: -arch arm64 -arch x86_64
Includes the Tree-sitter library <tree_sitter/parser.h> header.
- Tree-sitter’s CLI generates this file at src/tree_sitter/parser.h within your project.
- You can include this in your own build scripts with the compiler flag: -Isrc/ (note that there is no space between -I and src, for compatibility with most C compilers).
Links against Nova’s syntax engine framework, SyntaxKit.
- Nova’s SyntaxKit includes the C symbols for Tree-sitter’s library.
- You can do this in your own build scripts with the linker flags: -Fpath/to/Nova.app/Contents/Frameworks -framework SyntaxKit
Specifies a runpath search path relative to Nova’s application bundle.
- This ensures that your parser can be loaded from Nova no matter where a user might place the application bundle on disk.
- You can do this in your own build scripts with the linker flags: -rpath @loader_path/../Frameworks

Once you have compiled your parser, you should have a dynamic library named something like libtree-sitter-(name).(majorVer).(minorVer).dylib (the exact filename will depend on the build tool and flags used). For example, from a Swift parser this might be named libtree-sitter-swift.0.3.dylib.

This dylib is the only piece of the compiled parser you will need for your extension.

Including a Parser in Extensions

Copy your parser’s dynamic library into the Syntaxes/ folder of your extension and rename it using the following format: libtree-sitter-(name).dylib, where name is the same parser name mentioned in Generating a Parser.

In your syntax XML file, add the tree-sitter element:

<tree-sitter></tree-sitter>

By default, this instructs the syntax engine to look for a Tree-sitter parser alongside the XML file, named the same as the syntax’s name attribute. For a syntax named mylang, it will look for libtree-sitter-mylang.dylib exporting a function named tree_sitter_mylang().

If for some reason you can’t use the same name for your syntax and the parser (such as if you’re using a shared parser for multiple syntaxes), you can specify it manually by using <tree-sitter language="(name)">, where (name) is the parser name as mentioned in Generating a Parser, such as language="mylang".

Note: the name used for the parser, whether it be taken from the name of the syntax or manually specified, must match both the dynamic library included in your extension and the function it exports.

The tree-sitter element is the primary point from which you will specify which Queries your language supports for powering editor features.

A note about Gatekeeper: If you distribute your extension outside of Nova’s extension library, users may encounter a Gatekeeper alert when Nova loads the Tree-sitter dynamic library due to quarantine flags placed on the extension bundle. To resolve this, users should clear the quarantine flags: xattr -d com.apple.quarantine path/to/MyExtension.novaextension.

Security Considerations

As Tree-sitter grammars are compiled into native dynamic libraries, there may be increased concern around security.

Third-party Tree-sitter parsers contributed by extensions are loaded by Nova into a secondary XPC service process, separate from the main IDE (as well as separate from all other extension code). Interactions with a parser are then performed using XPC inter-process communication, with checks enforced on any messages being passed back and forth to ensure data consistency and safety.

This process, named NovaParseService, is sandboxed to prevent it from accessing the filesystem, network, and other sensitive areas of the system. Parsers should only have access to the contents of documents being parsed and no other significant data from the user’s project workspace. All third-party parsers are loaded into the same running XPC service.

For signing of the dynamic library binary, Apple’s built-in ad-hoc signing is sufficient, but the binary may also be signed by a Mac developer certificate. The parse service enforces the Hardened Runtime security features of macOS, but does not enforce a valid code signing identity when loading parsers (Library Validation).

Queries

The Tree-sitter query language is used to target specific nodes in a syntax tree using a simple Scheme-like syntax.

Nova uses files written in the query language to power most features that use a Tree-sitter parser, such as syntax highlighting and symbolication. Syntax highlighting and limited autocomplete support are built in to Nova for files written in the query language.

The query language is composed of three main parts:

S-expressions which target nodes in a syntax tree (surrounded by ())
Captures (starting with @) which mark certain nodes as important for operations
Predicates and Directives (starting with #) which can filter on what nodes a query matches, get and set metadata, and transform captured text

For more information on the syntax of the query language, see the query language docs.

Queries in a Nova extension are stored within files in the Queries/ folder at the top level of your extension. Query files should most often have the .scm extension (a holdover from Scheme), and their names follow conventions depending on their purpose. See the following query sections for more details.

Each type of query supported by Nova has a corresponding XML element that can be specified as a child of the tree-sitter element to indicate that the extension supports that feature.

<tree-sitter>
    <highlights />
    <symbols />
    <folds />
</tree-sitter>

By default, each query element will look for a specific filename if no path for the query is specified.

If you need to break up each type of query into multiple files (such as for ease of editing) you can specify more than one copy of a query element and a path attribute on each to denote which files to load. These can even be stored in subfolders for easier organization. Query paths are always relative to the Queries/ folder.

<tree-sitter>
    <highlights path="highlights/types.scm" />
    <highlights path="highlights/functions.scm" />
    <symbols />
    <folds />
</tree-sitter>

Captures

Captures in a query are specified using an @ symbol followed by an identifier consisting of only alphanumerics, underscores, dashes, and periods. Some examples: @foo, @foo.bar, @foo_bar, @foo-bar.

((call_expression
  (function_name) @_function) @subtree
 (#match? @_function "(?i)^(hsl|hsla|rgb|rgba|color)$")
)

Each category of query handles captures in a specific way, and which names might be “special” is different depending on the query type.

However, any capture name beginning with an underscore (such as @_foo) is guaranteed to be reserved for your extension’s use and will not conflict with any “special” capture names. Consider using underscores for captures that you might be using when filtering queries using predicates, where those capture names otherwise do not need to be exposed.

Syntax Highlighting

The most common (and most essential) query of any language extension is the one supporting syntax highlighting, or coloring of tokens in a document to indicate their meaning.

Syntax highlighting support is provided using one or more query files using the highlights element:

<tree-sitter>
    <highlights />
</tree-sitter>

By default, specifying the highlights element without a path attribute will tell the syntax engine to look for a file named highlights.scm within your extension’s Queries/ folder.

Note: many Tree-sitter grammar projects have a convention for a queries/highlights.scm file used with the tree-sitter highlight CLI command and as a basis for other tools. While these can be used for Nova’s syntax highlighting support, they do not use the same highlighting selectors. As such, they are not an immediate drop-in solution and will require tweaking if included in a Nova extension.

Syntax highlighting queries apply theming selectors to the document using captures. Each capture specified on a node will be parsed as a selector (as long as it does not begin with an underscore).

Examples

Some simple examples for HTML tags and comments:

(tag_name) @tag.name
["<" ">" "</" "/>"] @tag.bracket
(doctype) @processing.doctype
(comment) @comment

A complex example for HTML attributes:

((attribute
    (attribute_name) @tag.attribute.name
    ["="]? @tag.attribute.operator
    [
      (attribute_value) @tag.attribute.value
      (quoted_attribute_value
        ["\"" "'"] @tag.attribute.value.delimiter.left
        (_)? @tag.attribute.value
        ["\"" "'"] @tag.attribute.value.delimiter.right
      )
    ]?
  )
  (#not-match? @tag.attribute.name "(?i)^(src|href)$")
)

Symbolication

Symbolication is the process of taking specific nodes from the syntax tree and building a list of user-visible “symbols” that are the major structural components of the file. For procedural languages, this will likely be things like types, functions, etc. For a language like HTML, this is likely important tags.

Symbolication support is provided using one or more query files using the symbols element:

<tree-sitter>
    <symbols />
</tree-sitter>

By default, specifying the symbols element without a path attribute will tell the syntax engine to look for a file named symbols.scm within your extension’s Queries/ folder.

Within a symbols query, there are special capture names and metadata keys which are used to mark the specific region of text to symbolicate as well as how the resulting symbol should behave.

Capture	Description
`@name`	Marks the “name” of the symbol for indexing and completions
`@name.target`	The node to target for the name query
`@displayname`	Marks the “display name” of the symbol shown in symbol lists
`@displayname.target`	The node to target for the display name query
`@arguments.target`	The node to target for the arguments query
`@subtree`	Use this node and its subtree as the symbol
`@start`	Begin the symbol after this node
`@start.before`	Begin the symbol before this node
`@end`	End the symbol before this node
`@end.after`	End the symbol after this node

Metadata Key	Description	Value Type
`role`	The type of symbol	String
`name.query`	The query to run to build a name	String
`displayname.query`	The query to run to build a display name	String
`arguments.query`	The query to run to build arguments	String
`scope.byLine`	Whether the symbol should align to line boundaries	None
`scope.extend`	Whether to expand the symbol to the end of its parent	None
`scope.level`	The level of the symbol when it cannot be determined automatically	Number
`scope.group`	The group name used when building a symbol using multiple queries	String
`scope.groupByName`	Use the symbol’s name for grouping instead of `scope.group`	None
`autoclose.expression`	The expression typed to invoke autoclosing	String
`autoclose.completion`	The expression expanded when autoclosing	String

Symbol Roles

Each symbol should define its type, or Role. This determines which icon is shown for a symbol, how it is used in autocomplete, etc.

(#set! role function)

The valid roles for a symbol are those available to the Symbol API.

Additionally, there are special symbol roles which can be specified to handle certain cases:

Role	Description
`function-or-method`	Make the symbol a function or method, depending on its ancestors
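
As an illustration, a symbols query might use the function-or-method role so that one pattern covers both free functions and functions nested inside a type. This is only a sketch: the node and field names (function_definition, name, identifier) are assumed from the tree-sitter-python grammar and are not defined by Nova, so adjust them to match your own grammar.

```scheme
; Assumed node names from tree-sitter-python; adjust for your grammar.
; The role is resolved to a function or a method based on the symbol's ancestors.
((function_definition
  name: (identifier) @name) @subtree
 (#set! role function-or-method)
)
```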

Symbol Name

A symbol’s name is its canonical identifier, used when searching via Quick Open, Jump to Definition, etc. It is generally the name which appears directly in a document’s text.

A symbol’s name can be specified in one of two ways. You can use the @name capture to mark the node whose textual contents should be used for the symbol’s name. If the @name capture is specified more than once, or captures multiple nodes in a single query match, the name will be constructed from a union of the matched text ranges.
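
A minimal sketch of the @name approach, assuming node and field names from the tree-sitter-javascript grammar (function_declaration, name, identifier). The @name capture marks the identifier whose text becomes the symbol's name, @subtree marks the symbol's region, and role sets its type.

```scheme
; Assumed node names from tree-sitter-javascript; adjust for your grammar.
((function_declaration
  name: (identifier) @name) @subtree
 (#set! role function)
)
```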

Name Query

Alternatively, you can specify the @name.target capture and name.query metadata key, which marks a node that will be used for building a name using an additional query. When using a name query, the name.query metadata key specifies a query path relative to your extension’s Queries/ folder which will be evaluated on the targeted node (and its subtree).

An example of specifying use of a name query for HTML tags:

((element
  (start_tag) @start.before @name.target
  (end_tag)? @end.after)
 (#set! name.query "tagName.scm")
)

The name subquery has its own set of captures available:

Captures	Description
`@result`	Collects textual content for query’s result

The @result capture is used to collect textual components. Each node matched by this capture will be appended to the resulting name.

The @result capture can also then be targeted by Transform Directives to further modify its contents.

This result will then be passed back to the symbol query which invoked it to provide the symbol’s name.
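
For the HTML example above, the referenced tagName.scm file could be as small as the sketch below. It assumes the targeted node is a start_tag containing a tag_name child, as in the tree-sitter-html grammar; the file contents are illustrative rather than taken from an existing extension.

```scheme
; Queries/tagName.scm (evaluated on the targeted node and its subtree)
; Assumed node name from tree-sitter-html.
(tag_name) @result
```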

Symbol Display Name

While a symbol’s name is used within a document, its display name is what is displayed in the Symbols list. The display name can be richer and convey more information to improve user experience.

A symbol’s display name can be specified in one of two ways. You can use the @displayname capture to mark the node whose textual contents should be used for the symbol’s display name. If the @displayname capture is specified more than once, or captures multiple nodes in a single query match, the display name will be constructed from a union of the matched text ranges.
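
As a sketch of the multiple-capture case, the query below marks both a function's identifier and its parameter list with @displayname, so the display name would be built from the two ranges together while the name stays the bare identifier. The node and field names are assumed from the tree-sitter-javascript grammar.

```scheme
; Assumed node and field names from tree-sitter-javascript.
((function_declaration
  name: (identifier) @name @displayname
  parameters: (formal_parameters) @displayname) @subtree
 (#set! role function)
)
```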

Display Name Query

Alternatively, you can specify the @displayname.target capture and displayname.query metadata key, which marks a node that will be used for building a display name using an additional query. When using a display name query, the displayname.query metadata key specifies a query path relative to your extension’s Queries/ folder which will be evaluated on the targeted node (and its subtree).

An example of specifying use of a display name query for HTML tags:

((element
  (start_tag (tag_name) @name) @start.before @displayname.target
  (end_tag)? @end.after)
 (#set! displayname.query "tagDisplayName.scm")
)

The display name subquery has its own set of captures available:

Captures	Description
`@result`	Collects textual content for query’s result

The @result capture is used to collect textual components. Each node matched by this capture will be appended to the resulting display name.

The @result capture can also then be targeted by Transform Directives to further modify its contents.

This result will then be passed back to the symbol query which invoked it to provide the symbol’s display name.

An example display name query for HTML tags:

; Tag name
(tag_name) @result

; ID attributes, formatted to "#value"
((attribute
  (attribute_name) @_attrname
  [
    (attribute_value) @result
    (quoted_attribute_value ["\"" "'"] (_)? @result ["\"" "'"])
  ]?
 )
 (#match? @_attrname "(?i)^id$")
 (#prefix! @result "#")
)

; Class attributes, formatted to ".value"
((attribute
  (attribute_name) @_attrname
  [
    (attribute_value) @result
    (quoted_attribute_value ["\"" "'"] (_)? @result ["\"" "'"])
  ]?
 )
 (#match? @_attrname "(?i)^class$")
 (#replace! @result "\\s+" ".")
 (#prefix! @result ".")
)

This query has three components, which combine a tag’s name, ID (if any), and class(es) (if any).

For the tag:

<div id="foobar" class="left static"></div>

…this will result in a computed display name of div#foobar.left.static shown in the Symbols list.

Symbol Region

When defining the region of a symbol, either a single discrete node or a pair of nodes must be specified.

The @subtree capture will target a specific node and its subtree. This forms the symbol’s entire bounds.

To instead target a region between two nodes, use the @start / @start.before and @end / @end.after captures. If multiple nodes match one of these captures, the “innermost” match will be used (e.g. if matching a series of keywords using @start, the last matched will be used, and if using @end, the first matched will be used).

((element
  (start_tag (tag_name) @name) @start.before
  (end_tag)? @end.after)
)

If only a @start (or @start.before) capture is specified, the scope.extend metadata key can be set to indicate that the “end” of the symbol should be wherever its parent ends (or the document ends, in the case of no parent).

When a symbol is constructed, it will automatically be grouped into any symbol which comes before if they intersect.

In cases where this is not possible, you can define the scope.level metadata key. This is useful in, for example, Markdown documents where symbols for sections might be built from a syntax tree that does not itself define the depth of its sections. The value should be a number (starting with 1) representing the “depth” of the symbol. When the tree of symbols is constructed, symbols with a level are automatically grouped into any symbol coming before them so long as the previous symbol’s level is lower.

((atx_heading
  .
  (atx_h1_marker)) @start.before
  (#set! role heading)
  (#set! scope.level 1)
  (#set! scope.extend)
)
((atx_heading
  .
  (atx_h2_marker)) @start.before
  (#set! role heading)
  (#set! scope.level 2)
  (#set! scope.extend)
)
; ...etc.

If the scope.byLine metadata key is set, the region will automatically have its boundaries aligned with the lines of text it intersects. When this happens, the last line intersecting the symbol’s end will be excluded. This may be useful in certain languages, like Markdown, where the end token targeted for symbolication and anything else on the line should not be included in the symbol.

(function_definition_statement
  name: (_) @start.before
  ")"? @start.before
  "end" @end
 (#set! scope.byLine)
)

For complex cases, symbols can also be constructed from multiple queries when a single query cannot accurately target the required set of nodes. By using the scope.group metadata key with a string value you can denote that one query using @start (or @start.before) should be linked with another query using @end (or @end.after). When two matched queries have the same group name and appear within the same level of the tree of symbols, they will be combined. The names of groups are not important, so long as the two halves of the region match.

When grouping many symbols of the same type but differing depths, it may not be possible to accurately “name” all of the groups in your query directly. If both the start and end queries building the symbol know the symbol’s “name”, you can specify scope.groupByName, and the query will instead use the symbol’s name as the group name. An example of this in practice is HTML tags: both the start and end tag know their name, so they can be grouped automatically using it.

((jsx_opening_element
  name: [
    (identifier)
    (nested_identifier)
  ] @name) @start.before
  (#set! role tag)
  (#set! scope.groupByName)
)
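
The query above covers only the opening half of the region. A matching query for the closing half might look like the sketch below; the jsx_closing_element node and its name field are assumed from the tree-sitter-javascript grammar. The role is set only on the start query here, mirroring how this documentation's grouped fold example is written.

```scheme
; Assumed counterpart for the closing element (tree-sitter-javascript JSX nodes).
; Both halves capture @name, so scope.groupByName can pair them by tag name.
((jsx_closing_element
  name: [
    (identifier)
    (nested_identifier)
  ] @name) @end.after
  (#set! scope.groupByName)
)
```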

Arguments Query

For function-like symbols which have arguments, a query can instruct the syntax engine to symbolicate arguments for use in signature help and autocomplete.

By using the @arguments.target capture, the query marks a node that will be used for building arguments. The arguments.query metadata key then specifies a query path relative to your extension’s Queries/ folder which will be evaluated on the targeted node (and its subtree).

An example of symbolicating arguments with a TypeScript method definition:

((method_definition
    name: (property_identifier) @name
    parameters: (formal_parameters) @arguments.target) @subtree
  (#set! arguments.query "arguments.scm")
)

The arguments subquery has its own set of captures available:

Captures	Description
`@name`	The argument name
`@type`	The argument type

Each match of the query within the targeted subtree will create an argument that can be autocompleted when the user completes the function-like symbol’s name, or displayed when the user invokes signature help.
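
For the TypeScript example above, the referenced arguments.scm file might look like the following sketch. The node and field names (required_parameter, optional_parameter, pattern, type, type_annotation) are assumed from the tree-sitter-typescript grammar; parameters using destructuring patterns are not handled here.

```scheme
; Queries/arguments.scm (evaluated on the @arguments.target node and its subtree)
; Assumed node and field names from tree-sitter-typescript.
(required_parameter
  pattern: (identifier) @name
  type: (type_annotation (_) @type)?)

(optional_parameter
  pattern: (identifier) @name
  type: (type_annotation (_) @type)?)
```

Capturing the child of type_annotation, rather than the annotation itself, is intended to keep the leading colon out of the argument's type.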

Autoclosing

Certain symbols may wish to support “autoclosing,” or the behavior in which typing a certain expression while inside of the symbol’s region will “close” the symbol by expanding out additional characters. An example of this is HTML tags, where typing the expression </ will automatically expand the name of the tag and closing > without the user needing to type them.

To add autoclosing support to a symbol, two metadata keys are used: autoclose.expression and autoclose.completion. The first is used to specify what the user must type to invoke autoclose while inside of the symbol’s region. Once this happens, the autoclose.completion string will be expanded and inserted at the cursor position.

The completion expression supports string expansion using the ${variable} format and the following expression variables:

Expression Variable	Description
`name`	The symbol’s name

An example of using autoclosing with HTML tags:

((element
  (start_tag (tag_name) @name) @start.before @displayname.target
  (end_tag)? @end.after)
 (#set! displayname.query "tagDisplayName.scm")
 (#set! autoclose.expression "</")
 (#set! autoclose.completion "${name}>")
)

In this case, when the user’s cursor is within a div tag and they type </, the autoclose completion expression will be expanded into div> and inserted, resulting in the proper end tag behind the cursor position.

Folds

Fold queries define the boundaries on which automatic code folding support is provided in Nova’s editor.

Folding support is provided using one or more query files using the folds element:

<tree-sitter>
    <folds />
</tree-sitter>

By default, specifying the folds element without a path attribute will tell the syntax engine to look for a file named folds.scm within your extension’s Queries/ folder.

By default, any comments highlighted using the comment syntax highlighting selector will automatically be made foldable by Nova’s editor alongside any folds defined by an extension. Folding queries only need to handle comments explicitly if they are not highlighted using that selector.

Within a folds query, there are special capture names and metadata keys which are used to mark the specific region of text to make foldable as well as how the region should behave.

Capture	Description
`@subtree`	Use this node and its subtree as the foldable region
`@start`	Begin the foldable region after this node
`@start.before`	Begin the foldable region before this node
`@end`	End the foldable region before this node
`@end.after`	End the foldable region after this node

Metadata Key	Description	Value Type
`role`	The type of folding region	String
`scope.byLine`	Whether the region should align to line boundaries	None
`scope.extend`	Whether to expand the region to the end of its parent	None
`scope.level`	The level of the fold when it cannot be determined automatically	Number
`scope.group`	The group name used when building a region using multiple queries	String

Folding Roles

Each fold can optionally define a Role. These roles are used when certain editor folding actions are taken, such as if a user invokes “Fold All Functions & Methods”. By defining a fold as a function, it will be included.

(#set! role function)

Roles	Example
comment	Documentation comments
block	A logical block, such as an “if” statement
function	Functions, methods, etc.
heading	Section, such as Markdown headings
tag	HTML tags
type	Classes, interfaces, etc.

Foldable Region

When defining the region of foldable text, either a single discrete node or a pair of nodes must be specified.

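The @subtree capture will target a specific node and its subtree as foldable. This forms the fold’s entire region.

To instead target a region between two nodes, use the @start / @start.before and @end / @end.after captures.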

((element
  (start_tag) @start
  (end_tag) @end)
 (#set! role tag)
)
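
For comparison, a sketch of the single-node form using @subtree. The function_declaration and statement_block names are assumed from the tree-sitter-javascript grammar. With this form the whole captured node, including its surrounding braces, becomes the foldable region, which may or may not be what you want for your language.

```scheme
; Assumed node and field names from tree-sitter-javascript.
((function_declaration
  body: (statement_block) @subtree)
 (#set! role function)
)
```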

If only a @start (or @start.before) capture is specified, the scope.extend metadata key can be set to indicate that the “end” of the region should be wherever its parent ends (or the document ends, in the case of no parent).

When a fold is constructed, it will automatically be grouped into any fold which comes before if its region intersects.

In cases where this is not possible, you can define the scope.level metadata key. This is useful in, for example, Markdown documents where each fold might be built from a syntax tree that does not itself define the depth of its sections. The value should be a number (starting with 1) representing the “depth” of the fold. When the tree of folds is constructed, folds with a level are automatically grouped into any fold coming before them so long as the previous fold’s level is lower.

((atx_heading
  .
  (atx_h1_marker)) @start
  (#set! role heading)
  (#set! scope.level 1)
  (#set! scope.extend)
)
((atx_heading
  .
  (atx_h2_marker)) @start
  (#set! role heading)
  (#set! scope.level 2)
  (#set! scope.extend)
)
; ...etc.

If the scope.byLine metadata key is set, the region will automatically have its boundaries aligned with the lines of text it intersects. This is useful for languages which use textual boundary tokens (such as Ruby) where you might define a foldable region between the tokens func and end. By default, everything in between is folded, leaving the fold marker situated between these two tokens on the same line: func[]end, which reads awkwardly. By specifying scope.byLine, the editor will automatically ensure that any trailing newline before the final token is excluded from the fold, leaving the end token on the next line.

(function_definition_statement
  name: (_) @start
  ")"? @start
  "end" @end
 (#set! scope.byLine)
)

For complex cases, folds can also be constructed from multiple queries when a single query cannot accurately target the required set of nodes. By using the scope.group metadata key with a string value you can denote that one query using @start (or @start.before) should be linked with another query using @end (or @end.after). When two matched queries have the same group name and appear within the same level of the tree of folds, they will be combined. The names of groups are not important, so long as the two halves of the region match.

((php_tag) @start
 (#set! role tag)
 (#set! scope.group php_tag)
)
((text_interpolation "?>") @end
 (#set! scope.group php_tag)
)

Injections

Injections allow for a language to mark regions of a document which should be parsed as another language by the editor (also known as “code fences”). Examples of this include script and style tags in HTML and triple-backtick blocks in Markdown.

Injection support is provided using one or more query files using the injections element:

<tree-sitter>
    <injections />
</tree-sitter>

By default, specifying the injections element without a path attribute will tell the syntax engine to look for a file named injections.scm within your extension’s Queries/ folder.

Within an injection query, there are special capture names and metadata keys which are used to mark the specific region of text to reparse as well as what language to use.

Capture	Description
`@injection.content`	Use this node and its subtree as the injected region
`@injection.content.start`	Begin the injected region after this node
`@injection.content.start.before`	Begin the injected region before this node
`@injection.content.end`	End the injected region before this node
`@injection.content.end.after`	End the injected region after this node
`@injection.language`	Use the textual content of this node for the injected language

Metadata Key	Description	Value Type
`injection.combined`	Whether multiple injected regions should be considered one	None
`injection.language`	The injected language	String
`injection.reset`	Instruct syntax highlighting to reset attributes for the region	String

Injected Language

When defining which language is used to parse the region, you can either specify the injection.language metadata key (using Directives) or use the @injection.language capture to specify a node whose textual content is parsed (useful for code fences where a language identifier is directly specified in the document).

The language identifier specified is parsed using Injection regular expressions to allow for potential differences in language identifier formats depending on the parent language. If no injection regular expression matches, the syntax engine checks if any syntax is registered whose name is the same as the identifier.

Content Region

When defining the region of text to reparse for an injection, either a single node or separate start and end nodes should be specified.

The @injection.content capture will mark a specific node and its subtree as the content region.

To instead target a region between two nodes, use the @injection.content.start / @injection.content.start.before and @injection.content.end / @injection.content.end.after captures. If multiple nodes match one of these captures, the “innermost” match will be used (e.g. if matching a series of keywords using @injection.content.start, the last matched will be used, and if using @injection.content.end, the first matched will be used).

Combined Regions

Documents which contain multiple disparate regions of an injected language may wish to convey that they are the same logical region (e.g. in a language like PHP, every PHP template tag is linked into one common “php document” and shares the same scope).

By setting the injection.combined metadata key, the syntax engine will combine multiple injected regions existing at the same level of the parse tree into one logical unit and parse the contents of all of them together. Otherwise, each will be parsed separately as if it were its own document.

(#set! injection.combined)
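
As a minimal sketch of how this directive might sit inside a complete injection query, assume a hypothetical templating grammar that exposes each embedded code region as a template_tag node. The node name and the php language identifier are illustrative assumptions, not taken from a shipping extension:

```scheme
; Hypothetical node name: every template_tag region is injected as PHP,
; and all regions at the same tree level are parsed together as one unit.
((template_tag) @injection.content
 (#set! injection.language php)
 (#set! injection.combined)
)
```

Without the injection.combined line, each template_tag would be parsed as its own independent document.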

Examples

Examples from the Markdown language extension:

; Yaml frontmatter
(document
  .
  (thematic_break) @injection.content.start
  .
  (setext_heading
    (setext_h2_underline) @injection.content.end)
 (#set! injection.language yaml)
)

; HTML blocks
(html_block
 (#set! injection.language html)
) @injection.content

; Fenced code blocks
(fenced_code_block
  (info_string (text) @injection.language)
  (code_fence_content) @injection.content)

Text Checking

Languages that make heavy use of prose may wish to support automatic text checking, which performs operations like spell checking and automatic URL detection. This is often used in languages like HTML and Markdown for human-readable text as well as in many procedural languages for documentation comments.

By default, Nova’s editor includes any region syntax highlighted with the comment selector in text checking. This means that most languages likely don’t have to do anything to support it.

Text checking support can be manually overridden by one or more query files, declared with the text-checking element:

<tree-sitter>
    <text-checking />
</tree-sitter>

By default, specifying the text-checking element without a path attribute will tell the syntax engine to look for a file named textChecking.scm within your extension’s Queries/ folder.

Within a text checking query, there are special capture names which are used to mark the specific region of text to make checkable.

Capture	Description
`@subtree`	Use this node and its subtree as the checkable region
`@start`	Begin the checkable region after this node
`@start.before`	Begin the checkable region before this node
`@end`	End the checkable region before this node
`@end.after`	End the checkable region after this node

The @subtree capture will target a specific node and its subtree as checkable. This forms the entire region.
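
A text checking query might look like the following sketch. The node names (paragraph, doc_comment) and the comment delimiter tokens are hypothetical; substitute the names your own grammar produces:

```scheme
; Hypothetical: check an entire prose node, including its children
(paragraph) @subtree

; Hypothetical: check only the text between a doc comment's delimiters.
; @start begins the region after "/**"; @end finishes it before "*/".
(doc_comment
  "/**" @start
  "*/" @end
)
```

The second pattern only works if the grammar exposes the delimiters as anonymous nodes; otherwise @subtree on the whole comment is the simpler choice.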

Colors

Extensions that make use of the Colors API can provide a query to automatically detect potential color values in the document to be passed to the extension for further processing.

Any nodes matched by color queries will be collected by the editor and provided in the ColorInformationContext object’s candidates property when a color request is made.

Colors support can be provided by one or more query files, declared with the colors element:

<tree-sitter>
    <colors />
</tree-sitter>

By default, specifying the colors element without a path attribute will tell the syntax engine to look for a file named colors.scm within your extension’s Queries/ folder.

Within a colors query, there are special capture names which are used to mark the specific region of text to pull out as a color candidate.

Capture	Description
`@subtree`	Use this node and its subtree as the candidate
`@start`	Begin the candidate after this node
`@start.before`	Begin the candidate before this node
`@end`	End the candidate before this node
`@end.after`	End the candidate after this node

The @subtree capture will target a specific node and its subtree as a candidate.
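
As an illustration, a colors query for a CSS-like grammar could be sketched as below. The node names color_value and plain_value are assumptions about the grammar in use, and the list of keywords is deliberately short:

```scheme
; Assumed node name for hex colors such as #ff8800
(color_value) @subtree

; Assumed node name for bare keywords; only a few named colors are offered
((plain_value) @subtree
 (#match? @subtree "^(red|green|blue)$")
)
```

The query only nominates candidates; deciding whether a candidate really is a color remains the job of the extension handling the color request.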

Predicates and Directives

Predicates and directives are akin to functions in procedural languages. Predicates are used to filter what a query matches based on specific criteria, while directives are used to instruct the syntax engine to perform specific operations (such as setting metadata).

The Tree-sitter library itself does not provide any concrete predicates or directives. Instead, there are a few commonly agreed-upon ones which are used by several tools and applications. Nova implements these, as well as a number of custom ones to facilitate its features.

These expressions take arguments which can be either a Capture or a String.

(#eq? @attr "border:")
(#not-eq? @tag.name div)

Capture arguments are references to nodes captured using the @name syntax on nodes in the query. They are specified in the same way, including the @ symbol.

Strings are constant values wrapped within double quotes (such as "foo-bar"). As a convenience, strings which consist of only alphanumeric characters, underscores, dashes, and periods are allowed to be specified without quotes (such as foo.bar). This is often used for better readability of special keys and identifiers.

Note: for strings which specify a regular expression pattern, character classes using the backslash must be double-escaped (such as \\s+) due to a single escape being reserved for escaping the string’s characters.

Predicates

Predicates are used to further refine whether a query matches a set of nodes at all. Where the query can narrow down matches based on the structure of the tree, predicates allow comparing against captured text from the document.

eq? / any-eq?

Filters a query based on whether a capture is equal to one or more values.

Syntax: (#eq? capture captureOrString...)

The first argument is always a captured node whose text is compared against. For the additional arguments, if a capture name is provided the textual contents of the captured node will be compared; if a string is provided, the string itself will be compared. Two or more additional arguments will be compared in an OR fashion, as if to say “is equal to any of these values.”

The any- variant can be used with quantified captures, where multiple nodes can be captured using the * or + operators. In such cases, this predicate will match if “any” of the nodes do, as opposed to the “non-any” equivalent, which will only match if “all” nodes do.

Compatibility notes: The any- variant was added in Nova 12. In Nova 11 and later, more than two arguments may be provided (in Nova 10, only two arguments were supported, for a single comparison).

Examples:

(#eq? @attr "corner-radius")
(#eq? @tag-start-name @tag-end-name)
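
To make the difference between the two variants concrete, here is a small sketch using hypothetical list and item node names. The + operator lets @item capture several sibling nodes within a single match:

```scheme
; Matches when at least one captured item has the text "TODO" (Nova 12+)
((list (item)+ @item)
 (#any-eq? @item "TODO")
)

; Matches only when every captured item has the text "TODO"
((list (item)+ @item)
 (#eq? @item "TODO")
)
```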

not-eq? / any-not-eq?

Filters a query based on whether a capture is not equal to one or more values.

Syntax: (#not-eq? capture captureOrString...)

The first argument is always a captured node whose text is compared against. For the additional arguments, if a capture name is provided the textual contents of the captured node will be compared; if a string is provided, the string itself will be compared. Two or more additional arguments will be compared in an AND fashion, as if to say “is not equal to any of these values.”

Examples:

(#not-eq? @attr "corner-radius")
(#not-eq? @tag-start-name @tag-end-name)

contains? / any-contains?

Filters a query based on whether a capture contains one or more values.

Syntax: (#contains? capture captureOrString...)

Compatibility notes: The “non-any” variant was added in Nova 11; the any- variant was added in Nova 12.

Examples:

(#contains? @attr "corner" "radius")
(#contains? @tag-start-name @tag-end-name)

not-contains? / any-not-contains?

Filters a query based on whether a capture does not contain one or more values.

Syntax: (#not-contains? capture captureOrString...)

The first argument is always a captured node whose text is compared against. For the additional arguments, if a capture name is provided the textual contents of the captured node will be compared; if a string is provided, the string itself will be compared. Two or more additional arguments will be compared in an AND fashion, as if to say “does not contain any of these values.”

Compatibility notes: The “non-any” variant was added in Nova 11; the any- variant was added in Nova 12.

Examples:

(#not-contains? @attr "corner" "radius")
(#not-contains? @tag-start-name @tag-end-name)

match? / any-match?

Filters a query based on whether a capture matches a regular expression.

Syntax: (#match? capture regex)

The textual contents of the node the capture denotes will be evaluated using the provided regular expression. If the expression matches within the text, the predicate will evaluate to true; otherwise it will evaluate to false.

The regular expression is allowed to match anywhere within the capture text. To match on specific ends or the entire textual contents of the node, use anchoring metacharacters such as ^ and $.

Compatibility notes: The any- variant was added in Nova 12.

Examples:

(#match? @attr "[a-zA-Z0-9_]+")
(#match? @attr "^border-(top|left|bottom|right)$")

not-match? / any-not-match?

Filters a query based on whether a capture does not match a regular expression.

Syntax: (#not-match? capture regex)

The regular expression is allowed to match anywhere within the capture text. To match on specific ends or the entire textual contents of the node, use anchoring metacharacters such as ^ and $.

Compatibility notes: The any- variant was added in Nova 12.

Examples:

(#not-match? @attr "[a-zA-Z0-9_]+")
(#not-match? @attr "^border-(top|left|bottom|right)$")

has-type? / any-has-type?

Filters a query based on whether a captured node has one of a set of specified types.

Syntax: (#has-type? capture type...)

The first argument is always a captured node whose type is compared against. For the additional arguments, a string will be compared as a type name. Two or more additional arguments will be compared in an OR fashion, as if to say “type matches any of these values.”

In this context, the “type” of a node is the name of the Tree-sitter grammar rule which created it.

Compatibility notes: The “non-any” variant was added in Nova 11; the any- variant was added in Nova 12.

Examples:

(#has-type? @attr "rule_set")
(#has-type? @attr "block" "sequence" "mapping")

not-has-type? / any-not-has-type?

Filters a query based on whether a captured node does not have one of a set of specified types.

Syntax: (#not-has-type? capture type...)

The first argument is always a captured node whose type is compared against. For the additional arguments, a string will be compared as a type name. Two or more additional arguments will be compared in an AND fashion, as if to say “type does not match any of these values.”

Compatibility notes: The “non-any” variant was added in Nova 11; the any- variant was added in Nova 12.

Examples:

(#not-has-type? @attr "rule_set")
(#not-has-type? @attr "block" "sequence" "mapping")

has-parent? / any-has-parent?

Filters a query based on whether the immediate parent of a captured node has one of a set of specified types.

Syntax: (#has-parent? capture type...)

Compatibility notes: The “non-any” variant was added in Nova 11; the any- variant was added in Nova 12.

Examples:

(#has-parent? @attr "rule_set")
(#has-parent? @attr "block" "sequence" "mapping")

not-has-parent? / any-not-has-parent?

Filters a query based on whether the immediate parent of a captured node does not have one of a set of specified types.

Syntax: (#not-has-parent? capture type...)

The first argument is always a captured node whose immediate parent’s type is compared against. For the additional arguments, a string will be compared as a type name. Two or more additional arguments will be compared in an AND fashion, as if to say “type does not match any of these values.”

Compatibility notes: The “non-any” variant was added in Nova 11; the any- variant was added in Nova 12.

Examples:

(#not-has-parent? @attr "rule_set")
(#not-has-parent? @attr "block" "sequence" "mapping")

has-ancestor? / any-has-ancestor?

Filters a query based on whether any ancestor of a captured node has one of a set of specified types.

Syntax: (#has-ancestor? capture type...)

Compatibility notes: The “non-any” variant was added in Nova 11; the any- variant was added in Nova 12.

Examples:

(#has-ancestor? @attr "rule_set")
(#has-ancestor? @attr "block" "sequence" "mapping")

not-has-ancestor? / any-not-has-ancestor?

Filters a query based on whether no ancestor of a captured node has any of a set of specified types.

Syntax: (#not-has-ancestor? capture type...)

The first argument is always a captured node whose ancestors’ types are compared against. For the additional arguments, a string will be compared as a type name. Two or more additional arguments will be compared in an AND fashion, as if to say “type does not match any of these values.”

Compatibility notes: The “non-any” variant was added in Nova 11; the any- variant was added in Nova 12.

Examples:

(#not-has-ancestor? @attr "rule_set")
(#not-has-ancestor? @attr "block" "sequence" "mapping")

nth? / any-nth?

Filters a query based on whether a captured node is the nth named child of its immediate parent.

Syntax: (#nth? capture index)

The first argument is always a captured node. The second argument is an integer (specified either bare or as a string) which represents the index to compare. If the captured node is the nth named child of its parent, the predicate evaluates to true.

The first named child is at 0. Anonymous nodes are not counted within these indexes. So, if you have three nodes in a tree: ((foo) ":" (bar)), the index of (bar) would be 1.

A negative index may be specified to count backward, where -1 represents the last named child, -2 represents its preceding named sibling, etc.

Compatibility notes: The “non-any” variant was added in Nova 11; the any- variant was added in Nova 12.

Examples:

(#nth? @attr 3)
(#nth? @attr -2)

not-nth? / any-not-nth?

Filters a query based on whether a captured node is not the nth named child of its immediate parent.

Syntax: (#not-nth? capture index)

The first argument is always a captured node. The second argument is an integer (specified either bare or as a string) which represents the index to compare. If the captured node is not the nth named child of its parent, the predicate evaluates to true.

The first named child is at 0. Anonymous nodes are not counted within these indexes. So, if you have three nodes in a tree: ((foo) ":" (bar)), the index of (bar) would be 1.

A negative index may be specified to count backward, where -1 represents the last named child, -2 represents its preceding named sibling, etc.

Compatibility notes: The “non-any” variant was added in Nova 11; the any- variant was added in Nova 12.

Examples:

(#not-nth? @attr 3)
(#not-nth? @attr -2)

is?

Filters a query based on whether a metadata key is equal to a specified value.

Syntax: (#is? [target] key [value])

The value of key will be compared to the string value; if no value is provided, the predicate will evaluate true if key is set at all.

If the optional target argument is a capture, the metadata of that capture is compared; if target is excluded, the metadata of the overall match is compared.
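
Examples (a brief sketch of the two forms the syntax above allows; the key and value are placeholders borrowed from the set! examples later in this document, not keys Nova defines):

```scheme
; True if the overall match has any value set for the key
(#is? myCustomVariableName)
; True if the @body capture has the key set to "Foobar"
(#is? @body myCustomVariableName "Foobar")
```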

is-not?

Filters a query based on whether a metadata key is not equal to a value.

Syntax: (#is-not? [target] key [value])

The value of key will be compared to the string value; if no value is provided, the predicate will evaluate true if key is not set at all.

If the optional target argument is a capture, the metadata of that capture is compared; if target is excluded, the metadata of the overall match is compared.
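
Examples (a brief sketch mirroring the forms allowed by the syntax above; the key and value are placeholders, not keys Nova defines):

```scheme
; True if the key is not set on the overall match at all
(#is-not? myCustomVariableName)
; True if the @body capture's value for the key is not "Foobar"
(#is-not? @body myCustomVariableName "Foobar")
```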

Metadata Directives

Metadata directives are used to set key-value pairs on a match and its captures, unique to each as a query runs across nodes in the parse tree.

Such values are generally set by a query during its evaluation, using the directives in this section to hand information back to the syntax engine. Which special keys are relevant depends on the type of query; see the corresponding query sections above for more information.

set!

Sets a metadata value unconditionally.

Syntax: (#set! [target] key [value])

The metadata value for key will be set to value (which must be a string).

If the optional target argument is a capture, the value will be set in the metadata of that capture; if target is excluded, it will be set in the metadata of the overall match.

Examples:

(#set! autoclose-expression "</")
(#set! @body myCustomVariableName "Foobar")

set-if-eq! / set-if-any-eq!

Sets a metadata value if the text of a compared capture is equal to a compared value.

Syntax: (#set-if-eq! [target] compareCapture compareValue key [setValue])

The text of compareCapture will be compared to compareValue (which can be either a capture or string). If the two operands are equal, the metadata value for key will be set to setValue (which must be a string).

If the optional target argument is a capture, the value will be set in the metadata of that capture; if target is excluded, it will be set in the metadata of the overall match.

The any- variant can be used with quantified captures, where multiple nodes can be captured using the * or + operators. In such cases, this directive will apply if the comparison is true for “any” of the compared nodes, as opposed to the “non-any” equivalent, which will only apply if it is true for “all” nodes.

Compatibility notes: The any- variant was added in Nova 12.
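
Examples (a sketch; the capture names are hypothetical, and the role value tag-anchor is borrowed from the set-by-case-match! example later in this document). Both lines supply every argument they use, so the optional target cannot be confused with compareCapture:

```scheme
; Sets role to "tag-anchor" on the overall match when @name's text is "a"
(#set-if-eq! @name "a" role tag-anchor)
; Same comparison, but stores the value in the metadata of @tag instead
(#set-if-eq! @tag @name "a" role tag-anchor)
```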

set-if-not-eq! / set-if-any-not-eq!

Sets a metadata value if the text of a compared capture is not equal to a compared value.

Syntax: (#set-if-not-eq! [target] compareCapture compareValue key [setValue])

The text of compareCapture will be compared to compareValue (which can be either a capture or string). If the two operands are not equal, the metadata value for key will be set to setValue (which must be a string).

If the optional target argument is a capture, the value will be set in the metadata of that capture; if target is excluded, it will be set in the metadata of the overall match.

Compatibility notes: The any- variant was added in Nova 12.

set-if-match! / set-if-any-match!

Sets a metadata value if the text of a compared capture matches a regular expression.

Syntax: (#set-if-match! [target] compareCapture pattern key [setValue])

The text of compareCapture will be evaluated using the regular expression pattern (which must be a string). If the pattern matches, the metadata value for key will be set to setValue (which must be a string).

The regular expression is allowed to match anywhere within the capture text. To match on specific ends or the entire textual contents of the node, use anchoring metacharacters such as ^ and $.

If the optional target argument is a capture, the value will be set in the metadata of that capture; if target is excluded, it will be set in the metadata of the overall match.

Compatibility notes: The any- variant was added in Nova 12.
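
Examples (a sketch; the @name capture is hypothetical, and the role value tag-heading is borrowed from the set-by-case-match! example later in this document):

```scheme
; Sets role to "tag-heading" when @name is exactly h1 through h6.
; The ^ and $ anchors prevent a match inside longer names such as "th1x".
(#set-if-match! @name "^h[1-6]$" role tag-heading)
```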

set-if-not-match! / set-if-any-not-match!

Sets a metadata value if the text of a compared capture does not match a regular expression.

Syntax: (#set-if-not-match! [target] compareCapture pattern key [setValue])

The text of compareCapture will be evaluated using the regular expression pattern (which must be a string). If the pattern does not match, the metadata value for key will be set to setValue (which must be a string).

The regular expression is allowed to match anywhere within the capture text. To match on specific ends or the entire textual contents of the node, use anchoring metacharacters such as ^ and $.

If the optional target argument is a capture, the value will be set in the metadata of that capture; if target is excluded, it will be set in the metadata of the overall match.

Compatibility notes: The any- variant was added in Nova 12.

set-by-case-eq!

Sets a metadata value based on how the text of a compared capture matches a series of value switch cases.

Syntax: (#set-by-case-eq! [target] compareCapture key [caseValue setValue]+ [defaultValue])

For each case pair, either a capture or string is provided as the first half (caseValue), and a string for the second half (setValue).

The text of compareCapture will be compared to each pair’s caseValue, in order. If the two operands are equal, the value of key will be set to that case’s setValue and the directive will stop evaluating cases. If none of the cases match, an optional defaultValue string may be specified to be set; otherwise, the variable will not be set.

If the optional target argument is a capture, the value will be set in the metadata of that capture; if target is excluded, it will be set in the metadata of the overall match.

Examples:

(#set-by-case-eq! @name scope.level
    "h1" 1
    "h2" 2
    "h3" 3
    "h4" 4
    "h5" 5
    "h6" 6
)

set-by-case-match!

Sets a metadata value based on how the text of a compared capture matches a series of pattern switch cases.

Syntax: (#set-by-case-match! [target] compareCapture key [casePattern setValue]+ [defaultValue])

For each case pair, a regular expression is provided as the first half (casePattern), and a string for the second half (setValue).

The text of compareCapture will be evaluated using each pair’s casePattern, in order. If the pattern matches, the value of key will be set to that case’s setValue and the directive will stop evaluating cases. If none of the cases match, an optional defaultValue string may be specified to be set; otherwise, the variable will not be set.

If the optional target argument is a capture, the value will be set in the metadata of that capture; if target is excluded, it will be set in the metadata of the overall match.

Examples:

(#set-by-case-match! @name role
    "(?i)^(h1|h2|h3|h4|h5|h6|header|hgroup)$" tag-heading
    "(?i)^(article|aside|main|nav|section)$" tag-section
    "(?i)^(a)$" tag-anchor
    "(?i)^(link)$" tag-link
    "(?i)^(img)$" tag-image
    tag
)

Transform Directives

Transform directives operate on text collected by captures. They can take the textual contents of a captured node and progressively change it using a series of string operations.

They are evaluated in the order they appear in the query (depth within braces having no bearing). These directives are only used in certain operations which pull out text from captures, such as in naming a symbol during symbolication.

Transformations are destructive to the capture’s ultimate resulting text. If multiple transform directives are evaluated on the same capture, the first will receive the original textual contents, the second will receive the result of the first, and so on.

This does not affect the operation of predicates, which always use the original textual contents of the capture before transformation. Transform directives only affect the result handed back to the syntax engine once the query fully evaluates.

prefix!

Prefixes the textual result of a capture with one or more other captures or strings.

Syntax: (#prefix! capture captureOrString...)

The current text of the capture will be prefixed with the provided arguments, in order. Each additional argument may be either a capture name or string. If a capture is specified its current textual result will be used.

Performing multiple prefix directives on the same capture will continue to prefix onto the beginning of the previous result.

Examples:

; Before: @tag-id == "myTag", @tag-name == "div"
(#prefix! @tag-id @tag-name "#")
; After: @tag-id == "div#myTag"

append!

Appends one or more other captures or strings onto the textual result of a capture.

Syntax: (#append! capture captureOrString...)

The provided arguments will be appended to the current text of the capture, in order. Each additional argument may be either a capture name or string. If a capture is specified, its current textual result will be used.

Performing multiple append directives on the same capture will continue to append onto the end of the previous result.

Examples:

; Before: @attr-name == "border-color", @attr-value == "red"
(#append! @attr-name ": " @attr-value ";")
; After: @attr-name == "border-color: red;"

strip!

Strips text from the beginning and end of the textual result of a capture which matches a provided regular expression.

Syntax: (#strip! capture pattern)

Text matching the regular expression pattern anchored at both the beginning and end of the current text of the capture will be stripped, leaving only what text did not match in between. This can be useful in stripping whitespace, for example.

Examples:

(#strip! @attr-name "\\s+")

replace!

Replaces text in the textual result of a capture which matches a provided regular expression using a replacement template expression.

Syntax: (#replace! capture pattern template)

Each block of text in the current text of the capture which matches the regular expression pattern will be replaced using the expression template. This expression may use backreferences with the \# syntax, where # is the number (starting with 1) of a capture group in the original regular expression, with \0 representing the entire match.

Examples:

(#replace! @attr-name "([a-zA-Z0-9]+)\\s+\{" ".\1")
