Tree Sitter

The Tree Sitter library is used to create parsers which can provide rich syntax trees for languages both simple and complex.

As of Nova 10, Tree Sitter is the preferred method for adding new languages. If you would like to support older versions of Nova, use a Regex grammar.

The Tree Sitter project itself provides four core pieces:

Of these four, Nova handles the third (parsing) itself using parsers provided by extensions, and facilitates the fourth (introspection) using queries provided by extensions.

Getting Started

Whether you have never used Tree Sitter before or are intimately familiar with it, the steps are the same. This documentation provides a brief overview of the process, then covers each step in depth.

To integrate a new language based on Tree Sitter into Nova, the following steps must be taken:

  1. Write a grammar (or adopt an existing one)
  2. Generate a parser from your grammar with Tree Sitter’s CLI tool (tree-sitter generate)
  3. Take your generated parser files and compile them into a native library
  4. Include your compiled library in a Nova extension
  5. Write queries to connect your parser up to syntax highlighting, symbolication, and other editor features

Each time you make changes to your grammar, you will need to follow these steps to generate, compile, and include those changes in your extension, as well as tweak any queries to account for those changes.

Requirements

To develop a grammar and parser, you will need:

For a C compiler toolchain, we recommend installing Apple’s Xcode, as it contains everything you might need in an easy-to-install package. You can install the application bundle, or just the command line tools by using xcode-select --install. You can also install Clang and LLVM directly via Homebrew.

Writing a Grammar

Tree Sitter grammars are created by combining sets of parsing rules written in JavaScript.

If you are not writing your own grammar (or making modifications to an existing one), you can likely skip this step.

The Tree Sitter documentation is the best source for learning how a grammar is structured and what library features you can use to get it just right. In short, you will have a grammar.js file in your project folder which will contain all the necessary rules for parsing text in your language.

We recommend looking at some existing grammars for a better understanding of how they are constructed and maintained. Some good examples are any of the first-party grammars which are part of the Tree Sitter project:

Generating a Parser

Once you have a grammar, you can use it to generate a parser using the Tree Sitter CLI.

Every generated parser will have a name. By default, the Tree Sitter CLI will use the name of the folder in which the project lives if it has the format tree-sitter-(name) (e.g. if your project directory is named tree-sitter-rust/, it will generate a parser with the name rust). This name is important for several reasons which come up in later steps, and using this format for your project folder is common convention with Tree Sitter.

From the top-level folder of your parser project, run:

tree-sitter generate

As long as there are no errors in your grammar, this should create the file src/parser.c (and likely others alongside).

The parser.c file will contain an exported function named tree_sitter_(name)(), where (name) is the parser’s name.

Some grammar projects may use C++ instead of C for their generated parser. This is fine, so long as you end up with a parser.cpp file in the same way as a C parser.

Parsers may also utilize an external scanner. This will generate a scanner.c (or scanner.cpp) file which will need to be included when compiling the parser.

For more information, see the Tree Sitter documentation.

Compiling a Parser

Once a parser has been generated it must be compiled into a native dynamic library for use in Nova.

Most Tree Sitter parsers can be set up to use Make or CMake for compiling native binaries. The first-party parsers from the Tree Sitter project each contain a Makefile.

If your project is not set up with a build tool such as Make, we provide a simple 🛠Parser Build Script to get you started. It includes a build script and an example Makefile that should work with most parsers out of the box if yours does not already include one. The Makefile can be placed in the top level of your project folder (alongside your grammar.js). To use this build tool: ./compile_parser.sh path/to/your/project/ path/to/Nova.app.

The following requirements must be satisfied when compiling a native dynamic library for use in Nova:

Once you have compiled your parser, you should have a dynamic library named something like libtree-sitter-(name).(majorVer).(minorVer).dylib (the exact filename will depend on the build tool and flags used). For example, from a Swift parser this might be named libtree-sitter-swift.0.3.dylib.

This dylib is the only piece of the compiled parser you will need for your extension.

Including a Parser in Extensions

Copy your parser’s dynamic library into the Syntaxes/ folder of your extension and rename it using the following format: libtree-sitter-(name).dylib, where name is the same parser name mentioned in Generating a Parser.

In your syntax XML file, add the tree-sitter element:

<tree-sitter></tree-sitter>

By default, this instructs the syntax engine to look for a Tree Sitter parser alongside the XML file, named the same as the syntax’s name attribute. For a syntax named mylang, it will look for libtree-sitter-mylang.dylib exporting a function named tree_sitter_mylang().

If for some reason you can’t use the same name for your syntax and the parser (such as if you’re using a shared parser for multiple syntaxes), you can specify it manually by using <tree-sitter language="(name)">, where (name) is the parser name as mentioned in Generating a Parser, such as language="mylang".

Note: the name used for the parser, whether it be taken from the name of the syntax or manually specified, must match both the dynamic library included in your extension and the function it exports.

The tree-sitter element is the primary point from which you will specify which Queries your language supports for powering editor features.

A note about Gatekeeper: If you distribute your extension outside of Nova’s extension library, users may encounter a Gatekeeper alert when Nova loads the Tree Sitter dynamic library due to quarantine flags placed on the extension bundle. To resolve this, users should clear the quarantine flags: xattr -d com.apple.quarantine path/to/MyExtension.novaextension.

Security Considerations

As Tree Sitter grammars are compiled into native dynamic libraries, there may be increased concern around security.

Third-party Tree Sitter parsers contributed by extensions are loaded into a secondary XPC service process by Nova separate from the main IDE (as well as separate from all other extension code). Interactions with a parser are then performed using XPC inter-process communication with checks enforced onto any messages being passed back and forth to ensure data consistency and safety.

This process, named NovaParseService, is sandboxed to prevent it from accessing the filesystem, network, and other sensitive areas of the system. Parsers should only have access to the contents of documents being parsed and no other significant data from the user’s project workspace. All third-party parsers are loaded into the same running XPC service.

For signing of the dynamic library binary, Apple’s built-in ad-hoc signing is sufficient, but the binary may also be signed by a Mac developer certificate. The parse service enforces the Hardened Runtime security features of macOS, but does not enforce a valid code signing identity when loading parsers (Library Validation).

Queries

The Tree Sitter query language is used to target specific nodes in a syntax tree using a simple Scheme-like syntax.

Nova uses files written in the query language to power most features that use a Tree Sitter parser, such as syntax highlighting and symbolication. Syntax highlighting and limited autocomplete support for files written in the query language are built into Nova.

The query language is composed of three main parts:

For more information on the syntax of the query language, see the query language docs.

Queries in a Nova extension are stored within files in the Queries/ folder in the top-level of your extension. Query files should most often have the scm extension (a holdover from Scheme), and their names follow conventions depending on their purpose. See the following query sections for more details.

Each type of query supported by Nova has a corresponding XML element that can be specified as a child of the tree-sitter element to indicate that the extension supports that feature.

<tree-sitter>
    <highlights />
    <symbols />
    <folds />
</tree-sitter>

By default, each query element will look for a specific filename if no path for the query is specified.

If you need to break up each type of query into multiple files (such as for ease of editing) you can specify more than one copy of a query element and a path attribute on each to denote which files to load. These can even be stored in subfolders for easier organization. Query paths are always relative to the Queries/ folder.

<tree-sitter>
    <highlights path="highlights/types.scm" />
    <highlights path="highlights/functions.scm" />
    <symbols />
    <folds />
</tree-sitter>

Captures

Captures in a query are specified using an @ symbol followed by an identifier consisting of only alphanumerics, underscores, dashes, and periods. Some examples: @foo, @foo.bar, @foo_bar, @foo-bar.

((call_expression
  (function_name) @_function) @subtree
 (#match? @_function "(?i)^(hsl|hsla|rgb|rgba|color)$")   
)

Each category of query handles captures in a specific way, and which names might be “special” is different depending on the query type.

However, any capture name beginning with an underscore (such as @_foo) is guaranteed to be reserved for your extension’s use and will not conflict with any “special” capture names. Consider using underscores for captures that you might be using when filtering queries using predicates which otherwise do not need to be exposed.

Syntax Highlighting

The most common (and most essential) query of any language extension is the one that supports syntax highlighting: the coloring of tokens in a document to indicate their meaning.

Syntax highlighting support is provided using one or more query files using the highlights element:

<tree-sitter>
    <highlights />
</tree-sitter>

By default, specifying the highlights element without a path attribute will tell the syntax engine to look for a file named highlights.scm within your extension’s Queries/ folder.

Note: many Tree Sitter grammar projects include a queries/highlights.scm file by convention, used with the tree-sitter highlight CLI command and as a basis for other tools. While these files can be used as a starting point for Nova’s syntax highlighting support, they do not use the same highlighting selectors. As such, they are not a drop-in solution and will require tweaking if included in a Nova extension.

Syntax highlighting queries apply theming selectors to the document using captures. Each capture specified on a node will be parsed as a selector (as long as it does not begin with an underscore).

Examples

Some simple examples for HTML tags and comments:

(tag_name) @tag.name
["<" ">" "</" "/>"] @tag.bracket
(doctype) @processing.doctype
(comment) @comment

A complex example for HTML attributes:

((attribute
    (attribute_name) @tag.attribute.name
    ["="]? @tag.attribute.operator
    [
      (attribute_value) @tag.attribute.value
      (quoted_attribute_value
        ["\"" "'"] @tag.attribute.value.delimiter.left
        (_)? @tag.attribute.value
        ["\"" "'"] @tag.attribute.value.delimiter.right
      )
    ]?
  )
  (#not-match? @tag.attribute.name "(?i)^(src|href)$")
)

Symbolication

Symbolication is the process of taking specific nodes from the syntax tree and building a list of user-visible “symbols” that are the major structural components of the file. For procedural languages, this will likely be things like types, functions, etc. For a language like HTML, this is likely important tags.

Symbolication support is provided using one or more query files using the symbols element:

<tree-sitter>
    <symbols />
</tree-sitter>

By default, specifying the symbols element without a path attribute will tell the syntax engine to look for a file named symbols.scm within your extension’s Queries/ folder.

Within a symbols query, there are special capture names and variables which are used to mark the specific region of text to symbolicate as well as how the resulting symbol should behave.

Capture              Description
@name                Marks the “name” of the symbol for indexing and completions
@name.target         The node to target for the name query
@displayname         Marks the “display name” of the symbol shown in symbol lists
@displayname.target  The node to target for the display name query
@arguments.target    The node to target for the arguments query
@subtree             Use this node and its subtree as the symbol
@start               Begin the symbol after this node
@start.before        Begin the symbol before this node
@end                 End the symbol before this node
@end.after           End the symbol after this node

Variable              Description                                                         Value Type
role                  The type of symbol                                                  String
name.query            The query to run to build a name                                    String
displayname.query     The query to run to build a display name                            String
arguments.query       The query to run to build arguments                                 String
scope.byLine          Whether the symbol should align to line boundaries                  None
scope.extend          Whether to expand the symbol to the end of its parent               None
scope.level           The level of the symbol when it cannot be determined automatically  Number
scope.group           The group name used when building a symbol using multiple queries   String
scope.groupByName     Use the symbol’s name for grouping instead of scope.group           None
autoclose.expression  The expression typed to invoke autoclosing                          String
autoclose.completion  The expression expanded when autoclosing                            String

Symbol Roles

Each symbol should define its type, or Role. This determines which icon is shown for a symbol, how it is used in autocomplete, etc.

(#set! role function)

The set of valid roles for a symbol are those available to the Symbol API.

Additionally, there are special symbol roles which can be specified to handle certain cases:

Role                Description
function-or-method  Make the symbol a function or method, depending on its ancestors

Symbol Name

A symbol’s name is its canonical identifier, used when searching via Quick Open, Jump to Definition, etc. It is generally the name which appears directly in a document’s text.

A symbol’s name can be specified in one of two ways. You can use the @name capture to mark the node whose textual contents should be used for the symbol’s name. If the @name capture is specified more than once, or captures multiple nodes in a single query match, the name will be constructed from a union of the matched text ranges.

Name Query

Alternatively, you can specify the @name.target capture and name.query variable, which marks a node that will be used for building a name using an additional query. When using a name query, the name.query variable specifies a query path relative to your extension’s Queries/ folder which will be evaluated on the targeted node (and its subtree).

An example of specifying use of a name query for HTML tags:

((element
  (start_tag) @start.before @name.target
  (end_tag)? @end.after)
 (#set! name.query "tagName.scm")
)

The name subquery has its own set of captures available:

Captures  Description
@result   Collects textual content for query’s result

The @result capture is used to collect textual components. Each node matched by this capture will be appended to the resulting name.

The @result capture can also then be targeted by Transform Predicates to further modify its contents.

This result will then be passed back to the symbol query which invoked it to provide the symbol’s name.
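
As a minimal sketch, a name query for HTML tags could be as small as a single pattern. The contents below are a hypothetical illustration of what tagName.scm might hold, reusing the tag_name node from the HTML examples elsewhere in this documentation; they are not taken from a shipping extension.

```scheme
; tagName.scm (hypothetical contents)
; Evaluated on the node marked with @name.target and its subtree.
; The text of the matched tag_name node becomes the symbol's name.
(tag_name) @result
```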

Symbol Display Name

While a symbol’s name is used within a document, its display name is what is displayed in the Symbols list. The display name can be richer and convey more information to improve user experience.

A symbol’s display name can be specified in one of two ways. You can use the @displayname capture to mark the node whose textual contents should be used for the symbol’s display name. If the @displayname capture is specified more than once, or captures multiple nodes in a single query match, the display name will be constructed from a union of the matched text ranges.

Display Name Query

Alternatively, you can specify the @displayname.target capture and displayname.query variable, which marks a node that will be used for building a display name using an additional query. When using a display name query, the displayname.query variable specifies a query path relative to your extension’s Queries/ folder which will be evaluated on the targeted node (and its subtree).

An example of specifying use of a display name query for HTML tags:

((element
  (start_tag (tag_name) @name) @start.before @displayname.target
  (end_tag)? @end.after)
 (#set! displayname.query "tagDisplayName.scm")
)

The display name subquery has its own set of captures available:

Captures  Description
@result   Collects textual content for query’s result

The @result capture is used to collect textual components. Each node matched by this capture will be appended to the resulting display name.

The @result capture can also then be targeted by Transform Predicates to further modify its contents.

This result will then be passed back to the symbol query which invoked it to provide the symbol’s display name.

An example display name query for HTML tags:

; Tag name
(tag_name) @result

; ID attributes, formatted to "#value"
((attribute
  (attribute_name) @_attrname
  [
    (attribute_value) @result
    (quoted_attribute_value ["\"" "'"] (_)? @result ["\"" "'"])
  ]?
 )
 (#match? @_attrname "(?i)^id$")
 (#prefix! @result "#")
)

; Class attributes, formatted to ".value"
((attribute
  (attribute_name) @_attrname
  [
    (attribute_value) @result
    (quoted_attribute_value ["\"" "'"] (_)? @result ["\"" "'"])
  ]?
 )
 (#match? @_attrname "(?i)^class$")
 (#replace! @result "\\s+" ".")
 (#prefix! @result ".")
)

This query has three components, which combine a tag’s name, ID (if any), and class(es) (if any).

For the tag:

<div id="foobar" class="left static"></div>

…this will result in a computed display name of div#foobar.left.static shown in the Symbols list.

Symbol Region

When defining the region of a symbol, either a single discrete node or a pair of nodes must be specified.

The @subtree capture will target a specific node and its subtree. This forms the symbol’s entire bounds.
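
As a minimal sketch of the single-node form, the query below treats a whole function declaration as one symbol. The node and field names (function_declaration, name, identifier) are assumed from the tree-sitter-javascript grammar and are not taken from this documentation, so substitute the names your own grammar produces.

```scheme
; Hypothetical symbols.scm entry, assuming tree-sitter-javascript node names.
; The function_declaration node and its subtree form the symbol's region,
; and the text of the identifier is used as the symbol's name.
((function_declaration
  name: (identifier) @name) @subtree
 (#set! role function)
)
```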

To instead target a region between two nodes, use the @start / @start.before and @end / @end.after captures. If multiple nodes match one of these captures, the “innermost” match will be used (e.g. if matching a series of keywords using @start, the last matched will be used, and if using @end, the first matched will be used).

((element
  (start_tag (tag_name) @name) @start.before
  (end_tag)? @end.after)
)

If only a @start (or @start.before) capture is specified, the scope.extend variable can be set to indicate that the “end” of the symbol should be wherever its parent ends (or the document ends, in the case of no parent).

When a symbol is constructed, it will automatically be grouped into any symbol which comes before if they intersect.

In cases where this is not possible, you can define the scope.level variable. This is useful in, for example, Markdown documents where symbols for sections might be built from a syntax tree that does not itself define the depth of its sections. The value should be a number (starting with 1) representing the “depth” of the symbol. When the tree of symbols is constructed, a symbol with a level is automatically grouped into any symbol coming before it, so long as the previous symbol’s level is lower.

((atx_heading
  .
  (atx_h1_marker)) @start.before
  (#set! role heading)
  (#set! scope.level 1)
  (#set! scope.extend)
)
((atx_heading
  .
  (atx_h2_marker)) @start.before
  (#set! role heading)
  (#set! scope.level 2)
  (#set! scope.extend)
)
; ...etc.

If the scope.byLine variable is set, the region will automatically have its boundaries aligned with the lines of text it intersects. When this happens, the last line intersecting the symbol’s end will be excluded. This may be useful in certain languages, like Markdown, where the end token targeted for symbolication and anything else on the line should not be included in the symbol.

(function_definition_statement
  name: (_) @start.before
  ")"? @start.before
  "end" @end
 (#set! scope.byLine)
)

For complex cases, symbols can also be constructed from multiple queries when a single query cannot accurately target the required set of nodes. By using the scope.group variable with a string value you can denote that one query using @start (or @start.before) should be linked with another query using @end (or @end.after). When two matched queries have the same group name and appear within the same level of the tree of symbols, they will be combined. The names of groups are not important, so long as the two halves of the region match.

When grouping many symbols of the same type but differing depths, it may not be possible to accurately “name” all of the groups in your query directly. If both the start and end queries building the symbol know the symbol’s “name”, you can specify scope.groupByName, and the query will instead use the symbol’s name as the group name. An example of this in practice is HTML tags: both the start and end tag know their name, so they can be grouped automatically using it.

((jsx_opening_element
  name: [
    (identifier)
    (nested_identifier)
  ] @name) @start.before
  (#set! role tag)
  (#set! scope.groupByName)
)

Arguments Query

For function-like symbols which have arguments, a query can instruct the syntax engine to symbolicate arguments for use in signature help and autocomplete.

By using the @arguments.target capture, the query marks a node that will be used for building arguments. The arguments.query variable then specifies a query path relative to your extension’s Queries/ folder which will be evaluated on the targeted node (and its subtree).

An example of symbolicating arguments with a TypeScript method definition:

((method_definition
    name: (property_identifier) @name
    parameters: (formal_parameters) @arguments.target) @subtree
  (#set! arguments.query "arguments.scm")
)

The arguments subquery has its own set of captures available:

Captures  Description
@name     The argument name
@type     The argument type

Each match of the query within the targeted subtree will create an argument that can be autocompleted when the user completes the function-like symbol’s name, or displayed when the user invokes signature help.
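
To illustrate, here is a minimal sketch of what the arguments.scm file referenced in the example above might contain. The node and field names (required_parameter, optional_parameter, pattern, type_annotation) are assumed from the tree-sitter-typescript grammar rather than taken from this documentation, so check them against the grammar you are using.

```scheme
; arguments.scm (hypothetical contents, assuming tree-sitter-typescript node names)

; Required parameters, such as "name: string".
; The inner (_) captures the type itself, leaving out the leading colon.
(required_parameter
  pattern: (identifier) @name
  (type_annotation (_) @type)?)

; Optional parameters, such as "count?: number".
(optional_parameter
  pattern: (identifier) @name
  (type_annotation (_) @type)?)
```

Each parameter matched by one of these patterns would produce one argument, with the type left empty when no annotation is present.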

Autoclosing

Certain symbols may wish to support “autoclosing,” or the behavior in which typing a certain expression while inside of the symbol’s region will “close” the symbol by expanding out additional characters. An example of this is HTML tags, where typing the expression </ will automatically expand the name of the tag and closing > without the user needing to type them.

To add autoclosing support to a symbol, two variables are used: autoclose.expression and autoclose.completion. The first is used to specify what the user must type to invoke autoclose while inside of the symbol’s region. Once this happens, the autoclose.completion string will be expanded and inserted at the cursor position.

The completion expression supports string expansion using the ${variable} format and the following expression variables:

Expression Variable  Description
name                 The symbol’s name

An example of using autoclosing with HTML tags:

((element
  (start_tag (tag_name) @name) @start.before @displayname.target
  (end_tag)? @end.after)
 (#set! displayname.query "tagDisplayName.scm")
 (#set! autoclose.expression "</")
 (#set! autoclose.completion "${name}>")
)

In this case, when the user’s cursor is within a div tag and they type </, the autoclose completion expression will be expanded into div> and inserted, resulting in the proper end tag behind the cursor position.

Folds

Fold queries define the boundaries on which automatic code folding support is provided in Nova’s editor.

Folding support is provided using one or more query files using the folds element:

<tree-sitter>
    <folds />
</tree-sitter>

By default, specifying the folds element without a path attribute will tell the syntax engine to look for a file named folds.scm within your extension’s Queries/ folder.

By default, any comments highlighted using the comment syntax highlighting selector will automatically be made foldable by Nova’s editor, alongside any folds defined by an extension. Comments only need to be made foldable explicitly by folding queries if they are not highlighted with that selector.

Within a folds query, there are special capture names and variables which are used to mark the specific region of text to make foldable as well as how the region should behave.

Capture        Description
@subtree       Use this node and its subtree as the foldable region
@start         Begin the foldable region after this node
@start.before  Begin the foldable region before this node
@end           End the foldable region before this node
@end.after     End the foldable region after this node

Variable      Description                                                        Value Type
role          The type of folding region                                         String
scope.byLine  Whether the region should align to line boundaries                 None
scope.extend  Whether to expand the region to the end of its parent              None
scope.level   The level of the fold when it cannot be determined automatically   Number
scope.group   The group name used when building a region using multiple queries  String

Folding Roles

Each fold can optionally define a Role. These roles are used when certain editor folding actions are taken, such as if a user invokes “Fold All Functions & Methods”. By defining a fold as a function, it will be included.

(#set! role function)

Roles     Example
comment   Documentation comments
block     A logical block, such as an “if” statement
function  Functions, methods, etc.
heading   Section, such as Markdown headings
tag       HTML tags
type      Classes, interfaces, etc.

Foldable Region

When defining the region of foldable text, either a single discrete node or a pair of nodes must be specified.

The @subtree capture will target a specific node and its subtree as foldable. This forms the fold’s entire region.
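
As a minimal sketch of the single-node form, the query below makes every braced block foldable. The statement_block node name is assumed from the tree-sitter-javascript grammar and is not taken from this documentation, so substitute the node your own grammar produces.

```scheme
; Hypothetical folds.scm entry, assuming tree-sitter-javascript node names.
; The statement_block node and its subtree form the entire foldable region.
((statement_block) @subtree
 (#set! role block)
)
```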

To instead target a region between two nodes, use the @start / @start.before and @end / @end.after captures. If multiple nodes match one of these captures, the “innermost” match will be used (e.g. if matching a series of keywords using @start, the last matched will be used, and if using @end, the first matched will be used).

((element
  (start_tag) @start
  (end_tag) @end)
 (#set! role tag)
)

If only a @start (or @start.before) capture is specified, the scope.extend variable can be set to indicate that the “end” of the region should be wherever its parent ends (or the document ends, in the case of no parent).

When a fold is constructed, it will automatically be grouped into any fold which comes before if its region intersects.

In cases where this is not possible, you can define the scope.level variable. This is useful in, for example, Markdown documents where each fold might be built from a syntax tree that does not itself define the depth of its sections. The value should be a number (starting with 1) representing the “depth” of the fold. When the tree of folds is constructed, a fold with a level is automatically grouped into any fold coming before it, so long as the previous fold’s level is lower.

((atx_heading
  .
  (atx_h1_marker)) @start
  (#set! role heading)
  (#set! scope.level 1)
  (#set! scope.extend)
)
((atx_heading
  .
  (atx_h2_marker)) @start
  (#set! role heading)
  (#set! scope.level 2)
  (#set! scope.extend)
)
; ...etc.

If the scope.byLine variable is set, the region will automatically have its boundaries aligned with the lines of text it intersects. This is useful for languages which use textual boundary tokens (such as Ruby) where you might define a foldable region between the tokens func and end. By default, everything in between is folded, leaving the fold marker situated between these two tokens on the same line: func[]end. This is awkward and not really ideal. By specifying scope.byLine, the editor will automatically ensure that any trailing newline before the final token is excluded from the fold, leaving the end token on the next line.

(function_definition_statement
  name: (_) @start
  ")"? @start
  "end" @end
 (#set! scope.byLine)
)

For complex cases, folds can also be constructed from multiple queries when a single query cannot accurately target the required set of nodes. By using the scope.group variable with a string value you can denote that one query using @start (or @start.before) should be linked with another query using @end (or @end.after). When two matched queries have the same group name and appear within the same level of the tree of folds, they will be combined. The names of groups are not important, so long as the two halves of the region match.

((php_tag) @start
 (#set! role tag)
 (#set! scope.group php_tag)
)
((text_interpolation "?>") @end
 (#set! scope.group php_tag)
)

Injections

Injections allow for a language to mark regions of a document which should be parsed as another language by the editor (also known as “code fences”). Examples of this include script and style tags in HTML and triple-backtick blocks in Markdown.

Injection support is provided using one or more query files using the injections element:

<tree-sitter>
    <injections />
</tree-sitter>

By default, specifying the injections element without a path attribute will tell the syntax engine to look for a file named injections.scm within your extension’s Queries/ folder.

Within an injection query, there are special capture names and variables which are used to mark the specific region of text to reparse as well as what language to use.

Capture                          Description
@injection.content               Use this node and its subtree as the injected region
@injection.content.start         Begin the injected region after this node
@injection.content.start.before  Begin the injected region before this node
@injection.content.end           End the injected region before this node
@injection.content.end.after     End the injected region after this node
@injection.language              Use the textual content of this node for the injected language

Variable            Description                                                      Value Type
injection.combined  Whether multiple injected regions should be considered one       None
injection.language  The injected language                                            String
injection.reset     Instruct syntax highlighting to reset attributes for the region  String

Injected Language

When defining which language is used to parse the region, you can either specify the injection.language variable (using Variable predicates) or use the @injection.language capture to specify a node whose textual content is parsed (useful for code fences where a language identifier is directly specified in the document).

The language identifier specified is parsed using Injection regular expressions to allow for potential differences in language identifier formats depending on the parent language. If no injection regular expression matches, the syntax engine checks if any syntax is registered whose name is the same as the identifier.

Content Region

When defining the region of text to reparse for an injection, either a single node or separate start and end nodes should be specified.

The @injection.content capture will mark a specific node and its subtree as the content region.

To instead target a region between two nodes, use the @injection.content.start / @injection.content.start.before and @injection.content.end / @injection.content.end.after captures. If multiple nodes match one of these captures, the “innermost” match will be used (e.g. if matching a series of keywords using @injection.content.start, the last matched will be used, and if using @injection.content.end, the first matched will be used).

Combined Regions

Documents which contain multiple disparate regions of an injected language may wish to convey that they are the same logical region (e.g. in a language like PHP, every PHP template tag is linked into one common “php document” and shares the same scope).

By setting the injection.combined variable using a variable predicate, the syntax engine will combine multiple injected regions existing at the same level of the parse tree into one logical unit and parse the contents of all of them together. Otherwise, each will be parsed separately as if it were its own document.

(#set! injection.combined)
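
As a rough sketch, an injection query for a template language might mark each embedded region and combine them as follows. The node name template_block is hypothetical and stands in for whatever node your grammar produces for an embedded region, and the language identifier is only an illustration:

```scheme
; Hypothetical node name: substitute the node your grammar produces
((template_block) @injection.content
 (#set! injection.language "php")
 (#set! injection.combined))
```

Without the injection.combined line, each matched region would be parsed as its own separate document.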

Examples

Examples from the Markdown language extension:

; Yaml frontmatter
(document
  .
  (thematic_break) @injection.content.start
  .
  (setext_heading
    (setext_h2_underline) @injection.content.end)
 (#set! injection.language yaml)
)

; HTML blocks
(html_block
 (#set! injection.language html)
) @injection.content

; Fenced code blocks
(fenced_code_block
  (info_string (text) @injection.language)
  (code_fence_content) @injection.content)

Text Checking

Languages that make heavy use of prose may wish to support automatic text checking, which performs operations like spell checking and automatic URL detection. This is often used in languages like HTML and Markdown for human-readable text as well as in many procedural languages for documentation comments.

By default, Nova’s editor includes any region syntax highlighted with the comment selector in text checking. This means that most languages likely don’t have to do anything to support it.

Text checking support can be manually overridden by one or more query files, declared using the text-checking element:

<tree-sitter>
    <text-checking />
</tree-sitter>

By default, specifying the text-checking element without a path attribute will tell the syntax engine to look for a file named textChecking.scm within your extension’s Queries/ folder.

Within a text checking query, there are special capture names and variables which are used to mark the specific region of text to make checkable.

Capture        Description
@subtree       Use this node and its subtree as the checkable region
@start         Begin the checkable region after this node
@start.before  Begin the checkable region before this node
@end           End the checkable region before this node
@end.after     End the checkable region after this node

The @subtree capture will target a specific node and its subtree as checkable. This forms the entire region.

To instead target a region between two nodes, use the @start / @start.before and @end / @end.after captures. If multiple nodes match one of these captures, the “innermost” match will be used (e.g. if matching a series of keywords using @start, the last matched will be used, and if using @end, the first matched will be used).
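
A text checking query built from these captures might look like the following sketch. All node names here (paragraph, block_comment, comment_open, comment_close) are hypothetical and depend entirely on your grammar:

```scheme
; Hypothetical node names: substitute those from your grammar

; Check the whole of a paragraph node
(paragraph) @subtree

; Check only the text between a block comment's delimiters
(block_comment
  (comment_open) @start
  (comment_close) @end)
```

In the second pattern the delimiters themselves are excluded, since @start begins the region after its node and @end ends the region before its node.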

Colors

Extensions that make use of the Colors API can provide a query to automatically detect potential color values in the document to be passed to the extension for further processing.

Any nodes matched by color queries will be collected by the editor and provided in the ColorInformationContext object’s candidates property when a color request is made.

Colors support can be provided by one or more query files, declared using the colors element:

<tree-sitter>
    <colors />
</tree-sitter>

By default, specifying the colors element without a path attribute will tell the syntax engine to look for a file named colors.scm within your extension’s Queries/ folder.

Within a colors query, there are special capture names and variables which are used to mark the specific region of text to pull out as a color candidate.

Capture        Description
@subtree       Use this node and its subtree as the candidate
@start         Begin the candidate after this node
@start.before  Begin the candidate before this node
@end           End the candidate before this node
@end.after     End the candidate after this node

The @subtree capture will target a specific node and its subtree as a candidate.

To instead target a region between two nodes, use the @start / @start.before and @end / @end.after captures. If multiple nodes match one of these captures, the “innermost” match will be used (e.g. if matching a series of keywords using @start, the last matched will be used, and if using @end, the first matched will be used).
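
As an illustration, a colors query could collect candidates along these lines. The node names (plain_value, call_expression, function_name) are hypothetical, and the regular expression is only one plausible way to recognize a hex color:

```scheme
; Hypothetical node names: substitute those from your grammar

; Offer anything that looks like a hex color as a candidate
((plain_value) @subtree
 (#match? @subtree "^#[0-9a-fA-F]{3,8}$"))

; Offer the region from a function name through its closing parenthesis
(call_expression
  (function_name) @start.before
  ")" @end.after)
```

The query only nominates candidates. Deciding whether a candidate really is a color remains the job of the extension handling the color request.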

Predicates

The Tree Sitter library itself does not provide any concrete predicates. Instead, there are a few commonly agreed-upon predicates used by several tools and applications. Nova implements these as well as many custom predicates which facilitate its features.

Predicates are akin to functions in procedural languages, and take arguments in the same way. Each argument to a predicate can be either a Capture or a String Literal.

(#eq? @attr "border:")
(#not-eq? @tag.name div)

Capture arguments are references to nodes captured using the @name syntax on nodes in the query. They are specified in the same way, including the @ symbol.

String literals are constant values wrapped within double quotes (such as "foo-bar"). As a convenience, string literals which consist of only alphanumeric characters, underscores, dashes, and periods are allowed to be specified without quotes (such as foo.bar). This is often used for better readability of special keys and identifiers.

Note: for string literals which specify a regular expression pattern, character classes using the backslash must be double-escaped (such as \\s+) due to a single escape being reserved for escaping the string literal’s characters.
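
To make the escaping rule concrete, here is a small sketch (the @value capture is hypothetical):

```scheme
; "\\d" in the query source reaches the regular expression engine as \d
(#match? @value "^\\d+$")

; A single backslash escapes a character of the string literal itself
(#eq? @value "say \"hello\"")
```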

Filter Predicates

Filter predicates are used to further refine whether a query matches a set of nodes at all. Where the query can narrow down matches based on the structure of the tree, filter predicates allow comparing against captured text from the document.

eq?

Filters a query based on whether a capture is equal to a value.

Syntax: (#eq? capture captureOrString)

If two capture names are provided, the textual contents of the nodes those captures denote will be compared.

If a capture name and string literal are provided, the textual contents of the node the capture denotes will be compared to the string.

Examples:

(#eq? @attr "corner-radius")
(#eq? @tag-start-name @tag-end-name)

not-eq?

Filters a query based on whether a capture is not equal to a value.

Syntax: (#not-eq? capture captureOrString)

If two capture names are provided, the textual contents of the nodes those captures denote will be compared.

If a capture name and string literal are provided, the textual contents of the node the capture denotes will be compared to the string.

Examples:

(#not-eq? @attr "corner-radius")
(#not-eq? @tag-start-name @tag-end-name)

match?

Filters a query based on whether a capture matches a regular expression.

Syntax: (#match? capture regex)

The textual contents of the node the capture denotes will be evaluated using the provided regular expression. If the expression matches within the text, the predicate will evaluate to true; otherwise it will evaluate to false.

The regular expression is allowed to match anywhere within the capture text. To match on specific ends or the entire textual contents of the node, use anchoring metacharacters such as ^ and $.

Examples:

(#match? @attr "[a-zA-Z0-9_]+")
(#match? @attr "^border-(top|left|bottom|right)$")

not-match?

Filters a query based on whether a capture does not match a regular expression.

Syntax: (#not-match? capture regex)

The textual contents of the node the capture denotes will be evaluated using the provided regular expression. If the expression matches within the text, the predicate will evaluate to false; otherwise it will evaluate to true.

The regular expression is allowed to match anywhere within the capture text. To match on specific ends or the entire textual contents of the node, use anchoring metacharacters such as ^ and $.

Examples:

(#not-match? @attr "[a-zA-Z0-9_]+")
(#not-match? @attr "^border-(top|left|bottom|right)$")

Variable Predicates

Variable predicates are used to set and compare Variables, a set of key-value pairs unique to each instance a query runs on a set of nodes in the tree.

Some variables are set before the query runs and allow the query to check their value during its evaluation, while others are set by a query during its evaluation to hand information back to the syntax engine.

The available variables in any query are dependent on which operation is evaluating the query. See the specific query sections for more information on which variables might be important.

set!

Sets the value of a variable.

Syntax: (#set! variable [captureOrString])

The variable named variable will be set to a value. If a capture is provided, the textual contents of the node the capture denotes will be used. If a string literal is provided it will be used. If no value is provided, the value will be marked as “set” for purposes of checking its state but otherwise will not have a discrete value.

Examples:

(#set! autoclose-expression "</")
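
The three forms of value described above can be sketched side by side. The @name capture and the role variable are used here purely for illustration:

```scheme
; String literal value (quoted or bare)
(#set! injection.language "html")

; Capture value: the variable receives the captured node's text
(#set! role @name)

; No value: the variable is only marked as set
(#set! injection.combined)
```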

set-if-eq!

Sets the value of a variable if a capture is equal to a value.

Syntax: (#set-if-eq! capture captureOrString variable [value])

The textual contents of the node the capture denotes will be compared to either a second capture or a string literal.

If they are equal, the value of the variable variable will be set to the string literal value. If no value is provided, the value will be marked as “set” for purposes of checking its state but otherwise will not have a discrete value. Otherwise, the variable will not be set.

set-if-not-eq!

Sets the value of a variable if a capture is not equal to a value.

Syntax: (#set-if-not-eq! capture captureOrString variable [value])

The textual contents of the node the capture denotes will be compared to either a second capture or a string literal.

If they are not equal, the value of the variable variable will be set to the string literal value. If no value is provided, the value will be marked as “set” for purposes of checking its state but otherwise will not have a discrete value. Otherwise, the variable will not be set.
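
A brief sketch of both equality-based conditional setters follows. The captures and the variable names (role, mismatched) are hypothetical:

```scheme
; Set "role" to "tag-script" only when the tag name is exactly "script"
(#set-if-eq! @name "script" role tag-script)

; Mark "mismatched" as set (with no value) when the two tag names differ
(#set-if-not-eq! @tag-start-name @tag-end-name mismatched)
```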

set-if-match!

Sets the value of a variable if a capture matches a regular expression.

Syntax: (#set-if-match! capture regex variable [value])

The textual contents of the node the capture denotes will be evaluated using the regular expression.

If the expression matches within the text, the value of the variable variable will be set to the string literal value. If no value is provided, the value will be marked as “set” for purposes of checking its state but otherwise will not have a discrete value. Otherwise, the variable will not be set.

The regular expression is allowed to match anywhere within the capture text. To match on specific ends or the entire textual contents of the node, use anchoring metacharacters such as ^ and $.

set-if-not-match!

Sets the value of a variable if a capture does not match a regular expression.

Syntax: (#set-if-not-match! capture regex variable [value])

The textual contents of the node the capture denotes will be evaluated using the regular expression.

If the expression does not match within the text, the value of the variable variable will be set to the string literal value. If no value is provided, the value will be marked as “set” for purposes of checking its state but otherwise will not have a discrete value. Otherwise, the variable will not be set.

The regular expression is allowed to match anywhere within the capture text. To match on specific ends or the entire textual contents of the node, use anchoring metacharacters such as ^ and $.
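
The regular expression variants can be sketched the same way. The @name capture, the role variable, and its values are hypothetical:

```scheme
; Set "role" to "tag-heading" for h1 through h6, ignoring case
(#set-if-match! @name "(?i)^h[1-6]$" role tag-heading)

; Set "role" to "tag-component" when the name does not begin with a lowercase letter
(#set-if-not-match! @name "^[a-z]" role tag-component)
```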

set-by-case-eq!

Sets the value of a variable if a capture is equal to a value in one of a series of switch cases.

Syntax: (#set-by-case-eq! capture variable [captureOrString value]… [default])

For each case pair, either a capture or string literal is provided as the first half (captureOrString) and a string literal for the second half (value).

The textual contents of the node the capture denotes will be compared to the first value of each case pair, in order. If they are equal, the value of the variable variable will be set to that case’s value and the predicate will stop evaluating cases.

If none of the cases match, an optional default string literal value may be specified to be set. Otherwise, the variable will not be set.

Examples:

(#set-by-case-eq! @name scope.level
    "h1" 1
    "h2" 2
    "h3" 3
    "h4" 4
    "h5" 5
    "h6" 6
)

set-by-case-match!

Sets the value of a variable if a capture matches a regular expression in one of a series of switch cases.

Syntax: (#set-by-case-match! capture variable [regex value]… [default])

For each case pair, a regular expression is provided as the first half (regex) and a string literal for the second half (value).

The textual contents of the node the capture denotes will be evaluated using the regular expression of each pair, in order. If the expression matches within the text, the value of the variable variable will be set to that case’s value and the predicate will stop evaluating cases.

If none of the cases match, an optional default string literal value may be specified to be set. Otherwise, the variable will not be set.

Examples:

(#set-by-case-match! @name role
    "(?i)^(h1|h2|h3|h4|h5|h6|header|hgroup)$" tag-heading
    "(?i)^(article|aside|main|nav|section)$" tag-section
    "(?i)^(a)$" tag-anchor
    "(?i)^(link)$" tag-link
    "(?i)^(img)$" tag-image
    tag
)

is?

Filters a query based on whether a variable is equal to a value.

Syntax: (#is? variable [value])

The value of the variable variable will be compared to the string literal value. If no value is provided, the predicate matches if the variable is set at all (to any value, including the absence of a value).

is-not?

Filters a query based on whether a variable is not equal to a value.

Syntax: (#is-not? variable [value])

The value of the variable variable will be compared to the string literal value, and the predicate matches if they differ. If no value is provided, the predicate matches only if the variable is not set at all (neither to a value nor marked as set without one).
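
The four possible forms might be written as follows. The variable name role and the value tag-heading are used only for illustration; which variables are actually available depends on the operation evaluating the query:

```scheme
(#is? role tag-heading)      ; passes when role equals "tag-heading"
(#is? role)                  ; passes when role is set at all
(#is-not? role tag-heading)  ; passes when role does not equal "tag-heading"
(#is-not? role)              ; passes when role is not set
```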

Transform Predicates

Transform predicates operate on text collected by captures. They can take the textual contents of a captured node and progressively change it using a series of string operations.

They are evaluated in the order they appear in the query (nesting depth within parentheses has no bearing). These predicates are only used in certain operations which pull out text from captures, such as in naming a symbol during symbolication.

Transformations are destructive to the capture’s ultimate resulting text. If multiple transform predicates are evaluated on the same capture, the first will receive the original textual contents, the second will receive the result of the first, and so on.

This does not affect the operation of Filter predicates, which always use the original textual contents of the capture before transformation. Transform predicates only affect the result handed back to the syntax engine once the query fully evaluates.
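
The chaining behavior can be traced through a short sketch using the transform predicates documented in this section. The @name capture and its text are hypothetical, and the comments follow the rules described above rather than observed output:

```scheme
; Suppose @name captures the text "  myTag  "
(#eq? @name "  myTag  ")   ; filter: always sees the original text
(#strip! @name "\\s+")     ; result is now "myTag"
(#prefix! @name "#")       ; result is now "#myTag"
(#append! @name " (id)")   ; result is now "#myTag (id)"
```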

prefix!

Prefixes the textual result of a capture with one or more other captures or string literals.

Syntax: (#prefix! capture captureOrString...)

The current text of the capture will be prefixed with the provided arguments, in order. Each additional argument may be either a capture name or string literal. If a capture is specified its current textual result will be used.

Performing multiple prefix predicates on the same capture will continue to prefix onto the beginning of the previous result.

Examples:

; Before: @tag-id == "myTag", @tag-name == "div"
(#prefix! @tag-id @tag-name "#")
; After: @tag-id == "div#myTag"

append!

Appends one or more other captures or string literals onto the textual result of a capture.

Syntax: (#append! capture captureOrString...)

The provided arguments will be appended to the current text of the capture, in order. Each additional argument may be either a capture name or string literal. If a capture is specified its current textual result will be used.

Performing multiple append predicates on the same capture will continue to append onto the end of the previous result.

Examples:

; Before: @attr-name == "border-color", @attr-value == "red"
(#append! @attr-name ": " @attr-value ";")
; After: @attr-name == "border-color: red;"

strip!

Strips text from the beginning and end of the textual result of a capture which matches a provided regular expression.

Syntax: (#strip! capture regex)

Text matching the regular expression anchored at both the beginning and end of the current text of the capture will be stripped, leaving only what text did not match in between. This can be useful in stripping whitespace, for example.

Examples:

(#strip! @attr-name "\\s+")

replace!

Replaces text in the textual result of a capture which matches a provided regular expression using a replacement template expression.

Syntax: (#replace! capture regex template)

Each block of text in the current text of the capture which matches the regular expression will be replaced using the provided template expression. The expression may utilize backreferences using the \# syntax, where # is the number (starting at 1) of a capture group in the original regular expression, with \0 representing the entire match.

Examples:

(#replace! @attr-name "([a-zA-Z0-9]+)\\s+\\{" ".\\1")