Symbols

The process of taking the parse tree that a syntax grammar builds in the editor and forming a list of “Symbols” from it is known as Symbolication. It lets elements of the parse tree form logical, “higher-level” Symbols that represent language constructs. These Symbols appear in places such as the Symbols list and power IDE features such as Jump To Definition.

Symbolication in the Nova parse engine is achieved by attaching metadata to syntax grammar Scopes using a <symbol> element. The presence of this element declares that a scope either defines a Symbol or adds metadata to an existing Symbol.

Basic Symbols

The most basic <symbol> element defines only the symbol’s type:

<scope name="javascript.definition.class">
	<symbol type="class" />
	<starts-with>
		<expression>\b(class)\b</expression>
		<capture number="1" name="javascript.keyword.class" />
	</starts-with>
</scope>

Defining a Symbol’s Type

In this example, a scope that parses JavaScript classes is marked using a symbol element that has its type set to class, indicating to the parse engine that this scope defines a Class symbol.

Types are provided for symbolic constructs common to most procedural, structured, and markup languages. Valid values are defined in the Symbol documentation.

Note: not all symbol types appear in the Symbols list in the IDE. The list is filtered to specific types to keep it relevant to users.

Defining a Symbol’s Scope

The scope attribute of the <symbol> element may be used to define the lexical scope in which the symbol is valid. (The term “scope” is overloaded here: it should not be confused with the “scopes” used in syntax grammars.)

The scope of a symbol can affect how it is offered in completions and global project indexing.

Valid values for the scope attribute are:

Most often, the constructs of a language define the scope of symbols using generally understood rules:

If no scope is defined on a symbol, it will be inferred using rules like those above. If no scope can be inferred, it will be assumed to be local.

Additionally, a symbol may be marked using the anonymous attribute (set to true) to indicate that it does not export a name that should be indexed anywhere for completion, even if it has a name in its local scope. This is most often used for anonymous functions in languages like JavaScript and Python.

There are, however, no strict rules enforced for the scope of a symbol in relation to where it is defined. This allows a syntax grammar to tune each symbol to the scope in which it makes the most sense for completion and indexing.

Consider the following definition for a JavaScript arrow function:

<scope name="javascript.definition.function.arrow">
	<symbol type="function" scope="local" anonymous="true" />
</scope>

Since JavaScript arrow functions are anonymous functions, the symbol is marked with anonymous set to true, and its scope attribute is set to local, which indicates that the symbol should only be valid in its current parent (local) scope.

Computing a Symbol’s Name

When a symbol is created from a <scope> tag, the parser performs a set of heuristics to attempt to determine both syntactically-relevant and user-readable names for it.

By default, if no other options are specified, the parser looks through the scope’s children for any <scope> or <capture> elements whose name includes the name class (such as xml.tag.name). If one is found, it is assumed to represent the name of the symbol.

Consider this example, from the XML syntax:

<scope name="xml.tag.open">
	<symbol type="tag" />
	<starts-with>
		<expression>&lt;([a-zA-Z_][A-Za-zÀ-ÖØ-öø-ÿ0-9_:.-]*)</expression>
		<capture number="1" name="xml.tag.name" />
	</starts-with>
</scope>

The scope is marked for symbolication with the type tag. No other options are specified, so the parser searches the scope’s starts-with expression for a capture whose name includes the name class. It finds one in capture group 1 (xml.tag.name), and the symbol’s name is taken from the text matched by that group.

The name of a symbol is used when inserting the symbol through completions, when searching for it in the project index, and elsewhere. It is therefore important to verify that the parser constructs symbol names correctly for your language grammar.
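The same naming rule applies to captures in a plain match scope. The hypothetical JavaScript function scope below is a sketch, not Nova’s bundled grammar: its scope and capture names are assumptions chosen so that only the second capture includes the name class.

```xml
<!-- Hypothetical scope: matches "function loadData" and names the symbol "loadData" -->
<scope name="javascript.definition.function">
	<symbol type="function" />
	<expression>\b(function)\s+([a-zA-Z_$][A-Za-z0-9_$]*)</expression>
	<!-- No name class here, so this capture is not used as the symbol's name -->
	<capture number="1" name="javascript.keyword.function" />
	<!-- Includes the name class, so its matched text becomes the symbol's name -->
	<capture number="2" name="javascript.identifier.function.name" />
</scope>
```

For the text function loadData, the symbol would be named loadData rather than function loadData.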

Computing a Symbol’s Display Name

It is often the case that the user-displayable name for a symbol (such as in the Symbols list) needs to convey more information about the symbol than just its simple syntactic name.

In HTML, for example, a Symbols list that contains nothing but repeated entries for div is not very helpful. For this reason, the parser supports complex expressions to build a Display Name.

The <display-name> element of the <symbol> element allows for this.

This element contains one or more <component> elements, which pull pieces of the symbol’s subtree in the same way the parser computes the name. These pieces are then concatenated together in specific ways.

Component elements may have the following attributes:

Consider this example from the HTML syntax:

<scope name="html.tag.open.paired" spell-check="false" lookup="documentation">
	<symbol type="tag-heading">
		<display-name>
			<component variable="name" />
			<component selector="tag.attribute.value.id" prepend="#" />
			<component selector="tag.attribute.value.class" prepend="." replace="\s+" replace-with="." />
		</display-name>
		<context behavior="start" group-by-name="true" unclosed="parent">
			<auto-close string="&lt;/" completion="${name}&gt;" />
		</context>
	</symbol>
	<starts-with>
		<strings prefix="&lt;" suffix="\b" word-boundary="false" case-insensitive="true">
			<string>h1</string>
			<string>h2</string>
			<string>h3</string>
		</strings>
		<capture number="1" name="html.tag.name" />
	</starts-with>
</scope>

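The <display-name> element in this symbol has three <component> elements. The first uses the symbol’s name variable (the tag name, such as h1). The second selects the value of the tag’s id attribute and prepends #. The third selects the value of the class attribute, prepends ., and replaces each run of whitespace with ., so that class="foo bar" becomes .foo.bar.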

For a tag such as <h1 id="myheading" class="foo bar">, joining these three components produces the display name h1#myheading.foo.bar, a usefully specific descriptor for the user.

Parsing Arguments

For certain types of symbols, such as functions and methods, the parser can automatically symbolicate arguments so that they can be used in completions inside the function or when invoking the function from other code.

By default, argument parsing is enabled for functions and methods. It can be enabled for any other symbol type in which it makes sense by using the arguments attribute set to true on the <symbol> element. Likewise, it can be disabled for functions and methods using the value false.
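For example, a grammar might turn argument parsing off for a function-like construct that never takes arguments. In the sketch below, only the arguments attribute comes from the description above. The getter scope, its expression, and its capture names are hypothetical.

```xml
<!-- Hypothetical scope for JavaScript getters such as "get total" -->
<scope name="javascript.definition.getter">
	<!-- Getters take no arguments, so argument parsing is switched off -->
	<symbol type="function" arguments="false" />
	<expression>\b(get)\s+([a-zA-Z_$][A-Za-z0-9_$]*)</expression>
	<capture number="1" name="javascript.keyword.get" />
	<capture number="2" name="javascript.identifier.function.name" />
</scope>
```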

When argument parsing is enabled, the parser enables a special type of symbolication using the argument symbol type. This type indicates that the symbol created is an argument to a symbol in its ancestry. Any argument symbols found are automatically parsed as arguments of that parent symbol.

The name of the argument can be further refined by using a subscope or sub-capture with the class name argument.name, much in the same way as computing symbol names. If no name can be found, the entire text of the argument will be used.

<scope name="javascript.identifier.argument.name">
	<symbol type="argument" />
	<expression>(?&lt;!\=)\b[a-zA-Z_][A-Za-zÀ-ÖØ-öø-ÿ0-9_]*\b</expression>
</scope>
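When the argument’s text contains more than its name, a capture whose name includes argument.name can narrow it down. A TypeScript parameter with a type annotation is one such case. This is a hypothetical sketch, and the scope and capture names are assumptions.

```xml
<!-- Hypothetical scope for a typed parameter such as "count: number" -->
<scope name="typescript.identifier.argument">
	<symbol type="argument" />
	<expression>\b([a-zA-Z_$][A-Za-z0-9_$]*)\s*:\s*([a-zA-Z_$][A-Za-z0-9_$.]*)</expression>
	<!-- Includes argument.name, so "count" becomes the argument's name -->
	<capture number="1" name="typescript.identifier.argument.name" />
	<!-- The type annotation is highlighted but does not contribute to the name -->
	<capture number="2" name="typescript.identifier.type" />
</scope>
```

Without the argument.name capture, the entire text count: number would be used as the argument’s name.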

Filtering Symbols

Sometimes a <scope> element in the parse tree should only be turned into a symbol if its text meets certain criteria, such as matching a regular expression.

The <filter> element used within the <symbol> element allows for this.

Consider this example, from XML, where only tags that are not self-closing will generate symbols.

<scope name="xml.tag.open">
	<symbol type="tag">
		<!-- Do not match self-closing tags -->
		<filter match-end="(?&lt;!/&gt;)" />
		<context behavior="start" group-by-name="true">
			<auto-close string="&lt;/" completion="${name}&gt;" />
		</context>
	</symbol>
	<starts-with>
		<expression>&lt;([a-zA-Z_][A-Za-zÀ-ÖØ-öø-ÿ0-9_:.-]*)</expression>
		<capture number="1" name="xml.tag.name" />
	</starts-with>
</scope>

The <filter> element defines a match-end attribute that is a regular expression pattern that must match at the end of the scope’s text for it to be symbolicated.

Filters can have the following attributes:

Symbolic Contexts

Many symbols, such as classes, functions, and expression blocks, define regions of code in their respective languages that are self-contained for purposes such as variable resolution. Procedural languages usually call this “scope”. Because that term is already heavily overloaded, the Nova parse engine refers to it as a Symbolic Context.

Symbolic Contexts are a special behavior of symbols that allows them to easily define the boundaries of code blocks to power IDE features such as code folding, identifier resolution, and intelligent completion.

Using the <context> element within a <symbol> allows the symbol to describe to the parse engine exactly how to build a symbolic context starting, including, or ending with that symbol.

Consider this example of a JavaScript class:

<scope name="javascript.definition.class">
	<symbol type="class">
		<context behavior="subtree" />
	</symbol>
	<starts-with>
		<expression>\b(class)\b</expression>
		<capture number="1" name="javascript.keyword.class" />
	</starts-with>
</scope>

The <context> element here declares that the symbol defines a symbolic context, which enables features such as code folding for it. The behavior attribute determines how the bounds of the text contained by the context are defined.

Subtree Contexts

If the behavior attribute is set to subtree (the default), the symbolic context is defined entirely within the subtree of the current <scope> element. The text parsed by the <scope> element and its subscopes forms the region of the symbolic context.

Whitespace Contexts

If the behavior is set to whitespace, the symbol starts a symbolic context whose extent is computed from the whitespace of the lines following the symbol. This is most often used in languages such as Python, which use whitespace for block delineation.
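A Python function definition is a natural fit. In the sketch below, the def line opens a context whose extent the parser derives from the indentation of the following lines. The scope name, expression, and capture names are assumptions for illustration.

```xml
<!-- Hypothetical scope for a Python "def name" header -->
<scope name="python.definition.function">
	<symbol type="function">
		<!-- The context's extent is computed from the whitespace of the following lines -->
		<context behavior="whitespace" />
	</symbol>
	<expression>\b(def)\s+([a-zA-Z_][A-Za-z0-9_]*)</expression>
	<capture number="1" name="python.keyword.def" />
	<capture number="2" name="python.identifier.function.name" />
</scope>
```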

There are several automatic rules when using this type of context:

Start-Next-End Contexts

Complex symbolic contexts may be defined using multiple symbols. This is most often needed when a single symbol, defined by one <scope> element, cannot fully express the boundary of a symbolic context. It is also needed when a symbolic context consists of multiple parts chained together.

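For this, the behavior attribute accepts three values that define the boundaries of the context: start, which opens a context; next, which adds another part to a chain begun by a start symbol; and end, which closes the context.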

The simplest combination of these is a single start symbol and a single end symbol. Like subtree, this defines a single symbolic context, but it uses two symbols to do so. The second symbol does not even need to appear in the Symbols list if its type is set to expression. This allows tokens such as end to close the symbolic contexts of functions and classes in Ruby and similar languages.
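As a sketch of the Ruby case, a def keyword can open the context and a separate end token can close it. The scope names and expressions below are hypothetical. Because the end scope uses type expression, it is not listed in the Symbols list.

```xml
<!-- Hypothetical: "def name" opens a symbolic context -->
<scope name="ruby.definition.method">
	<symbol type="function">
		<context behavior="start" />
	</symbol>
	<expression>\b(def)\s+([a-zA-Z_][A-Za-z0-9_]*[?!]?)</expression>
	<capture number="1" name="ruby.keyword.def" />
	<capture number="2" name="ruby.identifier.method.name" />
</scope>

<!-- Hypothetical: a bare "end" token is intended to close the context opened by def -->
<scope name="ruby.keyword.end">
	<symbol type="expression">
		<context behavior="end" />
	</symbol>
	<expression>\bend\b</expression>
</scope>
```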

A more complex context can chain together multiple parts, such as the if-elseif-else clauses found in most procedural programming languages. In this case, the if expression would define a start symbol and elseif would define a next symbol. The else expression would define either an end symbol or a next symbol, depending on whether the language has an explicit token that ends the chain, such as end.

Under these rules, when a symbol marked with a context behavior of start appears, the parser automatically begins looking for symbols marked as next and end and combines them together. How they are combined can be controlled with the context options described below.

The simplest way to ensure that the right expressions are always combined together properly is through the use of the group attribute on the <context> element. This defines a name for the context that will be used to match together start, next, and end symbols that appear in a sequence in the parse tree.
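The if-elseif-else chain described earlier could be expressed as three symbols that share a group. This sketch assumes a hypothetical language whose conditionals end with an explicit endif token. All scope names and the group value conditional are illustrative.

```xml
<!-- Hypothetical: "if" starts the chain -->
<scope name="example.keyword.if">
	<symbol type="expression">
		<context behavior="start" group="conditional" />
	</symbol>
	<expression>\bif\b</expression>
</scope>

<!-- Hypothetical: "elseif" and "else" each continue the chain with a new part -->
<scope name="example.keyword.else">
	<symbol type="expression">
		<context behavior="next" group="conditional" />
	</symbol>
	<expression>\b(elseif|else)\b</expression>
</scope>

<!-- Hypothetical: "endif" closes the whole chain -->
<scope name="example.keyword.endif">
	<symbol type="expression">
		<context behavior="end" group="conditional" />
	</symbol>
	<expression>\bendif\b</expression>
</scope>
```

Because \b requires a word boundary, \bif\b does not match the if inside elseif or endif.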

Additionally, the group-by-name attribute, set to true, may be used to automatically group together symbols in the same way using the symbol’s name instead of a constant string value.

Symbolic Context Options

The <context> element has several attributes that may be used to configure its behavior:

Auto-closing

Symbolic contexts have the ability to define an auto-closing behavior. This allows the context to be automatically closed if the user begins to type text that matches a specific expression.

An example of this behavior is automatically closing the current HTML tag when the user types </. If the user were inside a <div> tag, the editor would complete the expression as </div>.

This behavior is enabled and controlled by using the <auto-close> element within <context>.

Options for the auto-close element include:

The completion expression supports variable replacement syntax within ${} brackets. Available variables include:

Consider this example from the HTML syntax:

<scope name="html.tag.open.paired" spell-check="false" lookup="documentation">
	<symbol type="tag-heading">
		<context behavior="start" group-by-name="true" unclosed="parent">
			<auto-close string="&lt;/" completion="${name}&gt;" />
		</context>
	</symbol>
	<starts-with>
		<strings prefix="&lt;" suffix="\b" word-boundary="false" case-insensitive="true">
			<string>h1</string>
			<string>h2</string>
			<string>h3</string>
		</strings>
		<capture number="1" name="html.tag.name" />
	</starts-with>
</scope>

This symbol defines a symbolic context with an auto-close behavior that triggers when the user types the string </. When that happens, the completion expression ${name}> is appended, with ${name} replaced by the symbol’s name. Typing </ inside an open <h1> tag therefore produces the text </h1>.