Skip to content

Design Overview

Nathan Carter edited this page Jun 30, 2020 · 5 revisions

This section gives a bit more detail about the internal structure of the LDE than the brief comments in Project Goals did, but it remains incomplete without the details fleshed out in later sections. (See the outline in the wiki sidebar.) It is called the Design Overview because it gives all the structure at a high level, so that the rest of the wiki can make sense in a context.

The main data structures and algorithms were mentioned briefly in Project Goals, and are illustrated in the figure below. As you can see from the figure, the LDE consists mainly of two data structures, an Input Tree (IT) and and Output Tree (OT). The remainder of this page gives high-level explanations of them and their associated algorithms.

Illustration of the sequence of LDE algorithms

Input Tree

Nodes in the IT are Input Structures (ISs). They are created, removed, or modified when (and only when) the client makes calls to the LDE's API (which will be documented in detail in the API section of the Clients page). Thus the IT is indirectly created from user input, as interpreted by the client, and neither created nor modified in any other way. The LDE creates the OT from the nodes in the IT using the multi-step process shown in the figure above and covered in detail on the Algorithms page.

The IT should be therefore thought of as "what the user said," and the OT as "what the user meant." You might think of them, therefore, as a syntax tree (IT) and a semantics tree (OT). Interpretation, as described on the Algorithms page, is the process of converting from syntax to semantics.

ISs come in two subclasses, Input Expressions (IEs) and Input Modifiers (IMs). IEs are interpreted into meaningful content in the OT and IMs are not. Rather, IMs modify the manner in which IEs are interpreted. They are processed in the first phase of the three-step process shown in the bottom row of the above figure, a phase called Modification, in which the LDE calls the updateConnections() routine in every IM; this guarantees that IMs are connected to the correct target IEs.

IMs cannot create OSs directly (in the interpretation phase described below). They can modify only IEs, not other IMs, and thus the order in which their updateConnections() routines are called in the Modification phase is unspecified and irrelevant. They obey no constraints on tree structure in the sense that they can be located anywhere in the IT and can connect themselves to any IEs located anywhere in the IT. IMs' updateConnections() routines may edit only their own connections to IEs, not any other data in the IT. They cannot connect themselves to other IMs, but only IEs.

For example, an IM representing the user's intent to assign a label to the subsequent three IEs would need, in its updateConnections() routine, to compute what the subsequent three IEs are, in case that set has changed since the last interpretation phase, and connect itself to those three IEs (removing any connections that no longer apply). They do so using the ordinary connections facility documented in the Attributes section of the Ordered Trees page.

In the second phase, called Interpretation, the LDE proceeds recursively through the IT, calling the interpret() routine in every IE that has been altered by the client or the Modification phase, assembling the results into the new OT. As suggested above, the interpret() routines of IMs must either not exist or return an empty list of nodes. Consequently, the root of the IT must be an IE (not an IM), so that this process is guaranteed to produce an OT. During that phase, each IE will typically ask all IMs attached to it to embed within the IE any attribute data that the IMs wish to, but the protocol for doing so is discussed on the Algorithms page, and depends on definitions of attributes in the Attributes section of the Ordered Trees page.

At any point during the Modification or Interpretation phases, the LDE may need to send feedback to the client about some of the user's input and how it has been processed (for example, if the user wrote a syntax error which required the LDE to point it out because it could not interpret it). In the above figure, this concept is shown as the lines labeled as messages back to the client. The specific format of such messages is discussed in the Messages section of the Clients page.

Output Tree

Nodes in the OT are Output Structures (OSs). They represent mathematical meaning in a simple, canonical way defined for the purpose of easy computation and validation.

We store in the OT all the mathematical meaning we can glean from the IT, not just what we need for validation. Some of the user's meaning directs interpretation, some of it may enable or disable validation settings, and so on; not all of it is mathematics to be validated. But OT content has all been converted from its more syntactic form in the IT into a computer-usable form in the OT.

We want the IT to have a form closely related to the user's desired syntax so that clients can be lightweight, not needing to do interpretation themselves. We want the OT to have a form closely related to the meaning of the content, so that validation code in the LDE can be simple, clear, and easy to write, factors that impact its correctness and maintainability.

Each OS subclass $C_O$ can (and usually will) have a corresponding subclass $C_I$ of IE such that interpret() converts instances of $C_I$ into instances of $C_O$. These subclasses of OS and IE are called "corresponding" subclasses, but they are not the only subclasses of IE. There are subclasses of IE that interpret to multiple nodes in the OT from a single IE instance. These are called "shortcut" subclasses and will be discussed further on the Parsing page.

The OT is formed only from the results of interpretation applied to the IT. Consequently, each node in the OT can (and will) know which node in the IT was interpreted to create it.

Subsequent to the creation (or replacement) of the OT during the Interpretation phase, any OSs impacted by that process will need to have their validation results recomputed. This is called the validation phase, and as shown in the above figure, follows the interpretation phase. The validation of any step of work in the OT can be performed at any time after it has been interpreted; there is no defined order of validation. This validation process will produce validation feedback (and possibly other types of feedback), which will be added to those OSs that have been processed and also transmitted to the client as feedback, using the same messaging protocol as interpretation feedback. The details of that protocol will be discussed in the Messages section of the Clients page.

In addition to feedback messages the LDE sends to the client, it can also send the entirety of the OT, so that the client can store it with the rest of the user's data (e.g., in the filesystem or cloud storage) in applications that support saving. Because the OT is the result of interpretation and contains the results of validation, it is a complete record of all processing the LDE has done to the client's input. When the same application reloads the user's work, it will contain the OT, which the client can provide to the LDE for the purpose of restoring the LDE's previous state. Thus extensive computations need not be redone and we also get the possibility of dependencies, as discussed below.

Dependencies

For the purposes of this section, let us assume that the client calls the user's input a "document." Clients are not bound by this restriction, but it is an acceptable placeholder term, particularly because the main, browser-based Lurch application will use that paradigm.

Any client may give the user the option to import one or more previously saved documents into the user's current document, with much the same semantics that programming languages permit importing previously defined modules or libraries. That is, the user's intent is for the meaning communicated in the earlier documents to be included at the start of the current document. We call such imported documents dependencies of the document that imports them.

Clients can support dependencies with minimal assistance from the LDE, as follows.

  1. To import into the current document $D$ a previously saved document $D'$, load $D'$ from storage and extract the OT embedded within it. Let's say the immediate children of the root of the OT are $t_1,\ldots,t_n$.
  2. Create an IE instance $e$ that serves merely as a wrapper for $t_1,\ldots,t_n$. That is, the interpret() routine of $e$ simply yields the list $t_1,\ldots,t_n$, fully processed and needing no further computation by the LDE.
  3. Prepend $e$ as a new child at the beginning of $D$. Mark it as having come from $D'$ so that if the user later changes his or her mind and removes the dependency, the proper child can be deleted.

Note that if there were yet a third document $D''$ on which $D'$ depended, then the nodes that $D''$ exported, which were therefore included in $D'$ when it was being worked on, $D'$ must also export, so that they are included in $D$ as well.

To improve the efficiency of importing dependencies, the LDE may choose to mark certain nodes in the OT as irrelevant for external documents. Then, still using the example of $D$ and $D'$ given above, not every child $t_1,\ldots,t_n$ of the root of $D'$ need be added to $e$; those marked irrelevant may be omitted. In addition, the interpret() routine of the wrapper node $e$ in item 2 above can choose to yield a proper subset of $t_1,\ldots,t_n$ that are relevant to the current document, ignoring some subset of those marked as potentially relevant in $D'$ itself.

If the client supports document-specific settings or preferences, they can appear as child nodes of the IT. The LDE can notice them and respect them when it does its work, and those settings will also automatically propagate into dependencies. Because dependencies are appended before the actual children of the document, the importing document's settings override those imported from the dependency, because they come later than those of the dependency.

Clone this wiki locally