Attribution
This work is the result of a lot of input and inspiration from many SES Strategy members, with special gratitude to M. S. Miller, J. D. Dalton, M. Fig, and R. Gibson, as well as to T. Disney, who contributed indirectly through his exceptional work, in which he pioneered intuitive ways to reason accurately about the fine-grained aspects of ECMAScript grammar.
This document addresses the underlying theory behind an experimental ECMAScript tokenizer that is designed from the ground up to meet the challenges of working with source text at runtime.
The main contribution of this work is that it aims to minimize expensive operations that conventionally rely on full AST generation, relying instead on conceptual abstractions, ie constructs and planes, designed to mimic a partial AST approach.
A lot of experimental work is also incorporated in this effort, demonstrated here and maintained here.
This section takes a constructional perspective of the ECMAScript grammar which, when compared to the hierarchical perspective of AST, considers a closure of prescribed delimiters and semantics to be a balanced and cohesive flat node representative of the infinite set of possible token nodes (things) that would be contained within it.
This notion of flat node or fat token is also referred to as a constructional plane and is signified by a single … denoting its things.
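To make the idea of a flat node more concrete, here is a minimal sketch that collapses each top-level balanced closure into a single node without descending into it. It is not the tokenizer this document describes: the function name `flatten` and the node shape are hypothetical, and the scan deliberately ignores strings, comments, templates, and regular expressions.

```javascript
// Hypothetical helper: collapse each top-level balanced closure into one flat node.
// Naive by design: strings, comments, templates and regular expressions are not handled.
function flatten(source) {
  const closers = new Map([['(', ')'], ['{', '}'], ['[', ']']]);
  const nodes = [];
  const stack = []; // closers still expected, innermost last
  let start = 0;

  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (closers.has(ch)) {
      if (stack.length === 0) {
        // Entering a top-level closure: emit any loose text before it.
        if (i > start) nodes.push({ type: 'text', text: source.slice(start, i) });
        start = i;
      }
      stack.push(closers.get(ch));
    } else if (stack.length > 0 && ch === stack[stack.length - 1]) {
      stack.pop();
      if (stack.length === 0) {
        // The closure is balanced: everything inside it stays one flat node.
        nodes.push({ type: 'closure', text: source.slice(start, i + 1) });
        start = i + 1;
      }
    }
  }
  if (start < source.length) nodes.push({ type: 'text', text: source.slice(start) });
  return nodes;
}

const nodes = flatten('f(a, (b + c)) + {x: [1, 2]}');
```

Here the nested `(b + c)` and `[1, 2]` never surface as nodes of their own; they remain things inside their enclosing closures.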
ECMAScript grammar can be divided into two primary planes:
- ((…)) Expression stuff
- {{…}} Statements stuff
Notation
Throughout this document we're using two parallel notations for both clarity and brevity.
For instance, the symbolic notations ((…)) and {{…}} shown here are both meant to convey things (ie …) belonging inside of a valid construction of the respective closures, where incidentally such things can wrap indefinitely in the respective delimiters. However, when those aspects are represented in abstract syntax forms, they will instead be denoted using a metaphorical notation that is also valid ECMAScript syntax for the intended effect.
While this is an extremely shallow view of the ECMAScript grammar, it only serves as the most fundamental building block of distinction to keep in mind as we move forward.
And absolutely, Expression is what the spec calls ExpressionStatement (mostly) and, yes, an Expression is a thing of the Statements stuff, yet as will be shown, it is special enough of a thing that you can neither resist nor should you want to treat it as anything less than stuff.
One important aspect to keep in mind is that such delimiters are sometimes forced or implied by the grammar, and in some cases, where they would be allowed, will be optionally introduced for style or effect. For instance, where = … normally does not need to be wrapped, one might opt for = (…); which would wrap the entire expression aspect merely for stylistic reasons, or = (…, …); which is not stylistic and is for effect.
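The difference between wrapping for style and wrapping for effect can be shown with ordinary ECMAScript. This is only an illustration; the variable names are hypothetical.

```javascript
const a = 1;
const b = 2;

// Unwrapped: the whole right-hand side is already one Expression.
const unwrapped = a + b;

// Wrapped for style: the parentheses change nothing.
const stylistic = (a + b);

// Wrapped for effect: the comma operator evaluates a, then b, and yields b.
// Without the parentheses this would not parse as a single initializer.
const forEffect = (a, b);
```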
There are at least a few more kinds of stuff that we have not touched upon yet, and that is because, like everything else, they are considered secondary stuff: they are mostly things that happen around the primary stuff. Some of them are also important to outline here, while others will fall into place as we move forward.
The list of "significant and magical planes" includes:
- …{…}… Module things
- ⟨...⟩ Destructuring things
A working assumption here is that aside from the above everything else in the ECMAScript grammar will always be a thing that belongs to exactly one of those planes, except where Module overlaps with Statements as will be shown later on.
Let's explore all four planes in more detail to see if this actually holds up, and to identify where other complex planes like literals actually fall into place in the body of this work.
In an expression, you do Expression things:
ObjectLiteral: ( {... χ} );
ArrayLiteral: ( [... χ] );
RegExpLiteral: ( /[{-}]/ );
ArrowFunctionExpression: ( ( χ ) => { {{ ; }} } );
( ( χ ) => (( χ )) );
AsyncArrowFunctionExpression: ( async ( χ ) => { {{ ; }} } );
( async ( χ ) => (( χ )) );
FunctionExpression: ( function ƒ ( ... χ ) { {{ ; }} } );
AsyncFunctionExpression: ( async function ƒ ( ... χ ) { {{ ; }} } );
GeneratorFunctionExpression: ( function* ƒ ( ... χ ) { {{ ; }} } );
AsyncGeneratorFunctionExpression: ( async function* ƒ ( ... χ ) { {{ ; }} } );
ClassExpression: ( class Χ { /***/ } );
( class Χ extends (( χ )) { /***/ } );
SpecialExpression: ( await (( χ )) );
( delete (( χ )) );
( import ( (( χ )) ) );
( (( χ )) instanceof (( χ )) );
( new (( χ )) );
( new (( χ )) ( ... χ ) );
( typeof (( χ )) );
( yield (( χ )) );
( yield* (( χ )) );
( void (( χ )) );
ReferenceExpression: ( /* binding ‹keyword› */ this [( χ )] );
( /* or ‹identifier› */ this . χ );
( /* like... */ this ( ... χ ) );
- Every expression is metaphorically wrapped (…); to signify that it is an Expression (ie ((…))) and that it is completely separate from others, hence the ;.
- There is only one place where you can leave the current Expression context and immediately enter into a nested Statements context, which per specs today is always some form of a Function Body {{ ; }} other than Methods as those are always nested further down somewhere.
- The counterpart to this are places where you leave the current Expression context and immediately enter into another nested Expression of a respective LeftHandSideExpression denomination (( χ )).
- Another unique aspect of an Expression context is that it can have no declarations, and as such in places (not omitted above) where you would expect a Binding Identifier ƒ or Χ, they will always be optional and may never take a Computed Property [( χ )] form or any wrapped Expression form.
- To further articulate on the above point, it would specifically exclude omitted forms of arrow functions having a single unwrapped argument, ie the χ => form, which while not presented are still, like many, undeniably Expression things per the spec, just not significantly relevant to the matter at hand.
- The remaining cases where you leave the current Expression context and enter into nested contexts of a clear intent include things like Literal Object { ... χ }, Literal Array [ ... χ ], Literal Pattern /[{-}]/, Class Body {/***/}, and Arguments ( ... χ ), which specifically excludes omitted forms of arrow functions with a single unwrapped argument.
- The non-spec thing introduced here (ie SpecialExpression) is simply to present Expression context forms for the set of keywords that are applicable in that context:

  Note: Please consult the spec for any additional details relating to the specific set of keywords presented here not addressed in this summary.

  - In most cases, such keywords are of an operative nature, and they can in fact repeat indefinitely, like yield yield χ and so forth.
  - Keywords that will not work that way include this, import, instanceof, and new, but each for different reasons, and some of those are more of technical impracticality than absolutes.
  - Also worth noting is the contextually-sensitive keyword super, which is omitted from this presentation and is closer in nature to this, ie both are contextually bound identifiers relative to where they are used and nothing else.
  - So in that regard, it is fair to also point out that there are omitted forms along with meta-properties that are applicable to new and import, to be addressed.
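The remark that operative keywords can repeat indefinitely can be checked directly in ECMAScript. The following is a small illustration rather than anything taken from the tokenizer; the names `kind`, `chained`, `twice`, and `steps` are hypothetical.

```javascript
// A function expression needs no binding identifier in an Expression context.
const kind = typeof (function () {});

// Operative keywords nest right to left: typeof (typeof (void 0)).
const chained = typeof typeof void 0;

// yield yield χ: the inner yield runs first, and the value sent back in
// becomes the operand of the outer yield.
function* twice() {
  return yield yield 1;
}

const iterator = twice();
const steps = [iterator.next(), iterator.next('a'), iterator.next('b')];
```

The first step yields 1, the second yields whatever was sent in after the first, and the third completes the generator with the last value sent in.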
In statements, you do Statements things:
FunctionDeclaration: { function ƒ ( ... χ ) { {{ ; }} } };
AsyncFunctionDeclaration: { async function ƒ ( ... χ ) { {{ ; }} } };
GeneratorFunctionDeclaration: { function* ƒ ( ... χ ) { {{ ; }} } };
AsyncGeneratorFunctionDeclaration: { async function* ƒ ( ... χ ) { {{ ; }} } };
ClassDeclaration: { class Χ { /***/ } };
{ class Χ extends (( χ )) { /***/ } };
VariableDeclaration: { var χ = (( χ )) };
{ var [{ χ }] = (( χ )) };
BindingStatements: { with ( (( χ )) ) {{ ; }} };
ControlStatements: {
try { {{ ; }} }
catch (χ) { {{ ; }} }
finally { {{ ; }} }
if ( (( χ )) ) {{ ; }}
else if ( (( χ )) ) {{ ; }}
else {{ ; }}
for ( /***/ ) {{ ; }}
while ( (( χ )) ) {{ ; }}
do {{ ; }}
while ( (( χ )) )
switch ( (( χ )) ) { /***/ }
};
- Every statement is metaphorically wrapped {…}; to signify that it is a Statements (ie {{…}}) and that it is completely separate from others, hence the ;.
- A new $$$ binding identifier is used to indicate potential effects beyond the scope of a block, where applicable per the spec (aka hoisting). TBD
- The for statement is odd because it includes very unique (/***/) things which fall closer to being Statements than Expression things.
- While things are far less distorted in a switch block, it is far enough from Statements due to the special clauses for case (( χ )): and default: which must precede any Statements stuff that is also not just Statements.
- To further elaborate on the above point, Statements stuff in switch blocks along with for, do, and while, all introduce and/or affect the contextual significance of certain keywords like continue, not only in their immediate scope, but further down into other ControlStatements, where those keywords may or may not be expected normally, and are often also closely related to Label $:… things of Statements omitted here.
- The function and class forms that are directly in Statements are always declarations, not expressions, so if they fall in an AssignmentExpression position, we can think of them as being implicitly (( χ )) wrapped from a constructional standpoint, and this way they remain strictly speaking Expression things in comparison.
- When you use operators like = in statements, don't forget, everything that follows is also a metaphorically wrapped (( χ )).
- In fact, when you write an unwrapped expression thing (per the previous section), don't think of it as Statements because it is a metaphorically wrapped Expression and that will always be identical to the same physically wrapped (( χ )).
- Last thing to note, from the perspective of this work, is that any form of SourceText that is not a Module is considered to be Statements.
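The distinction between a function that sits directly in Statements and one that sits in an AssignmentExpression position can be observed at runtime. This is a hedged illustration only; all names below are hypothetical.

```javascript
// Directly in Statements, `function` begins a declaration, which is hoisted,
// so the binding is already visible before the declaration is reached.
const seenEarly = typeof declared;
function declared() {}

// After `=`, everything is an Expression: the name `inner` binds only
// inside the function itself and is not declared in the surrounding scope.
const assigned = function inner() { return typeof inner; };
const seenOutside = typeof inner;
const seenInside = assigned();

// Physically wrapping the same thing yields the same Expression.
const wrapped = (function inner() { return typeof inner; });
const seenInsideWrapped = wrapped();
```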
In a module, you do Module things:
ImportDeclaration: import 'specifier';
import χ from 'specifier';
import χ, { /***/ } from 'specifier';
import { /***/ } from 'specifier';
ExportDeclaration: export { /***/ };
export default (( χ )) ;
export var χ = (( χ )) ;
export var [{ χ }] = (( χ )) ;
export class Χ { /***/ };
export class Χ extends (( χ )) { /***/ };
export function ƒ ( ... χ ) { {{ ; }} };
export async function ƒ ( ... χ ) { {{ ; }} };
export function* ƒ ( ... χ ) { {{ ; }} };
export async function* ƒ ( ... χ ) { {{ ; }} };
export { /***/ } from 'specifier';
export * as χ from 'specifier';
- Module stuff being that, it stands out because it seems to have all the things of Statements along with Imports and Exports.
- The same rules for function and class in Statements also apply to Module (ie top-level code), where they will always be declarations, exported or otherwise.
- Additionally important to note here is that what follows an export default is also Expression and never Module, and so here too any function or class forms are strictly Expression things.
- A given fact, mentioned for completeness, is that any {{ ; }} in the current Module context begins a Statements context, and that is not reciprocal in that you cannot, per the spec today, have nested Module contexts; they are either the top-level or otherwise lexically irrelevant.
In destructuring, you do Destructuring things:
// For now just consult the spec!
- Compared to all things we've seen so far, Destructuring stands out because it is actually both a Statements and Expression thing, where in both cases they are meant to make deeply nested references that will initialize or simply assign against binding identifiers available in scope.
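Both sides of destructuring, the Statements form that initializes and the Expression form that assigns, can be illustrated with standard ECMAScript. The object and binding names here are hypothetical.

```javascript
const source = { a: { b: [10, 20] } };

// Statements form: a declaration that initializes a new binding
// from a deeply nested reference.
const { a: { b: [first] } } = source;

// Expression form: an assignment against a binding already in scope.
// It must be physically wrapped here, because an unwrapped leading `{`
// in statement position would begin a block instead.
let second;
({ a: { b: [, second] } } = source);
```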
This section explores the contextual relevance of various constructs of the grammar in an effort to formalize the rules necessary to define effective constructs (work in progress).
- Import Constructs
- Export Constructs
- Declaration Constructs
- In/Direct Eval Constructs
- Assignments of Eval Constructs
To be continued.
<style src="/markout/styles/markup.debug.css"></style>