Parsing the ECMAScript ForInOfStatement
I'm trying to write a recursive descent parser for ECMAScript that is as simple as possible. ECMAScript does not always make that easy.
Sometimes, you can't know what you are currently parsing. It's only when you hit an edge case, or are done parsing the current production and can read what comes after, that you can be certain what you just parsed.
Depending on what you parsed, you may have interpreted certain tokens the wrong way, so you need to reevaluate and adjust.
This makes writing an ECMAScript parser very interesting.
At the time of writing, my parser passes all the positive cases of the test262-parser-tests.
However, it also passes a few cases that should fail. Some of those have to do with for-loops. I'll be exploring the edge cases of the specification of for-loops in this text.
ForStatement And ForInOfStatement Productions
The ForStatement and ForInOfStatement productions appear at the same level and both start the same way, so both must be considered when the `for (` syntax is encountered.
Here is a slightly simplified definition of these two productions:
ForStatement
    for ( [lookahead ≠ let [] Expressionopt ; Expressionopt ; Expressionopt ) Statement
    for ( var VariableDeclarationList ; Expressionopt ; Expressionopt ) Statement
    for ( LexicalDeclaration Expressionopt ; Expressionopt ) Statement
ForInOfStatement
    for ( [lookahead ≠ let [] LeftHandSideExpression in Expression ) Statement
    for ( var ForBinding in Expression ) Statement
    for ( ForDeclaration in Expression ) Statement
    for ( [lookahead ∉ { let, async of }] LeftHandSideExpression of AssignmentExpression ) Statement
    for ( var ForBinding of AssignmentExpression ) Statement
    for ( ForDeclaration of AssignmentExpression ) Statement
When we've read the first two terminals, `for` and `(`, the syntax that follows can match any of the definitions above. We must figure out which one we are currently parsing.
The somewhat tricky part with the for-loop syntax is that there are multiple exceptions, caused by the lookaheads.
I usually want to understand what problem a lookahead solves. That makes it easier to reason about the parsing, structure the code, and add tests for the syntax the lookahead is dealing with.
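To make the first decision after `for (` concrete, here is a minimal sketch of how the start of the loop head could be classified in sloppy mode. It is not taken from my parser; the names `classifyForInit`, `looksLikeIdentifier` and `BINARY_KEYWORDS` are made up for illustration, tokens are plain strings, and Unicode identifiers, escapes and strict mode are ignored. Whether the loop turns out to be a for, for-in or for-of loop is still decided later, when the parser reaches `;`, `in` or `of`.

```javascript
// Keywords that may follow `let` when `let` is a plain identifier used in
// an expression, e.g. `for (let in obj)`. Assumption: sloppy mode only.
const BINARY_KEYWORDS = new Set(["in", "instanceof"]);

// Rough identifier test; ignores Unicode and escape sequences.
function looksLikeIdentifier(token) {
  return (
    typeof token === "string" &&
    /^[A-Za-z_$][A-Za-z0-9_$]*$/.test(token) &&
    !BINARY_KEYWORDS.has(token)
  );
}

// `tokens` holds the token strings that follow `for (`.
function classifyForInit(tokens) {
  const [first, second] = tokens;
  if (first === "var") return "var";
  if (first === "const") return "lexical";
  // `let` starts a declaration only if a binding can follow it.
  if (
    first === "let" &&
    (second === "[" || second === "{" || looksLikeIdentifier(second))
  ) {
    return "lexical";
  }
  return "expression";
}

console.log(classifyForInit(["let", "[", "a", "]", "=", "b", ";"])); // lexical
console.log(classifyForInit(["let", ".", "a", "in", "x"]));          // expression
console.log(classifyForInit(["let", "in", "x"]));                    // expression
console.log(classifyForInit(["var", "i", "=", "0", ";"]));           // var
```

The `let [` case returning "lexical" is exactly what the lookaheads discussed below enforce; the for-of loop additionally rejects the "expression" result whenever the first token is `let`.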
The ForStatement Lookahead
The ForStatement has one lookahead:
for ( [lookahead ≠ let [] Expressionopt ; Expressionopt ; Expressionopt ) Statement
Note that this is a two-token lookahead: `let` followed by `[`.
This lookahead exists to remove an ambiguity in how the syntax can be interpreted.
Consider this syntax:
// Does this destructure the array `b`?
// Or do we assign `b` to the key `a` of the object `let`?
for (let[a] = b;;) ;
This ambiguity exists because `let` is not a reserved word in many contexts. (It can't be, because of backwards compatibility requirements.)
The syntax above is valid. The lookahead just makes sure that `let[a] = b` cannot be parsed as an Expression.
It is instead matched by this row:
for ( LexicalDeclaration Expressionopt ; Expressionopt ) Statement
This makes sure that it is always interpreted as a lexical declaration, i.e. it destructures `b` as an array and assigns the first element to `a`.
Example:
for (let[a] = [42];;) {
console.log(a); // 42
break;
}
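For contrast, the other reading is still reachable in sloppy code as long as the expression does not start at the position guarded by the lookahead. The sketch below compiles both variants with the Function constructor, so they are parsed as sloppy code even if the surrounding file is strict or a module. The names `memberAccess` and `declaration` are made up for this example.

```javascript
// `let` used as an ordinary variable name; only legal in sloppy mode.
const memberAccess = new Function(`
  var let = {};
  var a = "key";
  (let[a] = 1);   // parenthesized, so this is a member access on \`let\`
  return let.key;
`);

// In the for-loop head the same tokens form a lexical declaration.
const declaration = new Function(`
  for (let[a] = [42];;) {
    return a;     // \`a\` is a new binding filled from the array pattern
  }
`);

console.log(memberAccess()); // 1
console.log(declaration());  // 42
```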
The ForInOfStatement Lookaheads
This production has two lookaheads, one for the for-in loop and one for the for-of loop.
The for-in Lookahead
for ( [lookahead ≠ let [] LeftHandSideExpression in Expression ) Statement
This one is very similar to the lookahead in the ForStatement above:
// Does this destructure each key of `b`?
// Or do we assign each key of `b` to the key `a` of the object `let`?
for (let[a] in b) ;
This syntax is valid. Just like in the previous example, it is matched by another alternative, here the one with a ForDeclaration, which contains a lexical binding.
The lookahead causes it to unambiguously be interpreted as "destructure each key of `b` into an array pattern".
Destructuring keys into array patterns makes sense, since keys are strings and strings are iterable. You can also destructure keys into an object pattern. That is valid syntax, but the result is almost always undefined, since the only properties available are those of a string, such as `length`.
Example:
for (let [a, b] in {key1: 1}) {
console.log(a, b); // k e
}
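Object patterns in the same position can be checked with a small sketch. Since every key is a string, the pattern can only pick up properties that a string has; anything else comes out as undefined. The variable names are made up for this example.

```javascript
const lengths = [];
for (let { length } in { key1: 1, ab: 2 }) {
  lengths.push(length); // `length` of the key string itself
}
console.log(lengths); // [ 4, 2 ]

const picked = [];
for (let { 0: first, missing } in { key1: 1 }) {
  picked.push([first, missing]); // index 0 exists on a string, `missing` does not
}
console.log(picked); // [ [ 'k', undefined ] ]
```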
The for-of Lookaheads
for ( [lookahead ∉ { let, async of }] LeftHandSideExpression of AssignmentExpression ) Statement
Here we have two cases: `let` and `async of`. Note that it is `let` this time, not `let [` as in the previous cases.
The `async of` Lookahead
The lookahead for `async of` was added in 2020. It does not exist because the final syntax could be interpreted in multiple ways. Instead, it was added because it was ambiguous which path a parser should take.
There are multiple cases like this in ECMAScript. Some of them are solved with lookaheads, but some are too complex for a lookahead and are solved with cover productions instead. Cover productions are intermediate productions that can later be converted into real productions.
The ambiguity happens in this case:
for (async of
// Can either be a for-of loop where `async` is an identifier:
for (async of []) ;
// Or a normal for loop with an arrow function as initializer:
for (async of => {};;);
Because of the lookahead in the for-of case, only the arrow function version is valid syntax.
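One way to check this against a real engine is to compile each loop without running it. The helper below is a sketch; `parses` is a made-up name, and the results assume an engine that implements the `async of` lookahead (recent versions of V8, for example). The Function constructor only compiles the source, so the infinite loop is never executed.

```javascript
// Returns true if `source` compiles as a sloppy-mode function body.
function parses(source) {
  try {
    new Function(source); // compile only, the body is never called
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// for-of loop with `async` as the target: rejected by the lookahead.
console.log(parses("for (async of []) ;"));      // false
// Plain for loop with an async arrow function as initializer.
console.log(parses("for (async of => {};;) ;")); // true
```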
The `let` Lookahead
This lookahead solves the same ambiguity as the `let [` lookahead in the for-in loop discussed above.
However, this lookahead covers more cases, since it matches and denies everything starting with a `let` token.
I've not been able to find a definitive answer for why this is `let` instead of `let [`. My best guess is that it is the same reason why `let` is not a valid identifier in strict mode.
The for-of loop was introduced in ECMAScript 2015 (ES6), along with `let` and `const`. Unfortunately, `let` was not a reserved word before that. (`const` has always been reserved, which is why these problems do not exist for `const`.)
To still be able to use `let` as a keyword, it is considered a keyword only in new syntax or contexts. In syntax and contexts that existed before it was introduced, it is still considered an identifier for backwards compatibility.
In other words, `let` is probably denied as an identifier in for-of loops because for-of loops were introduced at the same time, so we can always treat `let` as a keyword there without breaking anything.
This means that, because of backwards compatibility, there are slight differences in what for-in and for-of allow, even though they are almost identical:
// Valid syntax:
for (let.a in []) ;
// Invalid syntax:
for (let.a of []) ;
If you put a "use strict"; directive above that, both are invalid, since `let` is always treated as a keyword in that context, which is how it really should be.
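These four combinations can be checked with a small sketch that compiles each loop without running it. `isValidSyntax` and its `strict` option are made-up names for this example, and the results assume an engine that follows the specification on this point.

```javascript
// Returns true if `source` compiles as a function body, optionally in strict mode.
function isValidSyntax(source, { strict = false } = {}) {
  const body = strict ? `"use strict";\n${source}` : source;
  try {
    new Function(body); // compile only, the body is never called
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(isValidSyntax("for (let.a in []) ;"));                   // true
console.log(isValidSyntax("for (let.a of []) ;"));                   // false
console.log(isValidSyntax("for (let.a in []) ;", { strict: true })); // false
console.log(isValidSyntax("for (let.a of []) ;", { strict: true })); // false
```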
References
- https://262.ecma-international.org/14.0/#prod-ForInOfStatement
- https://262.ecma-international.org/14.0/#sec-lookahead-restrictions
- https://262.ecma-international.org/14.0/#sec-keywords-and-reserved-words
- https://github.com/tc39/ecma262/issues/2034
- https://github.com/tc39/ecma262/pull/2256
- https://github.com/FelixStridsberg/fajt