Parsing the ECMAScript ForInOfStatement

I'm trying to write a recursive descent parser for ECMAScript that is as simple as possible. ECMAScript does not always make that easy.

Sometimes, you can't know what you are currently parsing. It's only when you hit an edge case, or are done parsing the current production and can read what comes after, that you can be certain what you just parsed.

Depending on what you parsed, you may have interpreted certain tokens the wrong way, so you need to reevaluate and adjust.

This makes writing an ECMAScript parser very interesting.

At the time of writing, my parser passes all the positive cases of test262-parser-tests.

However, it also accepts a few cases that it should reject. Some of those have to do with for-loops. In this text, I'll explore the edge cases in the specification of for-loops.

ForStatement And ForInOfStatement Productions

The ForStatement and ForInOfStatement productions appear at the same level, and both start the same way, so both must be considered when the for ( syntax is encountered.

Here is a slightly simplified definition of these two productions:

ForStatement

for ( [lookahead ≠ let [] Expressionopt ; Expressionopt ; Expressionopt ) Statement
for ( var VariableDeclarationList ; Expressionopt ; Expressionopt ) Statement
for ( LexicalDeclaration Expressionopt ; Expressionopt ) Statement

ForInOfStatement

for ( [lookahead ≠ let [] LeftHandSideExpression in Expression ) Statement
for ( var ForBinding in Expression ) Statement
for ( ForDeclaration in Expression ) Statement
for ( [lookahead ∉ { let, async of }] LeftHandSideExpression of AssignmentExpression ) Statement
for ( var ForBinding of AssignmentExpression ) Statement
for ( ForDeclaration of AssignmentExpression ) Statement

When we've read the first two terminals, for and (, the syntax that follows can match any of the definitions above. We must figure out which one we are currently parsing.
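As a rough illustration of that decision, the sketch below assumes the first part of the head has already been parsed, and picks a production from the token that follows it. The function name and the representation of tokens as plain strings are assumptions made for this example, not something taken from a real parser.

```javascript
// Hypothetical helper, not taken from any real parser.
// `token` is the source text of the token that follows the first part
// of the head, e.g. ";" in `for (i = 0; ...` or "in" in `for (k in obj)`.
function classifyForHead(token) {
  switch (token) {
    case ";":
      return "ForStatement";
    case "in":
      return "for-in";
    case "of":
      return "for-of";
    default:
      throw new SyntaxError(`Unexpected token ${token} in for head`);
  }
}

console.log(classifyForHead(";"));  // ForStatement
console.log(classifyForHead("in")); // for-in
console.log(classifyForHead("of")); // for-of
```

The hard part is everything that happens before this point: the first part of the head has to be parsed without yet knowing which of the three results will come out.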

The somewhat tricky part with the for-loop syntax is that there are multiple exceptions, caused by the lookaheads.

I usually want to understand what problem a lookahead solves. That makes it easier to reason about the parsing, structure the code, and add tests for the syntax the lookahead is dealing with.

The ForStatement Lookahead

The ForStatement has one lookahead:

for ( [lookahead ≠ let [] Expressionopt ; Expressionopt ; Expressionopt ) Statement

Note that this is a two-token lookahead: let followed by [.

This lookahead exists to remove an ambiguity in how the syntax can be interpreted.

Consider this syntax:

// Does this destructure the array `b`?
// Or do we assign `b` to the key `a` of the object `let`?
for (let[a] = b;;) ;

This ambiguity exists because let is not a reserved word in many contexts. (It can't be, because of backwards compatibility requirements.)

The syntax above is valid. The lookahead just makes sure that let[a] = b cannot be parsed as an Expression.

It is instead matched by this row:

for ( LexicalDeclaration Expressionopt ; Expressionopt ) Statement

This makes sure that it is always interpreted as a lexical declaration, i.e. it destructures b as an array and assigns the first element to a.

Example:

for (let[a] = [42];;) {
  console.log(a); // 42
  break;
}
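In a parser, the two-token check itself can be very small. The sketch below assumes the tokens are available as an array of strings. The helper name and the token representation are made up for this example.

```javascript
// Hypothetical helper: `tokens` is an array of token strings and `pos`
// is the index of the first token after `for (`.
function startsWithLetBracket(tokens, pos) {
  return tokens[pos] === "let" && tokens[pos + 1] === "[";
}

// `for (let[a] = b;;) ;` must be parsed as a LexicalDeclaration.
console.log(startsWithLetBracket(["let", "[", "a", "]", "=", "b"], 0)); // true

// `for (let.a = b;;) ;` may still be parsed as an Expression in sloppy mode.
console.log(startsWithLetBracket(["let", ".", "a", "=", "b"], 0)); // false
```

When the check returns true, the parser can skip the Expression alternative entirely and go straight to the declaration.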

The ForInOfStatement Lookaheads

This production has lookaheads for both the for-in loop and the for-of loop.

The for-in Lookahead

for ( [lookahead ≠ let [] LeftHandSideExpression in Expression ) Statement

This one is very similar to the lookahead in the ForStatement above:

// Does this destructure each key of `b`?
// Or do we assign each key of `b` to the key `a` of the object `let`?
for (let[a] in b) ;

Just like in the previous example, this syntax is valid. It is matched by the ForDeclaration alternative, which covers let and const bindings.

The lookahead causes it to unambiguously be interpreted as "destructure each key of b into an array pattern".

Destructuring keys into array patterns makes sense, since keys are strings and strings are iterable. You can also destructure keys into an object pattern. That is valid syntax, but it is rarely useful: the pattern can only pick up properties of the string key itself, such as length, and any other name yields undefined.

Example:

for (let [a, b] in {key1: 1}) {
  console.log(a, b); // k e
}
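An object pattern in the same position reads properties from the key string. Here is a minimal sketch of that. The variable names are only for illustration.

```javascript
// Each key is a string, so `{ length }` picks up the string's length.
const lengths = [];
for (let { length } in { ab: 1, xyz: 2 }) {
  lengths.push(length);
}
console.log(lengths); // [ 2, 3 ]

// A name that is not a property of the string yields undefined.
const missing = [];
for (let { foo } in { ab: 1 }) {
  missing.push(foo);
}
console.log(missing); // [ undefined ]
```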

The for-of Lookaheads

for ( [lookahead ∉ { let, async of }] LeftHandSideExpression of AssignmentExpression ) Statement

Here we have two cases: let and async of. Note that it is let this time, not let [ as in the previous cases.

The `async of` Lookahead

The lookahead for async of was added in 2020. This lookahead does not exist because the final syntax can be interpreted in multiple ways. Instead, it was added because a parser could not tell which path to take.

There are multiple cases like this in ECMAScript. Some of them are solved with lookaheads, but some are too complex for a lookahead and are solved with cover productions instead. The cover productions are intermediate productions that can later be converted into real productions.

The ambiguity arises in this case:

for (async of

// Can either be a for-of loop where `async` is an identifier:
for (async of []) ;

// Or a normal for loop with an arrow function as initializer:
for (async of => {};;);

Because of the lookahead in the for-of case, only the arrow function version is valid syntax.
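One way to check this against a real engine is to compile both snippets without running them. The sketch below uses the Function constructor for that. The parses helper is a name made up for this example, and the result for the second snippet assumes an engine that already implements the lookahead.

```javascript
// Returns true if `source` parses as a (sloppy mode) function body.
// The code is only compiled, never run.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// A normal for loop with an async arrow function as initializer.
console.log(parses("for (async of => {};;) break;")); // true

// A for-of loop over `[]` with `async` as the loop variable.
// Expected to be false in engines that implement the lookahead.
console.log(parses("for (async of []) ;"));
```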

The `let` Lookahead

This lookahead solves the same ambiguity as the let [ lookahead in the for-in loop discussed above.

However, this lookahead covers more cases, since it matches and rejects everything starting with a let token.

I've not been able to find a definitive answer for why this is let instead of let [. My best guess is that it is for the same reason that let is not a valid identifier in strict mode.

The for-of loop was introduced in ECMAScript 2015 (ES6), along with let and const. Unfortunately, let was not a reserved word before that. (const has always been reserved, which is why these problems do not exist for const.)

To still be able to use let as a keyword, it is considered a keyword only in new syntax or contexts. In syntax and contexts that existed before it was introduced, it is still considered an identifier for backwards compatibility.

In other words, let is probably rejected as an identifier in for-of loops because for-of loops were introduced at the same time, so let can always be treated as a keyword there without breaking anything.

This means that, because of backwards compatibility, there are slight differences in what for-in and for-of allow, even though they are almost identical:

// Valid syntax:
for (let.a in []) ;


// Invalid syntax:
for (let.a of []) ;

If you put a "use strict"; directive above them, both are invalid, since let is always treated as a keyword in strict mode. That is how it really should be.
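These four combinations can be checked by compiling each snippet as a function body without running it. The parses helper below is a name made up for this example. A function body created this way is sloppy unless it starts with a "use strict" directive.

```javascript
// Returns true if `source` parses as a function body.
// The code is only compiled, never run.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// Sloppy mode: `let` is an identifier in for-in, but rejected in for-of.
console.log(parses("for (let.a in []) ;")); // true
console.log(parses("for (let.a of []) ;")); // false

// Strict mode: `let` is reserved, so both are rejected.
console.log(parses('"use strict"; for (let.a in []) ;')); // false
console.log(parses('"use strict"; for (let.a of []) ;')); // false
```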