Building state generators for yield and await
What follows is the documentation for my base state generator class that is used for building the state machines for yield and await operators for WootzJs.
The base class for C# state generators is used for async and yield functionality. The following is essentially my take on concepts that can be gleaned from a study of coroutines. The essential problem is that in normal control flow, a method has a beginning and an end -- at the end, flow jumps back to the caller immediately after the invocation of the method that has just been called. This is known as a function's epilogue. However, await and yield require control flow that is more flexible. It's easiest to understand by starting with the most flexible operator, await.
With await, you want to ask the awaiter to do its thing and when it's done (perhaps hours later) logically return the control flow to immediately after the await expression (the next argument in a function call, the next statement, etc.). In other words, to explicitly stimulate the function's epilogue on its own terms instead of implicitly when a method is simply finished.
In the meantime, the callstack is successively unwound; if the calling method is also async and awaiting, this bubbles up the callstack until a non-async method is reached. (Ultimately, the top of a callstack cannot be async, be it a Main method or a Run on another thread.) This allows the original thread to continue on its merry way and perform other actions. This is in stark contrast to a normal method invocation. Without async (or yield), the method must finish before the callstack unwinds back to the caller.
The way this is implemented is that when you start the awaiter, you provide the awaiter with a callback function. What's extremely special about this callback function is that its effect is essentially the function's epilogue -- it moves control flow back to an arbitrary point of a method. Needless to say, except for await and yield, this is not otherwise possible in C#.
This ability to create a callback that can return to an arbitrary point in the method is the key ingredient to both await and yield. When implementing a yield iterator, you are asking the caller to do some work, then the iterator to do some work, and then the caller, and then the iterator, and so on. Importantly each time work is handed off between the two parties, control flow must resume from where it left off last time. This is achieved on the caller side by normal imperative control flow. The caller does work and when it's ready to wait for a new element, it calls MoveNext() (like any other implementation of IEnumerator. Notably, the C# compiler's async method state machine uses the name MoveNext() as well). However, the implementation of an iterator's MoveNext() method is the callback ingredient we have been describing. This callback returns control flow to arbitary points in the iterator (wherever the yield operator was last used).
So, to implement await and yield, we must figure out a way to create this callback that can return to arbitrary points in a method. The following applies to both C# and Javascript. (In fact, the initial implementation was a C# -> C# translation, but is now a C# -> JS translation.) To create this callback we must rip apart the abstract syntax tree of the method doing the awaiting or yielding. The way this is usually done is by creating a state machine. Concretely, this means generating a switch statement that switches on an integer state variable. Each case statement represents some portion of logic (one or more statements, and sometimes even individual expressions). Modifying the value of this state variable effectively changes where you are in the original method. Thus it becomes possible to return from the function and, the next time it is invoked, logically continue execution from where you left off before.
Generally speaking, this is straightforward. However, control flow constructs (if, while, for, foreach, try/catch, etc.), must be implemented in a new way in order to be able to exit from the function and somehow resume back into the middle of a control flow. This can probably best be explained by example. We'll use yield in these examples because conceptually it's a bit more specific (caller and iterator) and thus easier to reason about. But all of this applies to await as well.
We'll start with the simplest possible yield iterator:
public IEnumerator<string> Empty() { yield break; }
This iterator will produce an enumerator that will return false on the first invocation of MoveNext() (a sequence with no elements). Let's convert it into a state machine. We'll use C# for consistency, but the generated output is obviously Javascript. Also, the implementation is simplified for clarity, but the real output is more sophisticated (implements IEnumerable, IDisposable, etc. Additionally, generated members such as state are actually translated prefixed with "$" to avoid collisions with user code).
class YieldBreakIterator : IEnumerator<string> { private int state; public string Current { get; private set; } public bool MoveNext() { switch (state) { case 0: return false; } } }
Here we have a simple state machine. Now let's change the iterator to:
public IEnumerator<string> ReturnFoo() { yield return "foo"; }
This iterator will be much the same as the first one, but it will first set the value of Current to "foo" before returning true.
class YieldReturnFooIterator : IEnumerator<string> { private int state; public string Current { get; private set; } public bool MoveNext() { switch (state) { case 0: state = 1; Current = "foo"; return true; case 1: return false; } } }
Here we can see that after the first invocation of MoveNext(), Current will contain the value "foo". The next invocation will arrive in state 2 and return false, ending the iteration.
The penultimate example is two yield statements in a row. This finally illustrates jumping into an arbitrary point in the iterator.
public IEnumerator<int> ReturnTenAnd42() { yield return 10; yield return 42; }
Translating to the state machine:
class YieldReturnTenAnd42Iterator : IEnumerator<int> { private int state; public int Current { get; private set; } public bool MoveNext() { switch (state) { case 0: state = 1; Current = 10; return true; case 1: state = 2; Current = 42; return true; case 2: return false; } } }
Now we get to see how the two statements in the iterator are broken down into two case statements. Each case statement is responsible for an individual yield return statement. Each invocation of MoveNext() moves you along, one statement at a time.
Finally, we will consider just one complex example -- the use of a for loop. The key difference here is that control flow now has to jump around between case statements. The for loop is just one example. All the control flow statement operations in C# require similar transformations.
public IEnumerator<int> ForLoop() { for (var i = 0; i < 2; i++) { yield return i; } }
And now, let's look at the state machine:
class ForLoop : IEnumerator<int> { private int state; private int i; public int Current { get; private set; } public bool MoveNext() { switch (state) { case 0: i = 0; goto 1; case 1: while (i < 2) { goto 2; } return false; case 2: state = 1; Current = i; i++; return true; } } }
Here we jump around between different case statements, and various expressions from the original iterator are decomposed into separate statements in the state machine.
The for loop initializer is pulled out and is expressed in the first line of case 0.
The for loop itself is converted into a while loop, using the same condition.
The body of the loop jumps to state 2 (not strictly necessary in this example to have a separate state, but in more complex examples this transformation becomes necessary).
State 2 is responsible for updating the Current property to the new value, apply the incrementor of the for loop, change the state to 1 (the top of the loop) and finally return true indicating that the next invocation should attempt to continue the loop.
That's basically it. All the transformations for the various control flow constructs are the same between await and yield. The only differences are in the handling of await expressions and yield statements. Since they are mutually exclusive within a method, this simplifies the implementation.