The yield statement

C# 2.0 introduced a new keyword, yield. I didn't pay much attention to it at first, assuming that anonymous delegates and generics were more interesting and that yield was only a way to wrap an "iterator pattern". I was wrong (though I have the excuse that the name given to this feature in C# is 'enumerators', which I now find a bit reductive and misleading). Yield exposes a feature known in the Python community as generators.

They are a bit like continuations: they return (or better, yield) a value, and when they are called again they resume execution right after the last yield.

In C#, yield must be used inside an iterator block. An iterator block is more or less a function whose return type must be IEnumerable, IEnumerator, IEnumerable<T>, or IEnumerator<T> (see MSDN).

using System;
using System.Collections.Generic;
using System.Text;

namespace YieldTest
{
    class Program
    {
        static void Main(string[] args)
        {
            foreach (string s in gen())
                Console.WriteLine(s);

            Console.ReadLine();
        }

        static IEnumerable<string> gen()
        {
            Console.WriteLine("In the generator function");

            Console.WriteLine("Give A");
            yield return "A";

            Console.WriteLine("Give B");
            yield return "B";
        }
    }
}
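To make the "resume right after the last yield" behaviour visible, here is a small sketch that drives a generator by hand with MoveNext() and Current, which is roughly what foreach does behind the scenes. The names LazyDemo, Gen, Run and Log are made up for this illustration; the log simply records the order in which things happen.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of the original program.
static class LazyDemo
{
    // Records the order of events so the interleaving can be inspected.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<string> Gen()
    {
        Log.Add("body started");
        yield return "A";
        Log.Add("resumed after A");
        yield return "B";
        Log.Add("resumed after B");
    }

    public static void Run()
    {
        Log.Clear();

        // Calling Gen() only creates the sequence object; the body has not run yet.
        IEnumerable<string> sequence = Gen();
        Log.Add("Gen() called");

        using (IEnumerator<string> e = sequence.GetEnumerator())
        {
            // Each MoveNext() runs the body up to the next yield return
            // (or to the end of the method, in which case it returns false).
            while (e.MoveNext())
                Log.Add("got " + e.Current);
        }
    }
}
```

After LazyDemo.Run(), the log reads: "Gen() called", "body started", "got A", "resumed after A", "got B", "resumed after B". Nothing in the generator body executes until the consumer asks for the first item.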

It is easy to see why C# enumerators are so useful when implementing iteration over a sequence. As Raymond Chen points out in one of his blog posts, there are two "models" for the enumerator-consumer pattern:


"The COM model for enumeration (enumeration objects) is biased towards making life easy for the consumer and hard for the producer. The enumeration object (producer) needs to be structured as a state machine, which can be quite onerous for complicated enumerators, for example, tree walking or composite enumeration."

"On the other hand, the callback model for producer (used by most Win32 functions) is biased towards making life easy for the enumerator and hard for the consumer. This time, it is the consumer that needs to be structured as a state machine, which is more work if the consumer is doing something complicated with each callback. (And even if not, you have to create a context structure to pass state from the caller, through the enumerator, to the callback.) "

In the first model, the caller calls Next() repeatedly and the enumerator has to find the next item and return it. Because the enumerator returns after every item, it cannot rely on the call stack to remember where it was; it has to record its state in an explicit stack data structure that mimics the call stack.
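As a rough sketch of what this costs the producer, here is a hand-written in-order enumerator for a binary tree. The Node and InOrderEnumerator types are hypothetical, invented for this illustration; the point is only that the traversal position has to live in an explicit Stack<Node> because MoveNext() returns after every item.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical binary tree node, used only for this illustration.
class Node
{
    public int Value;
    public Node Left, Right;

    public Node(int value, Node left = null, Node right = null)
    {
        Value = value;
        Left = left;
        Right = right;
    }
}

// In-order traversal written by hand: the explicit stack replaces the call stack.
class InOrderEnumerator : IEnumerator<int>
{
    private readonly Node root;
    private readonly Stack<Node> pending = new Stack<Node>();
    private Node next;      // subtree we still have to descend into
    private int current;

    public InOrderEnumerator(Node root)
    {
        this.root = root;
        next = root;
    }

    public int Current { get { return current; } }
    object IEnumerator.Current { get { return current; } }

    public bool MoveNext()
    {
        // Go as far left as possible, remembering every node on the way down.
        while (next != null)
        {
            pending.Push(next);
            next = next.Left;
        }

        if (pending.Count == 0)
            return false;   // nothing left to visit

        Node node = pending.Pop();
        current = node.Value;
        next = node.Right;  // the right subtree is visited on the following call
        return true;
    }

    public void Reset()
    {
        pending.Clear();
        next = root;
    }

    public void Dispose() { }
}
```

The consumer side stays trivial: create the enumerator, then loop on MoveNext() and read Current.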

In the callback model, on the other hand, the producer performs the operations it needs on the data structure (walks a tree, for example) and calls back the consumer through the callback as it finds items. This makes the producer implementation straightforward (in the case of the tree, a simple recursive function will do) but life is made harder for the consumer: it needs to maintain state across each callback.
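The same tree walk in the callback style might look like the sketch below. Again, Node, CallbackModel, Walk and PairSums are names made up for the example. The producer is a plain recursive function, while the consumer (which here wants the sum of each pair of adjacent items) has to carry its own state from one callback to the next; the captured local variables play the role of the context structure.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical binary tree node, used only for this illustration.
class Node
{
    public int Value;
    public Node Left, Right;

    public Node(int value, Node left = null, Node right = null)
    {
        Value = value;
        Left = left;
        Right = right;
    }
}

static class CallbackModel
{
    // Producer: simple recursion, the call stack keeps track of the position.
    public static void Walk(Node node, Action<int> callback)
    {
        if (node == null) return;
        Walk(node.Left, callback);
        callback(node.Value);
        Walk(node.Right, callback);
    }

    // Consumer: sums each pair of adjacent items, so it must remember
    // the previous item between one callback and the next.
    public static List<int> PairSums(Node root)
    {
        List<int> sums = new List<int>();
        bool havePrevious = false;  // state carried across callbacks
        int previous = 0;

        Walk(root, delegate(int value)
        {
            if (havePrevious)
                sums.Add(previous + value);
            previous = value;
            havePrevious = true;
        });

        return sums;
    }
}
```

For a tree holding 1, 2 and 3, PairSums returns 3 and 5. The more the consumer needs to do with each item, the more of this bookkeeping piles up on its side.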

It would be great to have the simpler approach on both sides, with the caller seeing a simple enumerator that returns items in order and the enumerator seeing a callback that it can throw items into. Raymond's solution is a great piece of software based on fibers (user-scheduled threads), but as he points out, fibers are hard to use, and it is very easy to make subtle errors that are difficult to debug.

The C# solution lies in the yield statement and in generators.
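A minimal sketch of the tree walk written with yield, assuming a hypothetical Node type and a made-up YieldModel.InOrder method: the producer is written as a simple recursive function, and the consumer still gets an ordinary sequence it can foreach over.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical binary tree node, used only for this illustration.
class Node
{
    public int Value;
    public Node Left, Right;

    public Node(int value, Node left = null, Node right = null)
    {
        Value = value;
        Left = left;
        Right = right;
    }
}

static class YieldModel
{
    // Producer: reads like the recursive callback version,
    // but "calls back" by yielding each item to the consumer.
    public static IEnumerable<int> InOrder(Node node)
    {
        if (node == null)
            yield break;

        foreach (int value in InOrder(node.Left))
            yield return value;

        yield return node.Value;

        foreach (int value in InOrder(node.Right))
            yield return value;
    }
}
```

The consumer simply writes foreach (int value in YieldModel.InOrder(root)) and keeps whatever state it needs in ordinary local variables. One caveat of this naive recursive form is that every item passes through one nested iterator per tree level, so it trades some efficiency on deep trees for simplicity.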


Copyright 2020 - Lorenzo Dematte