Iterables, Iterators and Generators

Every JavaScript developer works with arrays. But sometimes this data structure is not the right choice. If you want to work with extremely large amounts of data or with infinite sequences, arrays are out of the question and you need alternatives. A fairly new and still little-used option is iterables and iterators, which were designed for exactly such use cases and are already standard in other languages (e.g. Python or C#).

2020-03-23

Preface

In my work at esveo, we use the programming languages JavaScript and TypeScript in many projects. Thanks to the wide range of possible applications and the strength of the ecosystem around these languages, this has never been a bad decision. Compared to other languages and environments, however, the complexity of JavaScript applications has only recently increased (as explained in another article) to such an extent that developers are looking for support from frameworks and design patterns that have long been common in other communities. One of these patterns is the iterator, or iterable: a unified pattern for representing a stream of data items.

Even though the code examples in this article are written in TypeScript, I would like to emphasize that iterables are a JavaScript feature, which means that they can also be used in TypeScript.

Why no arrays?

const data = [1, 2, 3]; // Simple, convenient and easy to read.

data.push(4);

If you have to store a large number of values in a variable, you usually use an array. So why move away from the established procedure? The big disadvantage of arrays is that all values exist in memory at the same time. If, for example, a file has to be read and processed on the server, its lines should not all be loaded into an array and then processed one after the other, as this would only fill the server's memory unnecessarily. Instead, process each line individually and thus avoid a high peak in resource consumption. Of course you can also use the streams typical of Node.js, but with iterables the language itself has offered a feature for such use cases since 2015.

Iterables & Iterator

The basic building block for using iterables is the iterator: An object on which you can call the next() method to get the next value. The result is an object with the keys value and done, which contain the current value and information about the state of the iterator.

In TypeScript, the interfaces of the Iterator and the IteratorResult look roughly like this (two methods that are not needed here were omitted for the explanation; the exact types can be found in lib.es2015.iterable.d.ts):

interface IteratorResult<ValueType> {
  done: boolean;
  value: ValueType;
}

interface Iterator<ValueType> {
  next: (value?: any) => IteratorResult<ValueType>;
}

A sample implementation that generates a series of ascending numbers could look like this:

let n = 0;
const numberIterator = {
  next: () => {
    const nextResult = { value: n, done: false };
    n++;
    return nextResult;
  },
};

numberIterator.next(); // { value: 0, done: false }
numberIterator.next(); // { value: 1, done: false }
numberIterator.next(); // { value: 2, done: false }
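The done flag becomes relevant as soon as a sequence is finite. The following is a minimal sketch, assuming a hypothetical helper createRangeIterator (not part of the original example) and TypeScript's built-in Iterator types. It counts from start to end and then reports done: true:

```typescript
// Hypothetical helper: returns an iterator over start..end (inclusive).
function createRangeIterator(start: number, end: number): Iterator<number> {
  let current = start;
  return {
    next: (): IteratorResult<number> => {
      if (current <= end) {
        return { value: current++, done: false };
      }
      // Once exhausted, every further call reports done: true.
      return { value: undefined, done: true };
    },
  };
}

// Consume the iterator by hand until it signals completion.
const rangeIterator = createRangeIterator(1, 3);
const collected: number[] = [];
let result = rangeIterator.next();
while (!result.done) {
  collected.push(result.value);
  result = rangeIterator.next();
}
console.log(collected); // [1, 2, 3]
```

The while loop is essentially what language constructs that consume iterators do internally: call next() until done is true.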

For a simple sequence of numbers, this is an astonishing amount of effort. Moreover, a single stateful iterator offers little added value. Instead, you would rather mark your own classes as "iterable". This is done with the Iterable interface. By definition, an object is an Iterable if it has a method under the Symbol.iterator key that returns an Iterator:

interface Iterable<ValueType> {
  [Symbol.iterator](): Iterator<ValueType>;
}

An implementation of the number sequence could look like this:

const numbers = {
  [Symbol.iterator]: () => {
    let n = 0;
    return {
      next: () => {
        const nextResult = { value: n, done: false };
        n++;
        return nextResult;
      },
    };
  },
};

So the writing effort is even greater to implement something simple like a sequence of numbers.

However, a special feature of these solutions should be emphasized here: The sequence of numbers is infinite. This could not be represented in an array, because it must always have a finite length, since all elements of the array must be held in memory at the same time.

Since the iterable is a uniformly defined interface, language features can be built upon it:

// Implementation of the iterable omitted
const iterableWithNumbersFromOneToThree = {
  /* ... */
};
for (const number of iterableWithNumbersFromOneToThree) {
  console.log(number); // 1, 2, 3
}

const arrayFromIterable = [...iterableWithNumbersFromOneToThree]; // [1, 2, 3]

// Same as console.log(1, 2, 3)
console.log(...iterableWithNumbersFromOneToThree);

The language constructs for ... of and the spread operator ... both consume iterables. Conveniently, the built-in arrays are also iterable. This means that the code from the for ... of loop onward is flexible: it can be executed with arrays, sets, NodeLists and other iterables.
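The implementation of the iterable was omitted in the example. One possible way to fill it in, shown here only as a sketch and not necessarily what the original code used, is an object whose Symbol.iterator method hands out a fresh counter on every call:

```typescript
// One possible implementation of the iterable that was omitted above.
const iterableWithNumbersFromOneToThree: Iterable<number> = {
  [Symbol.iterator]: () => {
    let n = 1;
    return {
      next: (): IteratorResult<number> =>
        n <= 3 ? { value: n++, done: false } : { value: undefined, done: true },
    };
  },
};

const seen: number[] = [];
for (const value of iterableWithNumbersFromOneToThree) {
  seen.push(value);
}
console.log(seen); // [1, 2, 3]

// Every call to Symbol.iterator starts a new iterator,
// so the same iterable can be consumed more than once.
const arrayFromIterable = [...iterableWithNumbersFromOneToThree];
console.log(arrayFromIterable); // [1, 2, 3]
```

Because the counter lives inside the Symbol.iterator method, the for ... of loop and the spread each start counting from 1 again.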

To encourage the use of iterables, a new kind of function was also added to JavaScript that makes it much easier to create them: the generator. A generator is a function that returns an iterable iterator, that is, an iterator (with a next method) that is also an iterable (with a Symbol.iterator method). The notation looks like this:

function* numberGenerator() {
  let n = 0;
  while (true) {
    yield n;
    n++;
  }
}

Two particularities must be taken into account here: The * before the function name indicates that this function is a generator. The yield n in the function body produces the next value of the iterable. After the yield statement, the function pauses and does not continue running until the .next() method is called from outside. For this reason the while (true) does not freeze the browser.

So the code produces the same infinite sequence of numbers as above, but is much easier to read.
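The pausing behaviour and the "iterable iterator" property can be observed directly. The following sketch repeats numberGenerator so that it is self-contained; the variable names are chosen for illustration only:

```typescript
function* numberGenerator() {
  let n = 0;
  while (true) {
    yield n;
    n++;
  }
}

// Calling the generator function does not run any of its body yet.
const generatorObject = numberGenerator();

// Each next() call resumes execution up to the next yield.
const first = generatorObject.next(); // { value: 0, done: false }
const second = generatorObject.next(); // { value: 1, done: false }

// The generator object is its own iterator ...
const isOwnIterator = generatorObject[Symbol.iterator]() === generatorObject;

// ... so for...of continues where the manual next() calls stopped.
const rest: number[] = [];
for (const n of generatorObject) {
  if (n > 4) break;
  rest.push(n);
}
console.log(first.value, second.value, isOwnIterator, rest); // 0 1 true [2, 3, 4]
```

Note that leaving the loop with break also closes the generator, so it cannot be resumed afterwards.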

Iterable helper functions

Anyone who has worked with lists of data in JavaScript before will know array methods such as 'map' or 'filter'. These must currently be implemented by the developer for the iterables (if you don't want to use an external library). There already is a proposal in the TC39 process which will help. Until then a simple implementation is needed:

// Transform any value in iterable with a given function:
function* map(iterable, transform) {
  for (let item of iterable) {
    yield transform(item);
  }
}

// Check the return value of a function for each value in the iterator.
// Keep only values where the function returns true.
function* filter(iterable, condition) {
  for (let item of iterable) {
    if (condition(item)) yield item;
  }
}

As a small example, a simple transformation will now be implemented:

function transform(numbers: Iterable<number>) {
  // Each step consumes the result of the previous step.
  const squared = map(numbers, (n) => n * n);
  const even = filter(squared, (n) => n % 2 === 0);
  const multiplied = map(even, (n) => n * 100);

  return multiplied;
}

This function squares all numbers in an iterable, removes all odd numbers and multiplies the remaining numbers by 100.

Once again, I want to emphasize: Iterables are "lazy"! That means that only those values are calculated that are really needed:

const infinite = numberGenerator();
const transformed = transform(infinite);

// No calculation has been performed yet.

let i = 0;
for (const n of transformed) {
  if (i >= 10) break;
  i++;
  console.log(n);
}
// Here the first 10 results of the iterator
// are "pulled" and written to the console.

Both the filters and the transformations are only performed as often as required for the first 10 results. This is why iterables are called "pull-based". With the array methods this procedure would not be possible. We would first have to think about how many numbers we need initially to have exactly 10 results after our pipeline. In addition, each processing step with .map() or .filter() would create a completely new array, which would put additional load on the garbage collector.
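The counting loop can itself be wrapped in a reusable helper. The following is a minimal sketch of a hypothetical take function (it is not part of the article's helpers, and naturalNumbers is an illustrative stand-in for the infinite generator above). It yields at most count values and then stops pulling from its source:

```typescript
// Hypothetical helper: yields at most `count` values from any iterable.
function* take<T>(iterable: Iterable<T>, count: number): Generator<T> {
  if (count <= 0) return;
  let taken = 0;
  for (const item of iterable) {
    yield item;
    taken++;
    // Stop right after the last wanted value, so that no
    // additional value is pulled from the source.
    if (taken >= count) return;
  }
}

// Infinite source, same idea as the numberGenerator above.
function* naturalNumbers() {
  let n = 0;
  while (true) {
    yield n++;
  }
}

const firstFive = [...take(naturalNumbers(), 5)];
console.log(firstFive); // [0, 1, 2, 3, 4]
```

Spreading an infinite iterable directly would never terminate. With take in between, the spread ends after five values because the helper reports that it is done.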

What is all this for?

The biggest advantage of using the iterable interface is hidden in the SOLID principles, more precisely in the dependency inversion principle. This says that code should always depend on interfaces and not on concrete implementations. If the 'transform' function worked directly with array methods, it could only be called with arrays and we would be unnecessarily restricted in its use.

This is where iterables pay off most: the built-in language and browser features already use them. Arrays, sets, maps, NodeLists and many more already implement the iterable interface. This means that our 'transform' function can be called with these data structures and still work. So the functions remain flexible and can be used for many use cases.
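As a sketch of this flexibility, the following block uses typed variants of the map, filter and transform functions from above, with each step chained onto the previous one. The type parameters and the sample inputs are illustrative additions:

```typescript
function* map<T, U>(iterable: Iterable<T>, transformItem: (item: T) => U): Generator<U> {
  for (const item of iterable) {
    yield transformItem(item);
  }
}

function* filter<T>(iterable: Iterable<T>, condition: (item: T) => boolean): Generator<T> {
  for (const item of iterable) {
    if (condition(item)) yield item;
  }
}

// Square, keep the even results, multiply by 100.
function transform(numbers: Iterable<number>) {
  const squared = map(numbers, (n) => n * n);
  const even = filter(squared, (n) => n % 2 === 0);
  return map(even, (n) => n * 100);
}

// The same function accepts any built-in iterable of numbers.
const fromArray = [...transform([1, 2, 3, 4])];
const fromSet = [...transform(new Set([2, 2, 6]))];
const fromMapKeys = [...transform(new Map([[4, "four"]]).keys())];

console.log(fromArray); // [400, 1600]
console.log(fromSet); // [400, 3600]
console.log(fromMapKeys); // [1600]
```

Nothing in transform refers to a concrete data structure. It only relies on the Iterable interface.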

Especially in backend development it is sometimes necessary to implement your own data structures like a queue or a linked list to ensure optimal performance. If these are also iterables, many helper functions can still be used.
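A minimal sketch of such a data structure, assuming a hypothetical singly linked list (the LinkedList class and its append method are illustrative and do not come from a library): the class becomes iterable by implementing Symbol.iterator as a generator method.

```typescript
// Hypothetical minimal singly linked list, for illustration only.
interface ListNode<T> {
  value: T;
  next: ListNode<T> | null;
}

class LinkedList<T> implements Iterable<T> {
  private head: ListNode<T> | null = null;
  private tail: ListNode<T> | null = null;

  append(value: T): this {
    const node: ListNode<T> = { value, next: null };
    if (this.tail) {
      this.tail.next = node;
    } else {
      this.head = node;
    }
    this.tail = node;
    return this;
  }

  // A generator method is a compact way to implement Symbol.iterator.
  *[Symbol.iterator](): Generator<T> {
    for (let node = this.head; node !== null; node = node.next) {
      yield node.value;
    }
  }
}

const list = new LinkedList<string>().append("a").append("b").append("c");

// The list now works with spread and for...of like any built-in iterable.
const items = [...list];
const upper: string[] = [];
for (const item of list) {
  upper.push(item.toUpperCase());
}
console.log(items, upper); // ["a", "b", "c"] ["A", "B", "C"]
```

Any helper that accepts an Iterable, such as the map and filter functions above, can then process the list without knowing how it is stored.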

So the next time you implement a function that receives a list of elements, I hope you think of iterables first to process this list.