Datastructures I: List

This article starts a series dedicated to data structures, seen through JavaScript "optics". I'm unhappy with the current state of things, where such fundamental topics are demonstrated by clunky Java-like code, and where even the better resources are OOP-heavy and repulsive to read.

My primary goal is to investigate, analyze, and compare the alternatives. So, I hope, this series will still be more about Computer Science than JavaScript.

I will avoid performance nitpicking and concentrate on structural matters instead. For example, I will use recursion instead of loops, despite the fact that tail calls are still not optimized in the current V8. Recursion is a more powerful and expressive tool, which suits the setting of this article series.

In the initial parts, we'll implement eager and lazy lists. Then we'll compare the latter with generators. And then we'll be ready to review streams. At least, that's the current plan.

Good. What's a (Linked) List?! Check Wikipedia. The key observation is that it's a fundamental datastructure which substitutes for arrays in many languages. Substitutes, in the sense of claiming the canonical [] literal for itself. The first benefit of lists is that they're perfect for pattern matching (due to their type). This is not applicable to JS but is worth mentioning.

Another benefit of lists is how simple they are, including how easy it is to implement laziness on top of them. By "lazy" here I mean "lazy-per-item". Such lazy lists may be seen as synchronous pull streams and come in handy when you need to represent big or infinite sequences: natural numbers, Fibonacci numbers, etc.
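
To make "lazy-per-item" concrete, here is a minimal sketch of the idea (the names LazyCons, nats, and take are mine, chosen just for illustration): the tail hides behind a thunk, so each next cell is only built on demand.

```javascript
// The tail is a thunk: a zero-argument function returning the next cell (or null)
let LazyCons = (value, next) => ({value, next})

// An infinite list of natural numbers starting at n
let nats = (n) => LazyCons(n, () => nats(n + 1))

// Force the first k items into an array
let take = (k, xs) =>
  k > 0 && xs ? [xs.value].concat(take(k - 1, xs.next())) : []

console.log(take(5, nats(0))) // [0, 1, 2, 3, 4]
```

Nothing past the fifth cell is ever constructed here, which is exactly what lets such a list stand in for an infinite sequence.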

You can definitely use generators, but many other languages don't have them and rely on lazy datastructures instead. So it's interesting to try this approach in JavaScript as well.
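
For comparison, the generator version of the same infinite sequence of naturals would look like this (natsGen is my name for it):

```javascript
// The same infinite sequence, pulled one item at a time via the iterator protocol
function* natsGen(n) {
  while (true) yield n++
}

let g = natsGen(0)
console.log(g.next().value) // 0
console.log(g.next().value) // 1
console.log(g.next().value) // 2
```

We'll return to a proper comparison of the two approaches later in the series.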

The drawbacks of lists are: O(n) append and O(n) search / indexing. See the comparison table. But, again, this is not an article about linked lists. This is an article about their implementation in JS.

So, without further ado, here is a basic linked list:

let A3 = null
let A2 = {value: "a2", next: A3}
let A1 = {value: "a1", next: A2}
let A0 = {value: "a0", next: A1}

console.log(A0)                // {value: "a0", next: [Object]}
console.log(A0.next)           // {value: "a1", next: [Object]}
console.log(A0.next.next)      // {value: "a2", next: [Object]}
console.log(A0.next.next.next) // null

Or, from a different angle:

let A0 = {
  value: "a0",
  next: {
    value: "a1",
    next: {
      value: "a2",
      next: null
    }
  }
}

– Isn't the level of nesting limited in some way? – you may ask. Yes and no. Parsers are often limited to something like "up to 512" levels. But you aren't going to define infinite, or even just long, sequences in literal form. You will write functions to generate them, and then those "nesting levels" become unlimited references, as the first example makes clear.
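
For instance, here is a hypothetical range helper (the name is mine) that builds a long list in a loop, with no literal nesting involved:

```javascript
// Build a list of integers [from, to) by prepending from the end backwards
let range = (from, to) => {
  let xs = null
  for (let i = to - 1; i >= from; i--) {
    xs = {value: i, next: xs}
  }
  return xs
}

let thousand = range(0, 1000)
console.log(thousand.value)      // 0
console.log(thousand.next.value) // 1
```

A thousand cells, and no parser ever saw a thousand levels of nesting.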

As for JSON, it's impossible to represent an infinite stream with a finite file. But it is possible to represent every stream element as a single JSON object and rely on time, not space, for streaming. We'll come back to this question later.
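
A sketch of that idea (toJSONLines is my name, an assumption): walk the list and render each element as its own JSON line, NDJSON-style. An infinite list would simply keep producing lines over time.

```javascript
// Each cell becomes one self-contained JSON line
let toJSONLines = (xs) =>
  xs ? [JSON.stringify({value: xs.value})].concat(toJSONLines(xs.next)) : []

let lines = toJSONLines({value: "a0", next: {value: "a1", next: null}})
console.log(lines.join("\n"))
// {"value":"a0"}
// {"value":"a1"}
```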

The basic function for linked lists is traditionally named Cons (from "construct"). You'll find this name in many sources. Why not List? Well, I can't speak for McCarthy, who probably coined the term, but Cons describes only one item, and the list is a derivative notion. Seems justified to me.

function Cons(value, next) {
  return {value, next}
}

Making things a bit nicer:

let A3 = null
let A2 = Cons("a2", A3)
let A1 = Cons("a1", A2)
let A0 = Cons("a0", A1)

We will use null to mark an empty cons cell ("cons cell" being the original name for a list node), for brevity. In statically typed languages it's usually a special singleton type (called Nil, or something similar). Type expressibility in basic JS is limited, as is the average JS developer's familiarity with the idea. Let's omit some details.

A datastructure is defined by its behaviors. Our next task is to implement the basic functions over lists. Yet to verify their outputs, it's good to have a couple of extra utils, since console.log won't help us much. Let's implement toArray:

let toArray = (xs) => {
  if (xs) {
    return [xs.value].concat(toArray(xs.next))
  } else {
    return []
  }
}

console.log(toArray(A0)) // ["a0", "a1", "a2"]

and an auxiliary function logList:

let logList = (list) => console.log(toArray(list))

logList(A0) // ["a0", "a1", "a2"]

We already have prepend under the alias of Cons. What about append? As we saw earlier, it's not a desirable operation for a list (it's O(n)). For infinite lists it's impossible to append at all.

Anyway, here is append:

let append = (x, xs) => {
  if (xs) {
    return Cons(xs.value, append(x, xs.next))
  } else {
    return Cons(x, null)
  }
}

logList(append("a3", A0)) // ["a0", "a1", "a2", "a3"]

We reassemble the list cell by cell with Cons(xs.value, append(x, xs.next)), and at the end of the list (when xs is null) we create the new trailing item with Cons(x, null).

Now concat:

let concat = (xs, ys) => {
  if (xs) {
    return Cons(xs.value, concat(xs.next, ys))
  } else {
    return ys
  }
}

logList(concat(A0, A0)) // ["a0", "a1", "a2", "a0", "a1", "a2"]

And, finally, map:

let map = (fn, xs) => {
  if (xs) {
    return Cons(fn(xs.value), map(fn, xs.next))
  } else {
    return null
  }
}

logList(map(x => x + "!", A0)) // ["a0!", "a1!", "a2!"]

In the next episode we'll implement the lazy versions of the above.

P.S.

I confess I've cheated a bit: I first talked about proper recursion and then implemented map and the other functions in non-tail style.

A realistic (eager) map would look more convoluted:

let map2 = (fn, xs) => {
  // Build the mapped list in reverse with an accumulator (a tail call)...
  let mapped = (function go(acc, xs) {
    if (xs) {
      return go(Cons(fn(xs.value), acc), xs.next)
    } else {
      return acc
    }
  })(null, xs)
  // ...then reverse it back into the original order (also a tail call)
  return (function rev(acc, xs) {
    if (xs) {
      return rev(Cons(xs.value, acc), xs.next)
    } else {
      return acc
    }
  })(null, mapped)
}

My excuse is that not many JS programmers have even heard of tail recursion. Because of that, and also because it's impossible to test these optimizations right now, I omit the extra complexity. Let it be the subject of another talk.