## Friday, August 12, 2011

### Holey Data, Part 1/3: (Almost) Difference Lists

This series has been brewing in my head since a year ago, starting at the 2010 Workshop on Generic Programming in Baltimore. It was at least a quarter-baked idea by the time I asked, on the CSTheory Stackexchange, a question that amounted to "are there functional programming difference lists that act like logic programming difference lists"? [1]

## Difference lists

Difference lists, which are deep lore of the logic programming community, are essentially data structures that allow you to append to the beginning and the end of the list as a constant time operation; they can also be turned back into "regular" lists as a constant time operation. You can see [2,3,4] for details, but I'd almost encourage you not to: logic programming difference lists are a bit crazy.

Let me give you two ways of thinking about difference lists. If you're a lazy person (in the Haskell sense), you may say that difference lists really have something to do with laziness in functional programming languages. William Lovas showed me some crazy Haskell code yesterday that does something of this flavor, and there's a paper that espouses this viewpoint [5]. I'll stick with a different viewpoint: difference lists are functions from lists to lists. This is how difference lists are implemented in the Haskell Data.DList library [6], and you can find people that back-ported the idea that difference lists are "really" functions to higher-order logic programming languages [7].

The problem with the idea of difference lists being functions is that you lose one of the fundamental properties that you started with: while it depends on how you implement things, difference lists implemented as functions will usually have a O(1) append operation but a O(n) "turn me into a regular list" operation. For a discussions of why, see the answers to my question on CSTheory Stackexchange.

I've believed for some time that this unfortunate O(n)ness could be overcome, but I wanted a working proof-of concept implementation before I talked about it. It was during my recent visit to France that I realized that I really needed to base the proof-of-concept off of a language that was appropriately persnickety about the difference between what I called "value code" and what I called "computation code" in the previous post. And then I found Levy, and embraced and extended it into Levy#.1 So at this point you should read the previous post.

## Representing datatypes

In the last post I talked about declaring and using datatypes in Levy#, using the example of lists of integers:
   data Nil: list      | Cons: int -o list -o list
It's necessary to say a few words about how these things are implemented. Every value in Levy# is represented in the OCaml interpreter as a heapdata ref - that is, a mutable pointer into a "box" in memory. Some boxes hold integers, other boxes hold the code and environments that represent suspended (thunked) computations. Declared constants from datatype declarations are another kind of box which holds both a tag (like Nil or Cons) and an appropriate number of values (that is, pointers) for the given datatype.

As an example, here's the way the list Cons 3 (Cons 7 Nil), stored in the variable xs, could be laid out in memory:

## Representing difference lists

### Types

On the level of types, a difference list will be represented as a member of the value type (the Linear LF type) list -o list. This adequately represents what a difference list is better than the list -> list type of Haskell or Lambda Prolog/Twelf. In Haskell, lots of functions have type list -> list, including the list reversal function. Lambda Prolog/Twelf does better; the LF type list -> list really has to be substitution function - it can only take a list and plug it into designated places in another list. But λx. Nil is a perfectly good LF value of type list -> list. In other words, it's not necessarily the case that applying something to a function is equivalent to tacking that thing on to the end of the difference list - the applied argument can simply be ignored.

Linear functions must use their argument exactly once, so linear functions list -o list accurately represent the idea of difference lists, a list with exactly one missing list somewhere in it. For lists this is kind of a dumb way to put it: the missing list has to go at the end of a series of Conses or nowhere at all, so the linearity forces there to be one (as opposed to zero) occurrences of the hole. If we were talking about terms of type tree -o tree, then linearity would enforce that there was one (as opposed to zero or two or three...) occurrences of the hole. We'll come back to that later...

### Terms

Recall from the previous post that Levy# starts off syntactically precluded from forming values of this type list -o list; we'll have to fix that by adding new syntax. On the level of syntax, a difference list will be represented using a Twelf-like notation where "λ x:τ.v" is written as [x:tau] v.

Example The difference list containing first 9 and then 4 will be represented as [hole: list] Cons 9 (Cons 4 hole); let's call that difference list dxs. Let dys be the some other piece of value code with type list -o list, and let xs be the example difference list Cons 3 (Cons 7 Nil) from the previous section.

Then dxs xs appends the difference list and regular list by filling in the hole with xs; the result is Cons 9 (Cons 4 (Cons 3 (Cons 7 Nil))). Similarly, [hole: list] dxs (dys hole) is a piece of value code that forms a new difference list by appending the two smaller difference lists.

In terms of the expressive power of these difference lists, that's pretty much the story: the file linear.levy has some more examples, which can be run in the "holey-blog-1" tagged point in the repository.

A Linear Logic Aside. This, perhaps, clears up the reason I used linear implication to define constructors like Cons. Say instead of Cons we had ConsBad, a Linear LF constant of type int -> list -> list. That type is equivalent in Linear LF to a term of type !int -o !list -o list. The exclamation points ("bangs") prevent the occurrence of a linear variable in either argument to ConsBad, so [hole: list] ConsBad 9 (ConsBad 4 hole) would NOT be a (piece of value code)/(Linear LF term) of type list -o list.

### Memory representation

So far, I've explained how difference lists can be represented as linear functions, but what I originally promised was difference lists with O(1) append and turn-into-a-regular-list operations. To do this, I need to explain how linear functions can be represented in memory. A linear function is represented as another kind of box in memory, one with two pointers. The first pointer acts like a regular value - it points to the beginning of a data structure. The second pointer acts like a pointer into a data structure: it locates the "hole" - the location of the linear variable. This is the in-memory representation of our example [hole: list] Cons 9 (Cons 4 hole) from above, stored as the variable dxs.

If we want to apply this linear function to a value and thereby insert a list into the hole, we merely have to follow the second hole-pointer and write to it, and then return the first value-pointer as the result. VoilĂ , application with only local pointer manipulation! If the result of applying xs to dxs was stored as the variable ys, the state of memory would look something like this:

The catch is that the linear function box that dxs refers to no longer can be used in the same way: if we tried to write something different to the hole pointer, disaster could result: it would change the value of ys!

Therefore, the difference list, and all linear function values, are use-only-once data structures, and the current implementation invalidates used linear function values (shown by the red X in the figure above) and will raise a runtime error if they are used twice. This could be precluded with a type system that ensured that values whose type was a linear function are used, computationally, in an affine way - that is, used at most once. Work on affine type systems is interesting, it's important, I've written about it before, but I see it as an orthogonal issue. That's all I really have to say about that.

And that's how we implement linear2 representational functions with constant time composition and application, which corresponds to implementing different lists with O(1) append and turn-into-a-regular-list operations.

(Disclaimer: this section is a bit of an idealization of what's actually going on in the OCaml implementation, but it's honest enough in the sense that everything's really implemented by local pointer manipulation, and represents what I think you'd expect an complied implementation of this code to do.)

### Expressiveness and efficiency

One cost of our representation strategy is that we can only use values whose type is a linear function in an affine way. Another cost is, because the implementation really thinks in terms of "the hole," we can't have two-hole structures like tree -o tree -o tree. (Edit: Jason, in the comments, points out that I may be wrong about more-than-one-hole-structures being problematic.) But, if we were interested primarily in expressiveness and were willing to sacrifice constant-time composition and application guarantees, then it would absolutely be possible to have reusable difference lists and linear types representing n-holed contexts for any n.

## Case Study: The Dutch National Flag Problem

William Lovas proposed this example, and I think it's great: the Dutch National Flag problem was proposed by Edsger Dijkstra as a generalization of what happens in a Quicksort partition step. The logic/functional programming variant of this problem is as follows: you have a list of red, white, and blue tagged numbers. You want to stably sort these to put red first, white second, and blue third: in other words, if the ML function red filters out the sublist of red-tagged numbers and so on, you want to return red list @ white list @ blue list.

That's easy: the catch is that you can't use append, because you're only allowed to iterate over the initial unpartitioned list, and you're only allowed to iterate over that list one time.

The solution in Levy# extended with difference lists is to pass along three difference lists representing the partitioned beginning of the full list and a regular list representing the un-partitioned end of the full list. At each step, you pick an item off the un-partitioned list and (constant-time) append it on to the end of the appropriate difference list. When there's nothing left in the un-partitioned list, you (constant-time) append the three difference lists while (constant-time) turning the concatenated difference lists into a normal list.
val dutch' = thunk rec loop :     (list -o list)      -> (list -o list)        -> (list -o list)          -> list            -> F list is   fun reds : (list -o list) ->   fun whites : (list -o list) ->   fun blues : (list -o list) ->   fun xs : list ->     match xs with      | Nil -> return (reds (whites (blues Nil)))      | Cons x xs ->          (match x with             | Red i ->                 force loop                   ([hole: list] reds (Cons x hole))                  whites                  blues                  xs            | White i ->                  force loop                   reds                    ([hole: list] whites (Cons x hole))                  blues                  xs            | Blue i ->                 force loop                  reds                  whites                  ([hole: list] blues (Cons x hole))                  xs) ;;
This solution is in the repository in the file dutch.levy.

## Conclusion

I think that the incorporation of linear representational functions as difference lists is beautiful. I've written plenty of Standard ML where I've done tail-recursive manipulations on lists and so either 1) had to keep track of whether I'd called List.rev the right number of times or 2) worried whether all these append operations would end up hurting efficiency. This solution gives you the append functions that you need in a way that makes it easy to think about both the order of items and the runtime cost. And it's all enabled by some very pretty types!

The real straw that broke my back and led to this line of thinking, by the way, was an issue of expressiveness, not efficiency: I was trying to reason about such code that did tail-recursive operations on lists in Agda. That's another story, but it's one that's connected to my thesis in an important way, so I will try to get around to writing it.

In the next installment of the Holey Data series, I'll show how we can pattern match against difference lists in Levy#, which turns linear representational functions from (almost) difference lists into something significantly awesomer than difference lists.

(P.S. Thanks Paul a.k.a. Goob for sneaking me into the English Department's cluster to make the illustrations for this blog post.)

1 Originally I wanted to base it off of a Noam/Dan/Bob style language based on focusing in polarized logic, but it turns out that focused logic isn't a very natural language to actually program in. I owe a debt to Andrej and Matija for going through the legwork of making CBPV into an implemented toy language that it was possible to play around with.
2 Actually, this representation strategy might work fine not just for linear functions but for regular-old LF substitution functions with zero, one, two... arguments. When I tried to work this out with William Lovas, we tentatively concluded that you might need some sort of union-find structure to make sure that all the holes were effectively pointers to "the same hole." In this note, I only motivated linear functions in terms of adequately representing difference lists; the real beauty of using difference lists starts coming into play with the next two posts in this series.
[1] Robert J. Simmons, "Difference Lists in Functional Programming", CSTheory StackExchange.
[2] Wikibook on Prolog Difference Lists.
[3] Dale Miller's implementation of (classic, logic-programmey) difference lists: "Difference Lists provide access to the tail of a list".
[4] Frank Pfenning's course notes: "Difference Lists".
[5] Sandro Etalle and Jon Mountjoy. "The Lazy Functional Side of Logic Programming".
[6] Don Stewart's Data.DList library.
[7] Yves Bekkers and Paul Tarau. "The Monads of Difference Lists".

1. Interesting stuff, rob!

Why exactly doesn't your representation strategy extend curryingly to tree -o (tree -o tree)? That type demands a "the" tree-shaped hole in tree -o tree, which in turn are the trees with a "the" hole. Uh, assuming my abuse of determiners makes any sense.

Secondly: your footnote 1 is missing its ending, wherein you probably meant to thank Andrej for Levy.

2. Jason, I think you're right about using tree -o (tree -o tree) for two-hole structures. It may just be the parser and typechecker that would need to be modified, in fact! It was a significant simplifying assumption for the parsing and typechecking parts of the implementation (as well as for the code I'm writing for steps 2 and 3) to assume that there would be at most one lambda, however, so I think I'll stick with that restriction even with the knowledge that it may be overkill.

(Fixed the footnote.)

3. That Dutch National Flag solution is fantastic.

Did you ever explain what is "almost" about these difference lists? Also, your reference 1 link is broken...

4. Chris: as you note, I left the 'why "almost"?' question as an intentional cliffhanger, it's the first thing I address in the next post.

5. Pretty cool stuff Rob! I'm not sure I agree with your use of the linear arrow for representations, though: LF's arrow isn't linear, and the linear arrow may compute in some settings. I'm also wondering about the runtime cost of your nice solution to the Dutch flag problem, trying to compare it with the linear time solution using (imperative) doubly-linked lists. Are your lists and holey lists implemented in a persistent way, or do you have to copy things?

6. Hi mrpingouin (D, I presume?)

I think you're correctly pointing out that I'm making two distinctions where I could be making four: 1) A persistent, computational arrow (I have this), 2) A linear, computational arrow (I don't have this), 3) A persistent, representational (LF) arrow (I don't have this), and 4) A linear, representational arrow (I have this).

I see this as a variant of the point Jason was making. The language in terms of expressiveness is potentially much, much richer than I seem to be giving it credit for; that expressiveness has been explored by the Noam/Bob/Dan papers, the Delphin people, and the Beluga people. I'm trying to carve out only a very tiny fragment of this language that I can give an efficient interpreter for. In the process, I'm forbidding the persistent representational arrow and restricting the use of the linear representational arrow. I'm also not using the linear* computational arrow in one place where I should be; the cost of this is that I fail to preclude certain runtime errors.

In fact, if you look at the way the Implementation section works, the way the interpreter solves the Dutch National Flag problem is more-or-less exactly what you'd program in C if I told you to use singly linked lists with a pointer to the end of the list (doubly linked lists aren't actually necessary to solve DNF). It should absolutely be O(n) (modulo GC and the fact that my tags are goddamn strings, so on and so forth.)

*You really want your computational arrows to be affine, not linear, or stuff gets really annoying; I've talked about this on the previous post Linear Types, Now!

7. Yes, I'm D ;)

About the linked lists with a pointer to the end, I'm still missing something: appending is only O(1) in an imperative language, but Levy# has persistent data, so you must be copying stuff all the time, making appending O(n), in which case the Dutch solution become O(n^2), right?

8. Yes, you are missing something, hopefully it's something contained in the section "Implementation," though - Levy# is a declarative interface to an imperative interpreter. In particular, interpeter.ml is assigning to ref cells all over the place (the particular revision of code I linked to has a bug, but suffices to show the point).

An analogy is Bob's quip that Haskell is the most sophisticated imperative programming language. The true nugget in that is how, in order to understand how some programs run with reasonable speed, you have to understand that there's an imperative graph reduction algorithm going on (call-by-need) instead of the "simple" semantics of Haskell, which is call-by-name.