Showing posts with label difference lists. Show all posts
Showing posts with label difference lists. Show all posts

Wednesday, August 24, 2011

Holey Data, Postscript: Hole Abstraction

I love the PL community on Twitter. Yesterday, David Van Horn presented this observation:

Most papers in computer science describe how their author learned what someone else already knew. -- Peter LandinTue Aug 23 13:04:05 via Buffer Favorite Retweet Reply


I'm gonna be perfectly honest: I was specifically thinking of the holey data series when I responded.

@lambda_calculus Presumably the implication is that this is bad for the advancement of science. OTOH, it's a pretty viable model for a blog.Tue Aug 23 13:33:53 via web Favorite Retweet Reply


'Cause really, I was kind of surprised that I hadn't found any previous presentations of the idea (and was therefore encouraged when Conor indicated that he at least sort of had thought of it too). Chung-chieh Shan's response was great:

@simrob @lambda_calculus I'm not sure that Landin intended to imply that it's bad!Tue Aug 23 22:29:48 via web Favorite Retweet Reply


I guess I hope he's right (or at least that I am), because in the comments to Part 3, Aaron Turon correctly observes that I missed a whopper: Minamide's 1998 POPL Paper "A Functional Representation of Data Structures with a Hole." Horray for blogs, I guess: if this has been a paper review, I would have been quite embarrassed, but as a blog comment I was delighted to find out about this related work.

A Functional Representation of Data Structures with a Hole

Minamide's paper effectively covers the same ground I covered in Part 1 of the Holey Data series: his linear representational lambdas are called hole abstractions. It's a very well written paper from the heady days when you could talk about the proposed Standard ML Basis Library and it was still common for authors to cite Wright and Felleisen when explaining that they were proving language safety by (what amounts to) proving progress and preservation lemmas.

My favorite part of reading the paper was that it simultaneously confirmed two suspicions that I picked up after a recent discussion with Dan Licata:
  1. Levy#/call-by-push-value was an unnecessarily restrictive calculus for programming with linear representational functions - ML would work just fine.
  2. By using call-by-push-value, I'd avoided certain red herrings that would have plagued the development otherwise - somewhere on almost every page I thought "yep, that explanation is less clear than it could be because they don't know about difference between value (positive) types and computation (negative) types yet."
One neat thing this paper describes that I hadn't thought about is automatically transforming non-tail-recursive programs that are naturally structurally inductive to tail-recursive programs based on hole abstractions/difference lists/linear representational functions. It seems like this optimization in particular is where many of the paper's claimed performance gains are found. It also seems like zippers-as-derivatives are pretty clearly lurking around the discussion in Section 4, which is neat.

Overall, this paper made me quite happy - it suggests there is something canonical about this idea, and proves that the idea leads to concrete performance gains. It also made me sad, of course, because it seems like the ideas therein didn't really catch on last time around. But that's part of why I didn't stop with Part 1 in the series: it's not clear, even to me, that hole abstractions/difference lists/linear representational functions are worth adding to a functional programming language if all they can do is be applied and composed. However, with the additional expressiveness that can be found in pattern matching against hole abstractions (hell, that's a way better name than "linear representational functions," I'm just going to call them hole abstractions from now on), I think there's a significantly stronger case to be made.

I've thrown a transliteration of Minamide's examples up on GitHub. The binary tree insertion example, in particular, could come from Huet's paper: it's a beautiful example of how even the "Part 1" language can implement zipper algorithms so long as you never need to move up in the tree. As for the hfun_addone function, I don't really understand it at all, I just transliterated it. In particular, it seems to not be entirely tail recursive (in particular, it seems to regular-recurse along the leftmost spine of the binary tree - if I'm not mistaken about this, I accuse line 131 of being the culprit.)

6.3 Logic Variable

(Note: I've tried to make sure this section doesn't come across as mean-spirited in any way, but I want to emphasize: this is not intended to be any sort of criticism of Yasuhiko Minamide. His paper was terrific! Go read it!)

My other favorite part of reading the paper was Section 6.3, which discusses difference lists. I plan to actually email Minamide and ask him about Section 6.3. Here's my reconstruction of events: Fictional Minamide has a great idea, implements it into a prototype ML compiler, tests it. His paper has strong theoretical and practical results, and gets accepted to POPL 1998. However, one of the reviewers says "oh, this is almost exactly difference lists in logic programming, you need to explain the connection." Fictional Minamide is not a logic programming person, has no access to a logic programming person, doesn't particularly have any reason to care about logic programming, but he does feel the need to address the concerns of a reviewer that he can't actually communicate with. Fictional Minamide manages to find, with great difficulty in the pre-Google-Scholar world, some poorly expressed explanation for what difference lists are in some random papers that are mostly about something else.1 After a couple of hours, Fictional Minamide gives up, exclaiming "okay, this makes absolutely no sense whatsoever," and writes something kinda mumbly about graph reduction that seems vaguely plausible to satisfy the reviewer's demand.

The result is a section that mostly misses the point about the connection between hole abstraction and difference lists, both of which are declarative abstractions that allow a programmer to think about working modulo an uninitialized pointer in memory (though the way Minamide and I do it, the types help you way more). This is not intended as any criticism of either Real Minamide or Fictional Minamide. Indeed, it's mostly a criticism of the conference review processes: I'm pretty sure you could find similar "Section 6.3"s in my papers as well!2 I do hope, however, that my exposition in Part 1, which was in fact motivated by difference lists and reached more-or-less the exact same setup that Minamide came up with, clarifies the record on how these two ideas are connected.

[Update Aug 25] I heard back from Real Minamide - while it was a long time ago, he did recall one of the reviewers mentioning logic variables, leading to the inclusion of that section. I win! Well, kind of. There was probably no "this makes no sense" exclamation; Minimade says that at the time his understanding at the time was in line with my comment about working modulo an uninitialized pointer. The comments about graph reduction, which were why I thought the section misses the point, were more of a side comment.

Minamide also remembered another wonderfully tantalizing tidbit: he recalls that, at POPL 1998, Phil Wadler said he'd seen a similar idea even earlier. Perhaps hole abstraction is just destined to be continuously reinvented until it finally gets included in the C++2040 standard.3


1 The work he cites is on the adding the logic programming notions of unbound variables to functional programming languages, which (while I haven't looked at them) certainly don't look they would give a good introduction to the simple-but-goofy logic programming intuitions behind difference lists.

That said, I basically have never seen a good, clear, self-contained description of what a difference list is - I consider Frank's logic programming notes to be pretty good, but I recognize that I'm not a fair judge because I was also present at the associated lecture.

2 Grep for "∇", or "nominal", especially in anything prior to my thesis proposal, if you want a head start.

3 I wonder what brackets they'll use, since as of the current standard they seem to have run out.

Monday, August 15, 2011

Holey Data, Part 2/3: Case Analysis on Linear Functions

The previous installment was titled "(Almost) Difference Lists" because there was one operation that we could (technically) do on a Prolog difference list that we can't do on the difference lists described in the previous section. If a Prolog difference list is known to be nonempty, it's possible to match against the front of the list. Noone ever does this that I can tell, because if the Prolog difference list is empty, this will mess up all the invariants of the difference list. Still, however, it's there.

We will modify our language to allow pattern matching on linear functions, which is like pattern matching on a difference list but better, because we can safely handle emptiness. The immediate application of this is that we have a list-like structure that allows constant time affix-to-the-end and remove-from-the-beginning operations: a queue! Due to the similarity with difference lists, I'll call this curious new beast a difference queue. This is all rather straightforward, except for the dicussion of coverage checking, which involves a well-understood analysis called subordination. But we'll cross that bridge when we come to it.

A pattern matching aside. Pattern matching against functions? It's certainly the case that in ML we can't pattern match against functions, but as we've already discussed towards the end of the intro to Levy#, the linear function space is not a computational function space like the one in ML, it's a representational function space like the one in LF/Twelf, a distinction that comes from Dan Licata, Bob Harper, and Noam Zeilberger [1, 2]. And we pattern match against representational functions all the time in LF/Twelf. Taking this kind of pattern matching from the logic programming world of Twelf into the functional programming world is famously tricky (leading to proposals like Beluga, Delphin, (sort of) Abella, and the aforementioned efforts of Licata et al.), but the trickiness is always from attempts to open up a (representational) function, do something computational on the inside, and then put it back together. We're not going to need to do anything like that, luckily for us.

Difference queues

We're going to implement difference queues as values of type q -o q, where q is an interesting type: because definitions are inductive, the type q actually has no inhabitants.
   data QCons: int -o q -o q ;;
An alternative would be to implement difference queues using the difference lists list -o list from the previous installment, which would work fine too.

We'll also have an option type, since de-queueing might return nothing if the queue is empty, as well as a "valof" equivalent operation to force the queue from something that may or may not be a queue. This getq option will raise a non-exhaustive match warning during coverage checking, since it can obviously raise a runtime error.
   data None: option
| Some: int -o (q -o q) -o option
;;

val getq = thunk fun opt: option ->
let (Some _ q) be opt in return q
;;
The operations to make a new queue and to push an item onto the end of the queue use the functionality that we've already presented:
   val new = thunk return [x:q] x ;;

val push = thunk
fun i: int ->
fun queue: (q -o q) ->
return [x:q] queue (QCons i x)
;;
The new functionality comes from the pop function, which matches against the front of the list. An empty queue is represented by the identity function.
   val pop = thunk fun queue: (q -o q) ->
match queue with
| [x:q] x -> return None
| [x:q] QCons i (queue' x) -> return Some i queue'
;;
Lets take a closer look at the second pattern, [x:q] QCons i (queue' x). The QCons constructor has two arguments, and because the linear variable x occurs exactly once, it has to appear in one of the two arguments. This pattern is intended to match the case where the linear variable x appears in the second argument of the constructor (read: inside of the the q part, not inside of the int part), and the pattern binds a linear function queue' that has type q -o q.

You can see how these difference queues are used in linearqueue.levy.

Another pattern matching aside. The above-mentioned patterns (and, in fact, all accepted patterns in this current extension of Levy#) actually come from the set of Linear LF terms that are in what is known as the pattern fragment (appropriately enough). The pattern fragment was first identified by Dale Miller as a way of carving out a set of unification problems on higher-order terms that could 1) always be given unitary and decidable solutions and 2) capture many of the actual unification problems that might arise in λProlog [3]. Anders Schack-Nielsen and Carsten Schürmann later generalized this to Linear LF [4], which as I've described is the language that we're essentially using to describe our data.

Coverage checking with subordination

In the previous section we saw that the two patterns [x:q] x and [x:q] QCons i (queue' x) were used to match against a value of type q -o q, and the coverage checker in Levy# accepted those two patterns as providing a complete case analysis of values of this type. But the constructor QCons has two arguments; why was the coverage checker satisfied with a case analysis that only considered the linear variable occurring in the second argument?

To understand this, consider the pattern [x:q] x and [x:q] QCons (di x) queue', where the linear variable does occur in the first argument. This pattern binds the variable di, a linear function value of type q -o int. But the only inhabitants of type int are constants, and the q must go somewhere; where can it go? It can't go anywhere! This effectively means that there are no closed values of type q -o int, so there's no need to consider what happens if the q hole appears inside of the first argument to QCons.

Because of these considerations, Levy# has to calculate what is called the subordination relation for the declared datatypes. Subordination is an analysis developed for Twelf that figures out what types of terms can appear as subterms of other types of terms. I added a new keyword to Levy# for reporting this subordination information:
   Levy> $subord ;;
Subordination for current datatypes:
int <| q
q <| q
int <| option
So int is subordinate to both q and option, and q is subordinate only to itself. Subordination is intended to be a conservative analysis, so this means that there might be values of type int -o q and int -o option and that there might be non-identity values of type q -o q, but that there are no values of type q -o option and the only value of type option -o option is [x: option] x. Levy# uses the no-overlapping-holes restriction to make subordination analysis more precise; without this restriction, a reasonable subordination analysis would likely declare q subordinate to option.1

Some more examples of subordination interacting with coverage checking can be seen in linearmatch.levy.

Subordination and identity in case analysis

We use subordination data for one other optimization. The following function is also from linearmatch.levy; it takes a value of type int -o list and discards everything until the place where the hole in the list was.
   val run = thunk rec run: (int -o list) -> F list is
fun x: (int -o list) ->
match x with
| [hole: int] Cons hole l -> return l
| [hole: int] Cons i (dx hole) -> force run dx
;;
Because Levy# is limited to depth-1 pattern matching, a pattern match should really only say that the hole is somewhere in a subterm, not that the hole is exactly at a subterm. This would indicate that the first pattern should really be [hole: int] Cons (di hole) l, but by subordination analysis, we know that int is not subordinate to int and so therefore the only value of type int -o int is the identity function, so di = [hole:int] hole and we can beta-reduce ([hole:int] hole) hole to get just hole.

So subordination is a very helpful analysis for us; it allows us to avoid writing some patterns altogether (patterns that bind variables with types that aren't inhabited) and it lets us simplify other patterns by noticing that for certain types "the hole appears somewhere in this subterm" is exactly the same statement as "this subterm is exactly the hole."

Implementation

In order to efficiently pattern match against the beginning of a list, we need to be able to rapidly tell which sub-part of a data structure the hole can be found in. This wasn't a problem for difference lists and difference queues, since subordination analysis is enough to tell us where the hole will be if it exists, but consider trees defined as follows:
   data Leaf: tree
| Node: tree -o tree -o tree
;;
If we match against a value of type tree -o tree, we need to deal with the possibility that it is the identity function, the possibility that the hole is in the left subtree, and the possibility that the hole is in the right subtree. This means that, if we wish for matching to be a constant-time operation, we also need to be able to detect whether the hole is in the left or right subtree without doing a search of the whole tree.

This is achieved by adding an extra optional field to the in-memory representation of structures, a number that indicates where the hole is. Jason Reed correctly pointed out in a comment that for the language extension described in the previous installment, there was actually no real obstacle to having the runtime handle multiple overlapping holes and types like tree -o (tree -o tree). But due to the way we're modifying the data representation to do matching, the restriction to having at most one hole at a time is now critical: the runtime stores directions to the hole at every point in the structure.

The memory representation produced by the value code [hole: tree] Node (Node Leaf Leaf) (Node (Node hole Leaf) Leaf) might look something like this, where I represent the number indicating where the hole is by circling the indicated hole in red:

If the hole is filled, the extra data (the red circles) will still exist, but the part of the runtime that does operations on normal inductively defined datatypes can just ignore the presence of this extra data. (In a full implementation of these ideas, this would likely complicate garbage collection somewhat.)

Case Study: Holey Trees and Zippers

The binary trees discussed before, augmented with integers at the leaves, are the topic of this case study. A famous data structure for functional generic programming is Huet's zipper [5, 6], which describes inside-out paths through inductive types such as trees. The idea of a zipper is that it allows a programmer to place themselves inside a tree and move up, left-down, and right-down the tree using only constant-time operations based on pointer manipulation.

The zipper for trees looks like this:
   data Top: path
| Left: path -o tree -o path
| Right: tree -o path -o path
;;
In this case study, we will show how to coerce linear functions tree -o tree into the zipper data structure path and vice versa.

In order to go from a linear function tree -o tree to a zipper path, we use a function that takes two arguments, an "outside" path and an "inside" tree -o tree. As we descend into the tree-with-a-hole by pattern matching against the linear function, we tack the subtrees that aren't on the path to the hole onto the outside path, so that in every recursive call the zipper gets bigger and the linear function gets smaller.
   val zip_of_lin = thunk rec enzip: path -> (tree -o tree) -> F path is 
fun outside: path ->
fun inside: (tree -o tree) ->
match inside with
| [hole: tree] hole ->
return outside
| [hole: tree] Node (left hole) right ->
force enzip (Left outside right) left
| [hole: tree] Node left (right hole) ->
force enzip (Right left outside) right
;;
Given this function, the obvious implementation of its inverse just does the opposite, shrinking the zipper with every recursive call and tacking the removed data onto the linear function:
   val lin_of_zip = thunk rec enlin: path -> (tree -o tree) -> F (tree -o tree) is 
fun outside: path ->
fun inside: (tree -o tree) ->
match outside with
| Top ->
return inside
| Left path right ->
force enlin path ([hole: tree] Node (inside hole) right)
| Right left path ->
force enlin path ([hole: tree] Node left (inside hole))
;;
That's the obvious implementation, where we tack things on to the outside of the linear function. Linear functions have the property, of course, that you can tack things on to the inside or the outside, which gives us the opportunity to consider another way of writing the inverse that looks more traditionally like an induction on the structure of the zipper:
   val lin_of_zip' = thunk rec enlin: path -> F (tree -o tree) is
fun path: path ->
match path with
| Top -> return [hole: tree] hole
| Left path right ->
force enlin path to lin in
return [hole: tree] lin (Node hole right)
| Right left path ->
force enlin path to lin in
return [hole: tree] lin (Node left hole)
;;
These functions, and examples of their usage, can be found in linear-vs-zipper.levy.

Foreshadowing

This installment was written quickly after the first one; I imagine there will be a bigger gap before the third installment, so I'm going to go ahead and say a bit about where I'm going with this, using the case study as motivation.

I wrote three functions:
  • lin_of_zip turns zippers into linear functions by case analyzing the zipper and tacking stuff onto the "beginning" our outside of the linear function,
  • lin_of_zip' turns zippers into linear functions by inducting over the path and tacking stuff onto the "end" or inside of the linear function, and
  • zip_of_lin turns linear functions into zippers by case analyzing the "beginning" or outside of the linear function and tacking stuff on to the zipper.
What about zip_of_lin', which turns linear functions into zippers by case analyzing the "end" or inside of the linear function? It's easy enough to describe what this function would look like:
#  val zip_of_lin' = thunk rec enzip: (tree -o tree) -> F path is 
# fun lin: (tree -o tree) ->
# match lin with
# | [hole: tree] hole -> return Top
# | [hole: tree] lin' (Node hole right) ->
# force enzip lin' to path in
# return Left path right
# | [hole: tree] lin' (Node left hole) ->
# force enzip lin' to path in
# return Right left path ;;
Toto, we're not in the pattern fragment anymore, but if we turn the representation of linear functions into doubly linked lists (or a double-ended-queues implemented as linked lists, perhaps), I believe we can implement these functions without trouble. At that point, we basically don't need the zippers anymore: instead of declaring that the derivative of a type is the type of its one-hole contexts, we can make the obvious statement that the linear function from a type to itself is the type of one-hole contexts of that type, and we can program accordingly: no new boilerplate datatype declarations needed!

Conclusion

A relatively simple modification of the runtime from the previous installment, a runtime data tag telling us where the hole is, allows us to efficiently pattern match against linear representational functions. This modification makes the use of linear representational functions far more general than just a tool for efficiently implementing a logic programming idiom of difference lists. In fact, I hope the case study gives a convincing case that these holey data structures can come close to capturing many of the idioms of generic programming, though that argument won't be fully developed until the third installment, where we move beyond patterns that come from the Miller/Schack-Nielsen/Schürmann pattern fragment.

More broadly, we have given a purely logical and declarative type system that can implement algorithms that would generally be characterized as imperative algorithms, not functional algorithms. Is it fair to call the queue representation a "functional data structure"? It is quite literally a data structure that is a function, after all! If it is (and I'm not sure it is), this would seem to challenge at least my personal understanding of what "functional data structures" and functional algorithms are in the first place.



1 This is true even though there are, in fact, no closed values of type q -o option even if we don't have the no-overlapping-holes restriction (proof left as an exercise for the reader).
[1] Daniel R. Licata, Noam Zeilberger, and Robert Harper, "Focusing on Binding and Computation," LICS 2008.
[2] Daniel R. Licata and Robert Harper, "A Universe of Binding and Computation," ICFP 2009.
[3] Dale Miller, "A Logic Programming Language with Lambda-Abstraction, Function Variables, and Simple Unification," JLC 1(4), 1991.
[4] Anders Schack-Nielsen and Carsten Schürmann, "Pattern Unification for the Lambda Calculus with Linear and Affine Types," LFMTP 2010.
[5] Gúrard Huet, "Function Pearl: The Zipper," JFP 1997.
[6] Wikipedia: Zipper (data structure).

Friday, August 12, 2011

Holey Data, Part 1/3: (Almost) Difference Lists

This series has been brewing in my head since a year ago, starting at the 2010 Workshop on Generic Programming in Baltimore. It was at least a quarter-baked idea by the time I asked, on the CSTheory Stackexchange, a question that amounted to "are there functional programming difference lists that act like logic programming difference lists"? [1]

Difference lists

Difference lists, which are deep lore of the logic programming community, are essentially data structures that allow you to append to the beginning and the end of the list as a constant time operation; they can also be turned back into "regular" lists as a constant time operation. You can see [2,3,4] for details, but I'd almost encourage you not to: logic programming difference lists are a bit crazy.

Let me give you two ways of thinking about difference lists. If you're a lazy person (in the Haskell sense), you may say that difference lists really have something to do with laziness in functional programming languages. William Lovas showed me some crazy Haskell code yesterday that does something of this flavor, and there's a paper that espouses this viewpoint [5]. I'll stick with a different viewpoint: difference lists are functions from lists to lists. This is how difference lists are implemented in the Haskell Data.DList library [6], and you can find people that back-ported the idea that difference lists are "really" functions to higher-order logic programming languages [7].

The problem with the idea of difference lists being functions is that you lose one of the fundamental properties that you started with: while it depends on how you implement things, difference lists implemented as functions will usually have a O(1) append operation but a O(n) "turn me into a regular list" operation. For a discussions of why, see the answers to my question on CSTheory Stackexchange.

I've believed for some time that this unfortunate O(n)ness could be overcome, but I wanted a working proof-of concept implementation before I talked about it. It was during my recent visit to France that I realized that I really needed to base the proof-of-concept off of a language that was appropriately persnickety about the difference between what I called "value code" and what I called "computation code" in the previous post. And then I found Levy, and embraced and extended it into Levy#.1 So at this point you should read the previous post.

Representing datatypes

In the last post I talked about declaring and using datatypes in Levy#, using the example of lists of integers:
   data Nil: list
| Cons: int -o list -o list
It's necessary to say a few words about how these things are implemented. Every value in Levy# is represented in the OCaml interpreter as a heapdata ref - that is, a mutable pointer into a "box" in memory. Some boxes hold integers, other boxes hold the code and environments that represent suspended (thunked) computations. Declared constants from datatype declarations are another kind of box which holds both a tag (like Nil or Cons) and an appropriate number of values (that is, pointers) for the given datatype.

As an example, here's the way the list Cons 3 (Cons 7 Nil), stored in the variable xs, could be laid out in memory:


Representing difference lists

Types

On the level of types, a difference list will be represented as a member of the value type (the Linear LF type) list -o list. This adequately represents what a difference list is better than the list -> list type of Haskell or Lambda Prolog/Twelf. In Haskell, lots of functions have type list -> list, including the list reversal function. Lambda Prolog/Twelf does better; the LF type list -> list really has to be substitution function - it can only take a list and plug it into designated places in another list. But λx. Nil is a perfectly good LF value of type list -> list. In other words, it's not necessarily the case that applying something to a function is equivalent to tacking that thing on to the end of the difference list - the applied argument can simply be ignored.

Linear functions must use their argument exactly once, so linear functions list -o list accurately represent the idea of difference lists, a list with exactly one missing list somewhere in it. For lists this is kind of a dumb way to put it: the missing list has to go at the end of a series of Conses or nowhere at all, so the linearity forces there to be one (as opposed to zero) occurrences of the hole. If we were talking about terms of type tree -o tree, then linearity would enforce that there was one (as opposed to zero or two or three...) occurrences of the hole. We'll come back to that later...

Terms

Recall from the previous post that Levy# starts off syntactically precluded from forming values of this type list -o list; we'll have to fix that by adding new syntax. On the level of syntax, a difference list will be represented using a Twelf-like notation where "λ x:τ.v" is written as [x:tau] v.

Example The difference list containing first 9 and then 4 will be represented as [hole: list] Cons 9 (Cons 4 hole); let's call that difference list dxs. Let dys be the some other piece of value code with type list -o list, and let xs be the example difference list Cons 3 (Cons 7 Nil) from the previous section.

Then dxs xs appends the difference list and regular list by filling in the hole with xs; the result is Cons 9 (Cons 4 (Cons 3 (Cons 7 Nil))). Similarly, [hole: list] dxs (dys hole) is a piece of value code that forms a new difference list by appending the two smaller difference lists.

In terms of the expressive power of these difference lists, that's pretty much the story: the file linear.levy has some more examples, which can be run in the "holey-blog-1" tagged point in the repository.

A Linear Logic Aside. This, perhaps, clears up the reason I used linear implication to define constructors like Cons. Say instead of Cons we had ConsBad, a Linear LF constant of type int -> list -> list. That type is equivalent in Linear LF to a term of type !int -o !list -o list. The exclamation points ("bangs") prevent the occurrence of a linear variable in either argument to ConsBad, so [hole: list] ConsBad 9 (ConsBad 4 hole) would NOT be a (piece of value code)/(Linear LF term) of type list -o list.

Memory representation

So far, I've explained how difference lists can be represented as linear functions, but what I originally promised was difference lists with O(1) append and turn-into-a-regular-list operations. To do this, I need to explain how linear functions can be represented in memory. A linear function is represented as another kind of box in memory, one with two pointers. The first pointer acts like a regular value - it points to the beginning of a data structure. The second pointer acts like a pointer into a data structure: it locates the "hole" - the location of the linear variable. This is the in-memory representation of our example [hole: list] Cons 9 (Cons 4 hole) from above, stored as the variable dxs.

If we want to apply this linear function to a value and thereby insert a list into the hole, we merely have to follow the second hole-pointer and write to it, and then return the first value-pointer as the result. Voilà, application with only local pointer manipulation! If the result of applying xs to dxs was stored as the variable ys, the state of memory would look something like this:

The catch is that the linear function box that dxs refers to no longer can be used in the same way: if we tried to write something different to the hole pointer, disaster could result: it would change the value of ys!

Therefore, the difference list, and all linear function values, are use-only-once data structures, and the current implementation invalidates used linear function values (shown by the red X in the figure above) and will raise a runtime error if they are used twice. This could be precluded with a type system that ensured that values whose type was a linear function are used, computationally, in an affine way - that is, used at most once. Work on affine type systems is interesting, it's important, I've written about it before, but I see it as an orthogonal issue. That's all I really have to say about that.

And that's how we implement linear2 representational functions with constant time composition and application, which corresponds to implementing different lists with O(1) append and turn-into-a-regular-list operations.

(Disclaimer: this section is a bit of an idealization of what's actually going on in the OCaml implementation, but it's honest enough in the sense that everything's really implemented by local pointer manipulation, and represents what I think you'd expect an complied implementation of this code to do.)

Expressiveness and efficiency

One cost of our representation strategy is that we can only use values whose type is a linear function in an affine way. Another cost is, because the implementation really thinks in terms of "the hole," we can't have two-hole structures like tree -o tree -o tree. (Edit: Jason, in the comments, points out that I may be wrong about more-than-one-hole-structures being problematic.) But, if we were interested primarily in expressiveness and were willing to sacrifice constant-time composition and application guarantees, then it would absolutely be possible to have reusable difference lists and linear types representing n-holed contexts for any n.

Case Study: The Dutch National Flag Problem

William Lovas proposed this example, and I think it's great: the Dutch National Flag problem was proposed by Edsger Dijkstra as a generalization of what happens in a Quicksort partition step. The logic/functional programming variant of this problem is as follows: you have a list of red, white, and blue tagged numbers. You want to stably sort these to put red first, white second, and blue third: in other words, if the ML function red filters out the sublist of red-tagged numbers and so on, you want to return red list @ white list @ blue list.

That's easy: the catch is that you can't use append, because you're only allowed to iterate over the initial unpartitioned list, and you're only allowed to iterate over that list one time.

The solution in Levy# extended with difference lists is to pass along three difference lists representing the partitioned beginning of the full list and a regular list representing the un-partitioned end of the full list. At each step, you pick an item off the un-partitioned list and (constant-time) append it on to the end of the appropriate difference list. When there's nothing left in the un-partitioned list, you (constant-time) append the three difference lists while (constant-time) turning the concatenated difference lists into a normal list.
val dutch' = thunk rec loop : 
(list -o list)
-> (list -o list)
-> (list -o list)
-> list
-> F list is
fun reds : (list -o list) ->
fun whites : (list -o list) ->
fun blues : (list -o list) ->
fun xs : list ->
match xs with
| Nil -> return (reds (whites (blues Nil)))
| Cons x xs ->
(match x with
| Red i ->
force loop
([hole: list] reds (Cons x hole))
whites
blues
xs
| White i ->
force loop
reds
([hole: list] whites (Cons x hole))
blues
xs
| Blue i ->
force loop
reds
whites
([hole: list] blues (Cons x hole))
xs)
;;
This solution is in the repository in the file dutch.levy.

Conclusion

I think that the incorporation of linear representational functions as difference lists is beautiful. I've written plenty of Standard ML where I've done tail-recursive manipulations on lists and so either 1) had to keep track of whether I'd called List.rev the right number of times or 2) worried whether all these append operations would end up hurting efficiency. This solution gives you the append functions that you need in a way that makes it easy to think about both the order of items and the runtime cost. And it's all enabled by some very pretty types!

The real straw that broke my back and led to this line of thinking, by the way, was an issue of expressiveness, not efficiency: I was trying to reason about such code that did tail-recursive operations on lists in Agda. That's another story, but it's one that's connected to my thesis in an important way, so I will try to get around to writing it.

In the next installment of the Holey Data series, I'll show how we can pattern match against difference lists in Levy#, which turns linear representational functions from (almost) difference lists into something significantly awesomer than difference lists.

(P.S. Thanks Paul a.k.a. Goob for sneaking me into the English Department's cluster to make the illustrations for this blog post.)



1 Originally I wanted to base it off of a Noam/Dan/Bob style language based on focusing in polarized logic, but it turns out that focused logic isn't a very natural language to actually program in. I owe a debt to Andrej and Matija for going through the legwork of making CBPV into an implemented toy language that it was possible to play around with.
2 Actually, this representation strategy might work fine not just for linear functions but for regular-old LF substitution functions with zero, one, two... arguments. When I tried to work this out with William Lovas, we tentatively concluded that you might need some sort of union-find structure to make sure that all the holes were effectively pointers to "the same hole." In this note, I only motivated linear functions in terms of adequately representing difference lists; the real beauty of using difference lists starts coming into play with the next two posts in this series.
[1] Robert J. Simmons, "Difference Lists in Functional Programming", CSTheory StackExchange.
[2] Wikibook on Prolog Difference Lists.
[3] Dale Miller's implementation of (classic, logic-programmey) difference lists: "Difference Lists provide access to the tail of a list".
[4] Frank Pfenning's course notes: "Difference Lists".
[5] Sandro Etalle and Jon Mountjoy. "The Lazy Functional Side of Logic Programming".
[6] Don Stewart's Data.DList library.
[7] Yves Bekkers and Paul Tarau. "The Monads of Difference Lists".