Tuesday, February 8, 2011

Request for Typestate, Part 1

Part 1 of a series. If you're interested in Imaginary Future ML, a number of proposals exist: see Part 2.

Recently I have been trying to understand typestate, since a lot of my friends, hallmates, and fellow distance runners have written implementations, papers, Ph.D. theses, and SIGBOVIK presentations on the topic. A conversation with Ron Garcia helped me get the first idea of what is going on and how it relates to linear type systems, a topic I do feel like I understand.

Some background, as I understand it. The word "typestate" was introduced by Strom and Yemini way back a long time ago in 1984 in a paper entitled "Typestate: A Programming Language Concept for Enhancing Software Reliability" (Jonathan Aldritch has posted the PDF here). It was an attempt to capture the fact that function calls often have a "do this, then do that, then don't do that anymore" kind of flavor. The most obvious and simple example is one that most every programmer has encountered, thanks to C's ubiquity:



In other words, we create a pointer by mallocing it, and then we can use that pointer as long as we want to, but if we free it we can't use that pointer anymore. (Well, we can, but we're not supposed to, C sucks, whatever.) Okay, but now you're scratching your head thinking that this is basically why modern humans use garbage collection. Well, sure, but while this solves a particular problem, the problem is really a pattern of problems, and the pattern just re-asserts itself in different ways. For instance, take this interaction with SML/NJ's toplevel:
  $ sml
Standard ML of New Jersey v110.70 [built: Wed Jun 17 16:24:00 2009]
- val FILE = TextIO.openOut("foo.txt");
val FILE = - : TextIO.outstream
- TextIO.output(FILE, "Hello world\n");
val it = () : unit
- TextIO.closeOut(FILE);
val it = () : unit
- TextIO.output(FILE, "I've got a bad feeling about this\n");

uncaught exception Io [Io: output failed on "foo.txt", closed stream]
raised at: Basis/Implementation/IO/text-io-fn.sml:443.14-443.56
What's happened here is that we've run afoul of exactly the same pattern that we know from C.



Because we're in a safe language, the results are less dire; a dynamic exception gets thrown rather than memory being silently corrupted. However, the problem remains: it seems like this is the sort of thing we should be able to statically preclude (using a type system), or at least warn programmers about ahead of time (using a static analysis).

Typestate-oriented programming

Both of these approaches (type systems and static analyses) have been investigated thoroughly by my friends in Jonathan Aldrich's group, such as Kevin Bierhoff, Nels Beckman, Ciera Jaspan, Karl Naden, Roger Wolff, and Ron Garcia. The following example is a way of taking that doomed ML snippet above and making it a piece of typestate-checked code; it's mostly copied from Typestate Oriented Programming by Jonathan Aldrich, Joshua Sunshine, Darpan Saini, and Zachary Sparks and was presented at ONWARD 2009.
  state File {
public final String filename;
}

state OpenFile extends File {
private CFilePtr filePtr;
public void write(str : String) { ... }
public void close() [OpenFile>>ClosedFile]
{ ... }
}

state ClosedFile extends File {
public void open() [ClosedFile>>OpenFile]
}
The idea here is that a File is a file that might be open or closed; you can only read from an OpenFile, which is a subtype of File. That annotation on the close() and open() methods is telling us that those functions actually change the type of the object they're acting upon; the typesystem has to work accordingly to make sure you only close or read from an OpenFile, even if though, after you close it, the object has "become" exactly the thing - a ClosedFile - that you were making sure it wasn't before.

I have some particular questions about the nature of the thing called "subtyping" in typestate oriented programming that I hope to get to in a separate post, but I hope this gives a decent overview of how typestate intuitively works even if it doesn't give much intuition as to how the typing rules "actually" work. As it turns out, the type theory behind typestate-oriented programming uses ideas from linear logic, which can also be used to directly solve the motivating problem. What I really hope to explore in this series of posts is the way in which typestate - which is mostly being investigated in object oriented settings - can (or can't) be understood as a linear type discipline in a functional programming language.

Linear Types

From a programmer's perspective, linear logic is a functional programming language where all variables only get to be mentioned once. There's an additional requirement, which is that every variable must be used once, but sometimes people ignore this requirement: the system where variables are mentioned no more than once is technically an "affine" system and not a "linear" one, but it's common to call affine systems linear anyway, because the more general term is "substructural" and "substructural" is for some reason a word that people get scared by. (This turns out to be pretty problematic for me, but that's a different blog post.)

So, nothing to be scared of, can't use variables twice. Let's see what happens with our interaction with SML:
  $ smlinear
Imaginary Future Linear ML [built: IN THE FUTURE]
- val FILE = TextIO.openOut("foo.txt");
val FILE = - : TextIO.outstream
- TextIO.output(FILE, "Hello world\n");
val it = () : unit
- TextIO.closeOut(FILE);
stdIn:1.17-1.21 Error: unbound variable or constructor: FILE
Variable "FILE" was previously in scope as a linear resource,
but that resource already consumed
Oh dear, that doesn't work at all - we can't close the file or even write to it twice! This is our fault; when we ported the SML Basis Library from SML/NJ to Imaginary Future Linear ML, we needed to make some changes to the way we deal with TextIO. Here's (part of) the old signature for TextIO, with the arrows -> turned into lollipops -o to emphasize that we're in linear logic now. (Again, nothing scary, it just means that variables can only be mentioned once.)
  signature OLD_TEXT_IO = sig
type outstream
val openOut : string -o outstream
val output : outstream * string -o unit
val close : outstream -o unit
end
This signature had just the right type when we could reuse the variable FILE that we got from openOut, but now, in Imaginary Future Linear ML, we can't.

Take 1: Restoring the old behavior

It's ridiculous, however, to make the output stream go away forever whenever we write to it! So what do we do? Well, all we need is for the functions that currently return unit to give us back a new object of type outstream, and then we can use it again!
  signature NEW_TEXT_IO_1 = sig
type outstream
val openOut : string -o outstream
val output : outstream * string -o outstream
val close : outstream -o outstream
end
With this new signature for TextIO, we could have exactly the same (failed) interaction with the toplevel that we had before, the difference is that, now, we always bind a new variable when we run a function so that we have an outstream object to use in the future.
  $ smlinear
Imaginary Future Linear ML [built: IN THE FUTURE]
- val FILE1 = TextIO.openOut("foo.txt");
val FILE1 = - : TextIO.outstream
- val FILE2 = TextIO.output(FILE1, "Hello world\n");
val FILE2 = - : TextIO.outstream
- val FILE3 = TextIO.closeOut(FILE2);
val FILE3 = - : TextIO.outstream
- TextIO.output(FILE3, "I've got a bad feeling about this\n");

uncaught exception Io [Io: output failed on "foo.txt", closed stream]
raised at: Basis/Implementation/IO/text-io-fn.sml:443.14-443.56
Okay, well, why did I show an example that didn't solve the problem at all? Because it's important to realize that we haven't really lost any expressiveness in the language: even though we have a more restrictive system that prevents us from using variables twice, we can still express most of the things we could before.

(Side note: in the case of the strictly-linear variant of Gödel’s System T, it's actually been proven that we lose exactly nothing in terms of mathematical expressiveness by using the more restrictive type system, which is neat.)

Take 2: Actually fixing the problem

So, intuitively, we know what to do here: once we've closed an outstream we shouldn't be able to re-close it or output to it. The easiest solution is to make the function close have the type outstream -o unit - that way, closing an outstream consumes the variable representing the output stream and leaves us with no way to even mention it.
  signature NEW_TEXT_IO_1 = sig
type outstream
val openOut : string -o outstream
val output : outstream * string -o outstream
val close : outstream -o unit
end
This suffices to turn our original dynamic exception into a static type error like the Variable...already consumed error I made up earlier.

Take 3: Matching the typestate example

For the sake of completeness, let's think about how our example looks different from the typestate example. The big difference is that all the functions in the typestate example are void: they don't output anything, they just change the type of their argument. But I believe that this is actually something that it would be reasonable to mimic in a linear system like "Imaginary Future Linear ML." The first step is to say: well, when I consumed FILE there was no longer a FILE hanging around, so why don't I just reuse that variable name?
  $ smlinear
Imaginary Future Linear ML [built: IN THE FUTURE]
- val FILE = TextIO.openOut("foo.txt");
val FILE = - : TextIO.outstream
- val FILE = TextIO.output(FILE, "Hello world\n");
val FILE = - : TextIO.outstream
- val FILE = TextIO.closeOut(FILE);
val FILE = - : TextIO.outstream
- TextIO.output(FILE, "I've got a bad feeling about this\n");
stdIn:1.17-1.21 Error: unbound variable or constructor: FILE
Variable "FILE" was previously in scope as a linear resource,
but that resource already consumed
The second step is to consider that the typestate-oriented examples really do have "outputs" - their outputs are the new states of the this object. This would be easy syntactic sugar to whip up, just let val output : [outOpen >> outOpen] * string -o unit mean val output : outOpen * string -o outOpen and give the following signature
  signature NEW_TEXT_IO_2 = sig
openOut : string -o outOpen
output : [outOpen >> outOpen] * string -o unit
closeOut : [outOpen >> outClosed] -o unit
reopenOut : [outClosed >> outOpen] -o unit
end
With this use of syntactic sugar, we could support the following interaction with our pretend ML:
  $ smlinear
Imaginary Future Linear ML [built: IN THE FUTURE]
- val FILE = TextIO.openOut("foo.txt");
val FILE = - : TextIO.outOpen
- TextIO.output(FILE, "Hello world\n");
val it = () : unit
- TextIO.closeOut(FILE);
val it = () : unit
- TextIO.output(FILE, "I've got a bad feeling about this\n");
stdIn:6.1-6.59 Error: operator and operand don't agree [tycon mismatch]
operator domain: TextIO.openOut * string
operand: TextIO.closeOut * string
in expression:
TextIO.output (FILE,"I've got a bad feeling about this\n")
I think that's pretty nice.

Conclusion

I think the lesson of this first discussion is that, when relating linear functional programming languages to imperative programming languages, it's useful to think of object passed to functions as being "lost" to that function and then automatically "re-bound" with the same name when the function returns. Once we make this step, the notion that the re-bound variable can have a different type - which is one key insight of typestate-oriented programming - seems quite natural.

It's important to note that none of the stuff I said about linearity and state is at all novel; for further reading see Wadler's paper Linear Types Can Change The World! (warning: .ps) or David Walker's chapter "Substructural Type Systems" in Advanced Topics in Types and Programming Languages. Those two sources just about cover the background reading; if anyone has other suggestions please feel free to put them in the comments.