Monday, March 5, 2012

What does focusing tell us about language design?

This post is largely based on some conversations I've had about Polarized Logic/Focusing/Call-By-Push-Value recently, especially with Neel Krishnaswami and Wes Filardo, though it was originally prompted by Manuel Simoni some time ago.

I think that one of the key observations of focusing/CBPV is that programs are dealing with two different things - data and computation - and that we tend to get the most tripped up when we confuse the two.

  • Data is classified by data types (a.k.a. positive types). Data is defined by how it is constructed, and the way you use data is by pattern-matching against it.
  • Computation is classified by computation types (a.k.a. negative types). Computations are defined by their eliminations - that is, by how they respond to signals/messages/pokes/arguments.

There are two things I want to talk about, and they're both recursive types: call-by-push-value has positive recursive types (which have the feel of inductive types and/or algebras and/or what we're used to as datatypes in functional languages) and negative recursive types (which have the feel of recursive, lazy records and/or "codata" whatever that is and/or coalgebras and/or what William Cook calls objects). Both positive and negative recursive types are treated by Paul Blain Levy in his thesis (section 5.3.2) and in the Call-By-Push-Value book (section 4.2.2).

In particular, I want to claim that Call-By-Push-Value and focusing suggest two fundamental features that should be, but generally aren't (at least not simultaneously), present in modern programming languages:

  1. Support for structured data with rich case analysis facilities (up to and beyond what are called views)
  2. Support for recursive records and negative recursive types.

Minor addition to "core Levy"

I'll be working with an imaginary extension to Bauer's Levy language in this post.1 I've mucked around with Levy before, of course, and it would probably help to review that previous post before reading this one. I want to make one addition to Levy before we begin making big, interesting ones. The derived form that I want to add - e1 orelse e2 - is computation code with type F bool if e1 and e2 both have type F bool. This is definable as syntactic sugar, where x is selected to not be free in e2 (the dual form e1 andalso e2, which I'll also use below, is defined analogously by swapping the two branches):

   e1 to x in 
   if x then return true 
   else e2
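
This sugar is easy enough to mimic in any language with first-class thunks. Here's a minimal sketch in OCaml (my own rendering, not part of Levy), using unit -> bool functions to stand in for computations of type F bool, so that e2's effects run only when they're needed:

   (* e1 orelse e2: run e2 only if e1 returns false *)
   let orelse (e1 : unit -> bool) (e2 : unit -> bool) : bool =
     if e1 () then true else e2 ()

   (* e1 andalso e2: run e2 only if e1 returns true *)
   let andalso (e1 : unit -> bool) (e2 : unit -> bool) : bool =
     if e1 () then e2 () else false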
One other aside - do remember that, at the cost of potential confusion, I modified Bauer's original Levy implementation to make force a prefix operator that binds more tightly than application - force g z y can be written with parentheses as (((force g) z) y).

Positive types, positive recursive types

The functional languages have the construction of positive types just about right: they're made up of sums and products. I used a special LLF-ish construction for datatypes in my exploration of Levy#, but the more traditional way of introducing datatypes is to say that they are recursive types μt.T(t). The recursive types we're used to thinking about are naturally interpreted as positive types, because they are data and are inspected by pattern matching.

There's a tendency in programming language design to shove positive recursive types together with labeled sum types to get a familiar datatype mechanism.2 I will go along with this tendency and merge labeled sums and recursive types, writing them as μt.[L1: T1(t), L2: T2(t), ..., Ln: Tn(t)].

Here are datatype definitions for trees of ints and lists of ints:

   type+ tree = 
   μtree. 
   [ Leaf: int, 
     Node: tree * tree ]

   type+ list = 
   μlist. 
   [ Nil, 
     Cons: int * list ]
Note from the definition of lists that we also allow labels to take no arguments: it's possible to treat the definition of Nil as syntactic sugar for Nil: unit. The associated value is Nil, which is syntactic sugar for Nil ().

There are infinitely many trees and lists as we've defined them. In fact, it's only a convention about how we have to write programs down that makes us think of positive types as finite sums. Even though we can't write it down as a datatype declaration, it's perfectly reasonable to think of a built-in type of infinite-precision integers as being defined as follows:

   type+ int = 
   μint. 
   [ 0, 
     1,
     ~1,
     2,
     ~2,
     3,
     ... ]
The same goes for built-in strings and the like - think of built-in types as positive enumerations that were provided specially by the language, since the language didn't give us the ability to write down the full declaration containing all possible constants.

Powerful views

A point that has not, in my humble opinion, been made simply enough is that the perspective of focusing says that we should think about integrating much more powerful case analysis mechanisms into our programming languages. I learned about this point from Licata, Zeilberger, and Harper's 2008 Focusing on Binding and Computation, but their setting had enough other interesting stuff going on that it probably obscured this simple point.

Standard ML (and, to the best of my knowledge, all other functional languages circa-1997) only provides a limited form of case analysis - patterns match against the outermost structure of the datatype, to some arbitrary but fixed, finite depth:

   case x of
    | Leaf 9 => ...
    | Leaf y => ...
    | Node (Node (Leaf 5, y), z) => ...
    | Node (y, z) => ...
This limitation comes with a nice tradeoff, in that we can pretty effectively estimate how much work compiled code needs to do to handle a pattern match. However, the structure of focusing suggests that any mathematical function from a value's structure to computation is fair game. One well-studied modernization of pattern matching is views, which allows us to group computations in other ways. One use case would be allowing us to take an integer variable x and say that it is even or odd:
   case x of
    | 0 => return true
    | x is even => return false
    | ~1 => return false
    | x is odd => return true
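
OCaml doesn't have views, but the idea can be sketched by reifying the view as an ordinary datatype together with a total function into it; the names parity and view_parity here are my own inventions, and real view proposals make this plumbing invisible:

   (* The "view type" and the total function that computes the view. *)
   type parity = Even | Odd

   let view_parity (n : int) : parity =
     if n mod 2 = 0 then Even else Odd

   (* The case analysis above, with the view made explicit; the
      specific cases for 0 and -1 take precedence over the view. *)
   let classify (x : int) : bool =
     match x, view_parity x with
     | 0, _ -> true
     | -1, _ -> false
     | _, Even -> false
     | _, Odd -> true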
The focusing-aware view of pattern matching suggests that what a pattern match is actually doing is defining a case individually for each value structure - if we could write a countably-infinite-sized program, then we could expand the view-based program above to the no-longer-view-based, but infinitely long, program below:
   case x of
    | 0 => return true
    | 1 => return true
    | ~1 => return false
    | 2 => return false
    | ~2 => return false
    | 3 => return true
    | ...
So: the focusing perspective says that any effect-free (effects include nontermination!) mathematical function we can write from value structure to computations is fair game for case analysis; views are just one known way of doing this in an interesting way. In principle, however, we can consider much richer case-analysis machinery to be fair game. For instance, there is a mathematical function \(f\) from integer values int to computation code of type F bool with the variable coll free:
  • If the integer is 1, the result is return true
  • If the integer is less than 1, the result is return false
  • Otherwise, if the integer is divisible by 2 and the result of dividing the integer by 2 is i, then the result is force coll i
  • Otherwise, let j be 3 times the integer, plus one. Then the result is force coll j
Given this mathematical function, we have the core case analysis at the heart of the Collatz function. If we expand out this case analysis into an infinitely-long mapping as before, it would look like this:
   rec coll: int -> F bool is
     fn x: int =>
       case x of
        | 0 => return false
        | 1 => return true
        | ~1 => return false
        | 2 => force coll 1
        | ~2 => return false
        | 3 => force coll 10
        | ~3 => return false
        | 4 => force coll 2
        | ~4 => return false
        | 5 => force coll 16
        | ...
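
The finite program that this infinite table unrolls is, of course, the Collatz function itself. Here's a minimal sketch of that finite program in OCaml, where the recursive call plays the role of force coll; the possible nontermination of that call is exactly the effect that disqualifies this case analysis from being an honest view:

   (* The core Collatz case analysis, written with ordinary guards. *)
   let rec coll (x : int) : bool =
     if x = 1 then true                  (* the single base case *)
     else if x < 1 then false            (* zero and below *)
     else if x mod 2 = 0 then coll (x / 2)
     else coll (3 * x + 1)

For instance, coll 5 follows the orbit 5, 16, 8, 4, 2, 1 and returns true.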
The important question here is: what is the right way to think about how we write down total functions from value structures? Are views about right, or do we need something different, clearer/more/less powerful? I don't have a sense for the answers, but I'm pretty confident that's the right question.

Views are only over the structure of values

One critical caveat: I used the phrase value structure in the previous discussion repeatedly on purpose. Because computations can always contain effects, we cannot poke at the computations at all when we think about views - the value structure is just the part of the value that we can inspect without traversing a thunk. Therefore, if we have this definition...

   type+ somefn = 
   μsomefn. 
   [ BoolFn: U (bool -> F bool), 
     IntFn: U (int -> F int) ]
...then a case analysis on a value of type somefn can have at most two branches - one for the BoolFn case and one for the IntFn case. We can't pattern-match into computations, so even though it would be reasonable to say that there are many, many values of type somefn, there are only two value structures that can be inspected by case analysis.
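
The same phenomenon is already visible in ML, where functions are opaque to pattern matching. A sketch in OCaml (a transliteration that drops the U and F constructors; describe is a hypothetical example):

   type somefn =
     | BoolFn of (bool -> bool)
     | IntFn of (int -> int)

   (* Only the two constructors can be inspected; the functions inside
      are black boxes as far as pattern matching is concerned. *)
   let describe (f : somefn) : string =
     match f with
     | BoolFn _ -> "a boolean function"
     | IntFn _ -> "an integer function"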

Negative types, negative recursive types

In call-by-push-value, the computation types (a.k.a. the negative types of polarized logic/focusing) are defined by the way we use them. Computation code of computation type does stuff in response to being used - a function does stuff in response to getting an argument, and so the most well-understood negative type is implication.

Conjunction (the product or pair type) is a bit slippery, says the CBPV/focusing methodology. We've already used conjunction as a positive type (t1 * t2), but we can also think of conjunction as being a negative type. This is because it's equally valid to think of a pair being positively defined by its construction (give me two values, that's a pair) and negatively as a suspended computation that awaits you asking it for its first or second component.

CBPV/focusing makes us choose: we can have either type of conjunction, or we can have both of them, but they are different. It will be convenient if we keep the normal tuples as positive types, and associate negative types with records - that is, with named products. Records are defined by how you project from them, and just as we tie labeled sums in with positive recursive types, we'll tie labeled products in with negative recursive types.

Negative recursive types are naturally codata in the same way that positive recursive types are naturally data: the canonical example is the infinite stream.

   type- stream = 
   μstream. 
   { head: F int, 
     tail: stream }
It's natural to define particular streams with fixed points:
   val Ones = thunk rec this: stream is
   { head = return 1, 
     tail = force this }

   val CountUp = thunk rec this: int -> stream is
   fn x: int =>
   { head = return x, 
     tail = 
       force plus x 1 to y in
       force this y }
Say I wanted to get the fourth element of an infinite stream str of type U stream. The ML-ish way of projecting from records would write this as #head (#tail (#tail (#tail (force str)))), but I will pointedly not use that notation in favor of a different record notation: (force str).tail.tail.tail.head. It nests better, and works better in the case that record elements are functions. (Because functions are negative types, like records, it is very natural to have functions be the components of records.)
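
For comparison, here's a sketch of streams in OCaml, with Lazy.t standing in for U and record projection as the only way to observe a stream (the names are mine; OCaml's eagerness makes the thunking explicit):

   type stream = { head : int; tail : stream Lazy.t }

   (* Ones and CountUp as recursive record values. *)
   let rec ones : stream = { head = 1; tail = lazy ones }

   let rec count_up (x : int) : stream =
     { head = x; tail = lazy (count_up (x + 1)) }

   (* The fourth element: three tails, then a head. *)
   let fourth (s : stream) : int =
     (Lazy.force (Lazy.force (Lazy.force s.tail).tail).tail).head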

Negative recursive types as Cook's objects

Here's a bit of a case study to conclude. The entities that William Cook names "Objects" in his paper On Understanding Data Abstraction, Revisited are recognizably negative recursive types in the sense above.3 Cook's examples can be coded in Standard ML (see here), but the language gets in the way of this encoding in a number of places.4 To see what this would look like in a language with better support, I'll encode the examples from Cook's paper in my imaginary CBPV implementation. The recursive negative type looks much like a stream, and encodes the interface for these set objects.

   type- iset = 
   μiset. 
   { isEmpty: F bool, 
     contains: int -> F bool, 
     insert: int -> iset, 
     union: U iset -> iset }
One difficulty: the Insert and Union operations that Cook uses involve mutual recursion. I don't have mutual recursion, so rather than pretending to have it (I've abused my imaginary language extension enough already) I'll code it up using records. The record type insertunion I define isn't actually recursive, it's just a record - this is analogous to using ML's datatype mechanism (which, as we've discussed, makes positive recursive types) to define a simple enumeration.
   type- insertunion = 
   μinsertunion.
   { Insert: U iset * int -> iset,
     Union: U iset * U iset -> iset }

   val InsertUnion = thunk
     rec x: insertunion is
     { Insert = fn (s, n) =>
       (force s).contains n to b in
       if b then force s else 
       rec self: iset is
       { isEmpty = return false,
         contains = fn i => (force equal i n) orelse (force s).contains i,
         insert = fn i => (force x).Insert (self, i),
         union = fn s' => (force x).Union (self, s') },

       Union = fn (s1, s2) => 
       rec self: iset is
    { isEmpty = (force s1).isEmpty andalso (force s2).isEmpty,
         contains = fn i => (force s1).contains i orelse (force s2).contains i,
         insert = fn i => (force x).Insert (self, i),
         union = fn s' => (force x).Union (self, s') } }

   val Insert: U (U iset * int -> iset) = thunk ((force InsertUnion).Insert)

   val Union: U (U iset * U iset -> iset) = thunk ((force InsertUnion).Union)
We've defined union of sets and insertion into a set, but we haven't actually defined any sets yet! Once we give the implementation of an empty set, however, we can manipulate these sets with a Java-esque method invocation style:
   val Empty = thunk rec self: iset is
   { isEmpty = return true,
     contains = fn x => return false,
     insert = fn i => force Insert (self, i),
     union = fn s' => force s' }

   val JustOne = thunk (force Empty).insert 1
   (force Empty).insert(3).union(JustOne).insert(5).contains(4)
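
As footnote 4 mentions, OCaml handles this encoding reasonably well, because its recursive value definitions allow a record's closures to mention the record being defined. Here's a sketch against the iset record sketched above; unlike in SML, empty can simply be defined after (and in terms of) the mutually recursive insert and union:

   let rec insert (s : iset) (n : int) : iset =
     if s.contains n then s
     else
       (* The new set closes over itself so that its own insert and
          union can mention it. *)
       let rec self = {
         is_empty = false;
         contains = (fun i -> i = n || s.contains i);
         insert = (fun i -> insert self i);
         union = (fun s' -> union self s');
       } in
       self

   and union (s1 : iset) (s2 : iset) : iset =
     let rec self = {
       is_empty = s1.is_empty && s2.is_empty;
       contains = (fun i -> s1.contains i || s2.contains i);
       insert = (fun i -> insert self i);
       union = (fun s' -> union self s');
     } in
     self

   let empty : iset =
     let rec self = {
       is_empty = true;
       contains = (fun _ -> false);
       insert = (fun i -> insert self i);
       union = (fun s' -> s');
     } in
     self

The Java-esque chain above then reads ((((empty.insert 3).union (empty.insert 1)).insert 5).contains 4).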
As Cook noted, it's also very natural to talk about infinite sets in this style, such as the set of all even numbers and the set of all integers:
   val Evens = thunk rec self: iset is 
   { isEmpty = return false,
     contains = fn i => 
       force mod i 2 to x in
       (case x of 
         | 0 => return true
         | _ => return false),
     insert = fn i => force Insert (self, i),
     union = fn s' => force Union (self, s') }

   val Full = thunk rec self: iset is
   { isEmpty = return false,
     contains = fn i => return true,
     insert = fn i => force self,
     union = fn s' => force self }


1 If anyone is interested in helping me implement a toy language along these lines, I'm all ears. Also, the disclaimer that since I don't have an implementation there are surely bugs bugs bugs everywhere is strongly in effect.

2 As a practical matter, this makes it easier for the language to know where to put implicit roll/unroll annotations, so that programmers don't have to write these explicitly, which would be a pain.

3 Cook does give the disclaimer that this definition is "recognized as valid by experts in the field, although there certainly are other valid models." I carefully call these things "Cook objects" rather than "objects" - my claim is that negative recursive types correspond to what William Cook names objects, not that they correspond to what you call objects. Note, furthermore, that I could be wrong about what William Cook calls objects! I only have his paper to go on and I have been known to be a few cookies short of a haystack.

4 There are two issues with the encoding in SML. The most basic is that we have to encode the recursive types using SML's recursive type mechanism, which is biased towards the positive encoding of recursive types as finite sums. The other issue is that the way SML ties together (mutual) recursion with closures gets in the way - the standard way of writing a "thunk" as a function from unit is all over the place, and it wasn't possible to define EmptySet so that it was mutually recursive with the definition of Insert and Union as a result. So, I'm certainly not arguing that SML facilitates programming in (what Cook calls) an object-oriented style in a strong sense - it's unnecessarily painful to code in that style - but the minimal semantic building blocks of what Cook presented and named Object are present and accounted for. [Update:] As gasche discusses in the comments, OCaml's records, and the way it deals with laziness, make it quite a bit better at encoding the example, but it's still not ideal.

Thursday, January 12, 2012

Structural focalization updated

I've uploaded to both arXiv and my webpage a significantly revised draft of the paper Structural focalization, which I've spoken about here before. Feedback is welcome!

One of the points I make about the structural focalization technique is that, because it is all so nicely structurally inductive, it can be formalized in Twelf. As part of a separate project, I've now also repeated the whole structural focalization development in Agda! The code is available from GitHub. While a structural focalization proof has some more moving parts than a simple cut-and-identity proof, it also has one significant advantage over every Agda proof of cut admissibility that I'm aware of: it requires no extra structural metrics beyond normal structural induction! (My favorite structural metric is the totally nameless representation, but there are other ways of threading that needle, including, presumably, these "sized types" that everyone seems to talk about.)

In regular, natural-deduction substitution, you can get away without structural metrics by proving the statement that if \(\Gamma \vdash A\) and \(\Gamma, A, \Gamma' \vdash C\) then \(\Gamma, \Gamma' \vdash C\); the extra "slack" given by \(\Gamma'\) means that you operate by structural induction on the second given derivation without ever needing to apply weakening or exchange. Most cut-elimination proofs are structured in such a way that you have to apply left commutative and right commutative cuts on both of the given derivations, making this process tricky at best; I've never gotten it to work at all, but you might be able to do something like "if \(\Gamma, \Gamma' \longrightarrow A\) and \(\Gamma, A, \Gamma'' \longrightarrow C\) then \(\Gamma, \Gamma', \Gamma'' \longrightarrow C\)." If someone can make this work let me know!

A focused sequent calculus, on the other hand, has three separate phases of substitution. The first phase is principal substitution, where the type gets smaller and you can do whatever you want to the derivations, including weakening them. The second phase is rightist substitution, which acts much like natural-deduction substitution, and where you can similarly get away with adding "slack" to the second derivation. The third phase is leftist substitution, and you can get by in this phase by adding "slack" to the first derivation: the leftist cases read something like "if \(\Gamma, \Gamma' \longrightarrow A\) and \(\Gamma, A \longrightarrow C\) then \(\Gamma, \Gamma' \longrightarrow C\)."

In Structural focalization, I note that the structural focalization technique could be seen as a really freaking complicated way of proving the cut and identity for an unfocused sequent calculus. But in Agda, there's a reason you might actually want to do things the "long way" - not only do you have something better when you finish (a focalization result), but you get cut and identity without needing an annoying structural metric.

Friday, January 6, 2012

Response from ACM's Scott Delman

In the comments to my last post ("Why does the ACM act against the interests of scholars?") ACM's Director of Group Publishing, Scott Delman, left a multiple-comment response. It's a response both to the views I expressed and to the views of others that I summarized. He agreed to have his comments posted as a post here; I'll leave my own thoughts for a separate post or the comments.

Ok, here's the other side of this, which I feel compelled to throw out there after reading Rob's post and a few of the related comments.

Like most things in life, things are not always as black and white as some would lead us to believe. In this case, I think there is a basic misunderstanding of the ACM and the AAP (which is incidentally an organization that does a great deal of good work on behalf of both publishers and the scientific community).

Let's start with the ACM....which is a non-profit organization founded in 1947 by members of the computing community with the primary mission of advancing the field of computing. The Association is organized as a 501(c)3 corporation with daily operations run by a small staff of approximately 75 individuals who ultimately take their direction from a volunteer leadership of hundreds of dedicated scientists, scholars, educators, practitioners, and students who graciously donate a significant amount of their time to direct the Association forward in a way that benefits the computing community as a whole. It is important to point this out, because there is an implication in the original post that the ACM is an entity that is in some way acting against the scholarly community, when in fact the ACM is an organization that is literally run by the scholarly community.

Keeping this in mind, we are left with a situation in which the scholarly community is either acting against itself by the policies it sets and supports (such as ACM's copyright policy and ACM's subscription model) or something else is going on here. Since it doesn't seem logical or even practical that the top-decision makers at ACM (such as the ACM Publications Board of Volunteers or the ACM Executive Committee of Volunteers, who oversee all major strategic decisions of the Association) would support policies that actively work against the interests of their own community, I think it is more reasonable to suggest that what is going on here is that the issues are not as cut and dry or as simplified as some advocates of "immediate and unrestricted" open access to all scholarly literature would lead us to believe.

Whenever I discuss the topic of open access with colleagues and friends, I think it is useful to try to imagine what the world would look like if the US Federal Government or other Foreign Governments decided to pass legislation that required all scholarly material that is in some way supported by public funding be made instantly open and freely available to the world without any paywalls of any sort. Well, as ACM's publisher and someone who is intimately aware of the tangible costs of publishing and disseminating high quality scholarly literature, I can tell you without a shadow of a doubt that the end result of this sort of legislation would be catastrophic for the scientific community and scholarly publishers alike. If in a blink of an eye, organizations like ACM were required to simply open up our archive of articles (the ACM DL) without the ability to recoup the costs of publishing and disseminating those articles (or all of the technically sophisticated services built around that content inside the ACM DL), ACM would cease to be the sustainable organization it is today and would eventually shutter its doors at some point in the future, instead of continuing to be the sustainable force for good that it is today. If this sounds like PR-dribble, I apologize, but I really do believe this!

What's more, the senior volunteers who are most familiar with ACM's activities and who sit on ACM's various committees and boards recognize and understand the tradeoffs that are necessary to maintain a sustainable organization. Over the past few years, I have participated in meetings with our Publications Board, which is the governing body for publications related strategy and decisions at ACM, where the issues of open access and alternative business models have been repeatedly discussed, and when all of the facts have been taken into consideration it has been overwhelmingly clear to these members of the community that ACM's approach is in the best longterm interests of the scholarly community. In fact, the ACM Author-Izer service, which is written about in the above post, was conceptualized at one of these meetings as the result of an in-depth discussion about how to balance the legitimate need of our authors to make the "archival versions" of their articles openly available while at the same time preserving the revenue stream that ACM relies heavily on to do its good work. ACM's pre-existing copyright policy already addressed the issue of posting "accepted versions" of an author's work, but ACM's volunteers decided that it was even more beneficial for the community if the "archival versions" could be made available from the author's site using the "Author-Izer" perpetual link. In general, while Author-Izer is still relatively new, the initial responses have been extremely positive and there is widespread recognition (including Rob's above) that this is a step in the right direction....

Let me briefly address the "opposing views" raised in Rob's post. First, in an instance where an author graduates, moves, or retires, it is always possible for the initial link to be replaced by a more up-to-date link. The ability to manage the URL that hosts the link is in the hands of the author, so I don't see a significant issue here and at the very least the effort on behalf of the author is no greater (and perhaps significantly less) than it would be to move their vitae or "pre-published" articles to a new website. What's more, ACM has simplified this process for authors and eliminated the confusion that is caused by having "multiple versions" of articles available on multiple websites by creating a centralized place (their ACM Author's Page, which includes all of their ACM and non-ACM publications) from which authors can update their Author-Izer links. By hosting the archival version of the article on a single and "sustainable" site, we strongly believe this is a better solution for the community.

In relation to the argument from Russell O'Connor, I reject the plausibility or even the possibility that the ACM might "go evil" for the reasons I've outlined above. Since ACM ultimately carries out the well thought out wishes of the community itself - the decision makers are the very members of the community who are impacted by those decisions - it is just not possible for such a scenario to occur. Bankruptcy is another story, since it is always impossible to predict how an organization's finances will be managed in the future, even though for the record it is exactly the kind of decision making I've mentioned above that currently keeps the ACM in a very strong position. Nevertheless, contingencies are in place for this unlikely scenario, as it relates to ACM's publications and all articles in the ACM Digital Library. Several years ago, ACM established partnerships with two very well established organizations (CLOCKSS & Portico) to ensure that ACM's publications would be preserved and made available to the scientific community (at no cost) in the unlikely event that ACM ceased to exist. [Rob's note: here's a press release about that.] Both organizations take different approaches to longterm digital preservation, but both are non-profits that exist for the sole purpose of maintaining a longterm perpetual archive for the scholarly community and nearly all major scientific publishers participate in one or both of these initiatives. ACM participates in both to provide an even higher level of redundancy than most other publishers. So, it is not clear what would happen to Author-Izer in the event of this doomsday scenario, but what is for certain is that ACM's archive would be made available to the scholarly community in any event.

Lastly, it is worth noting that the AAP is one of the publishing industry's primary advocates and they do an enormous amount of good work. Rather than deriding this organization that supports and protects the interests of over 300 well established publishers, including ACM, I would suggest that we focus on the spirit of what the Research Works Act represents, which is to limit the ability of the federal government to mandate decisions that would almost certainly have a longterm catastrophic impact on an industry that partners with and supports (and in our case is one and the same) the scientific community.

Respectfully,

Scott Delman
Director of Group Publishing
Assoc. Computing Machinery

Thursday, January 5, 2012

Why does the ACM act against the interests of scholars?

[Updated Jan 6, Jan 7] Some stuff has been happening! I'm delighted by two developments. First, ACM's Director of Group Publishing, Scott Delman, wrote a series of comments that is now one big post: Response from ACM's Scott Delman. Second, I've observed that many other people came to the same conclusion I did - that it's time for our professional organizations to leave the Association of American Publishers. The reason I brought up ACM Author-izer was to argue that Author-izer makes sense only insofar as the CS community trusts their professional organization; I remain of the view that membership in the AAP is incompatible with this trust. Here's Cameron Neylon saying that a bit more forcefully, and here's John Dupuis, who is also compiling a list of all the things related to RWA. (Did I mention the AAP also supports SOPA? Yep, awesome.)


I've got two logic posts queued up in the To Do list. But, dear the internet, we need to talk about the ACM. TL;DR is bold and italics at the bottom.

My friend Glenn Willen tweeted something about the Research Works Act last night. [Update: you can read the very short bill here] Basically, it would (among other things) forbid the government from requiring that publishers of federally-funded research make that research available to the public. You should really read the press release; it's just a wonderful example of the "dripping in PR" genre of literature.

This is not shocking. Awful legislation gets introduced all the time with names ("Research Works Act") that suggest the opposite of what the bill actually does (preventing research from working and acting, wildly attempting to maintain an ultimately unsustainable status quo). Frankly, I expect publishers to behave this way, and I expect there to be the usual variety of opinions about it. But then, hoping against hope, I ran through the members of the Association of American Publishers, the group cheering this legislation that (presumably) they themselves wrote. I was unsurprised but a bit sickened by what I saw: the Association for Computing Machinery is a member of the AAP.

I like the ACM; I am proud of my membership in the ACM and in ACM SIGPLAN, the Special Interest Group on Programming Languages. I personally think that the ACM's republication policies have been pretty reasonable during the time I've inhabited this academic world. I'm also proud of my involvement with ACM through their student magazine, XRDS. I write profiles of fascinating people, all of which are available on my personal webpage, for free, through the ACM Author-izer service.

Let's talk about that Author-izer

When I publish anything through the ACM, they own the copyright. This is totally fine-by-me for things I write for XRDS (when I worked for the Daily Princetonian in college they owned copyright on my work for them as well). In my mind, it's a little more complicated when I publish stuff I wrote in an ACM conference proceedings. I want to make sure people have access to that research-ey material - my reputation and career opportunities depend on people finding, reading, and liking my work, but in ACM's Digital Library it's behind a "paywall," accessible only to ACM members and people on university networks. The ACM (unlike IEEE) provides a couple of different-sized hoops that you can jump through to provide free access to your work from your personal home page; Author-izer is the newest of these.

On a technical level, ACM Author-izer lets you, the author of a work that the ACM now has copyright to, bless a particular URL on the internet (presumably your personal home page). The ACM then gives you a special link to their Digital Library - if you're coming from the blessed URL to the special link, you get access to the research. It sounds a little goofy but it works for me in practice and I'm cautiously pleased with it. (Here's Andrew Appel talking about Author-izer if you'd like a concurring opinion.)

But there's another view that Author-izer is a step backwards - because moving a web page (upon graduation or retirement) breaks the functionality of Author-izer links, ACM gets, in the long run, more exclusive content than if people were posting semi-definitive versions of papers on their web page. This is not a crazy concern, but I feel like lots of universities archive alumni's pages in-place, so I also don't feel too worried about it.

There's another argument I've read (UPDATE: from Russell O'Connor, I'd forgotten the link but Joshua Dunfield reminded me in the comments). It's plausible, more insidious, and more long-term. The ACM might "go evil" in some way, sure, but even positing that the ACM is and will remain reasonably virtuous, what if the ACM goes bankrupt? In the bankruptcy proceedings, some copyright trolls get the rights to everything in the digital library, immediately shut down Author-izer, and start wreaking havoc on academia (threatening lawsuits and demanding money from people who posted ACM-published works to their webpages) because they're copyright trolls and that's how they roll. A lot of people are violating the letter of their agreements when they post work to their web pages - you're not allowed to post the ACM's PDF, and in fact my reading of the agreement is that you have to change the ACM copyright notice in your version of the paper to a specific other thing; most people don't do this. Of course the ACM-provided LaTeX class doesn't support this, so you have to go diddling around with .cls files to produce a PDF that looks like this - see the lower-left-hand corner of the first page. Because people are less likely to jump (correctly) through the "author's version" hoop, instead relying on Author-izer, in this hypothetical narrative the ACM's policies have, indeed, worked against the interests of ACM's members.

What does this have to do with the "Research Works Act" breaking research?

My view of Author-izer is that it requires a high level of trust: trust that the ACM will continue supporting authors, and that we'll be able to continue supporting the ACM (since if we don't or can't support the ACM, it will go bankrupt and be taken over by copyright trolls). I can overlook little things where the ACM is not acting in the interest of its members (why doesn't the standard .cls make it easy to make an author's version?) because the world isn't perfect.

Furthermore, with experiments like Author-izer, I believe that ACM has demonstrated that it's trying to do the right thing, as opposed to IEEE, which doesn't give authors hoops to jump through to legally post their work to their webpages. (You should read IEEE's hilariously awful responses to Matt Blaze on this issue. Blaze, I should add, sees much less difference between ACM and IEEE than I do.)

However, the "Research Works Act" makes it clear that ACM's membership in the Association of American Publishers is an egregious and unacceptable instance of working against the interest of scholars and ACM members. We should be thinking about how to demand that our professional organization, the Association for Computing Machinery, do two things: 1) withdraw from the Association of American Publishers 2) take the clear position that the so-called "Research Works Act" is an unacceptable piece of legislation that is not supported by the computer science community.

We should do this even though (I suspect) the primary target of this ridiculous act is medical science. (At the present time, the NIH admirably puts a higher priority on open dissemination of research than the NSF.)


P.S. added January 7: The title of this post is intentionally re-asking the question Blaze asks in his aforementioned post, "Why do IEEE and ACM act against the interests of scholars?" I am focusing on the ACM simply because I care more about the ACM. The IEEE is also a member organization of the AAP.

Saturday, December 17, 2011

Notes on classical sequent calculi (1/2)

These are some notes I made to try to help me understand Noam's focused presentation of classical logic in Polarity and the Logic of Delimited Continuations. I hope these notes could be useful to others.

Sequent presentations of classical logic

I write, inevitably, from the perspective of an intuitionistic proof theorist, so let's start there. In intuitionistic logics, sequents have the form \(\Gamma \vdash A~\mathit{true}\), where the hypothetical context \(\Gamma\) has the form \(A_1~\mathit{true}, \ldots, A_n~\mathit{true}\). The whole sequent \(\Gamma \vdash A~\mathit{true}\) is read as "assuming the truth of all the things in \(\Gamma\), we know \(A~\mathit{true}\)." Let's look at a couple of ways of presenting sequent calculi for classical logic.

Two-sided judgmental classical sequent calculi

One way of presenting a classical sequent calculus is to give a two-sided sequent, \(\Gamma \vdash \Delta\). As before, \(\Gamma\) has the form \(A_1~\mathit{true}, \ldots, A_n~\mathit{true}\), but \(\Delta\) has the form \(B_1~\mathit{false}, \ldots, B_m~\mathit{false}\), and the whole sequent is read as "taken together, if all the things judged true in \(\Gamma\) are true and all the things judged false in \(\Delta\) are false, then there is a contradiction."

The language of propositions is \(A ::= P \mid \neg A \mid A \wedge B \mid A \vee B\) - uninterpreted atomic propositions \(P\), negation, conjunction ("and"), and disjunction ("or"). I'm leaving out truth \(\top\) and falsehood \(\bot\) because they're boring. The rules for these two-sided classical sequent calculi look like this:

\[ \infer {\Gamma, ~ P~\mathit{true} \vdash \Delta, ~ P~\mathit{false}} {} \] \[ \infer {\Gamma \vdash \Delta, ~ \neg A~\mathit{false}} {\Gamma, ~ A~\mathit{true} \vdash \Delta} \qquad \infer {\Gamma, ~ \neg A~\mathit{true} \vdash \Delta} {\Gamma \vdash \Delta, ~ A~\mathit{false}} \] \[ \infer {\Gamma \vdash \Delta, ~ A \wedge B~\mathit{false}} {\Gamma \vdash \Delta, ~ A~\mathit{false} &\Gamma \vdash \Delta, ~ B~\mathit{false}} \qquad \infer {\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta} {\Gamma, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \Delta} \] \[ \infer {\Gamma \vdash \Delta, ~ A \vee B~\mathit{false}} {\Gamma \vdash \Delta, ~ A~\mathit{false}, ~ B~\mathit{false}} \qquad \infer {\Gamma, ~ A \vee B~\mathit{true} \vdash \Delta} {\Gamma, ~ A~\mathit{true} \vdash \Delta &\Gamma, ~ B~\mathit{true} \vdash \Delta} \]

Two asides. First, in presentations that do not emphasize the fact that \(A_i~\mathit{true}\) and \(B_j~\mathit{false}\) are judgments and not propositions, there is another reading of the two-sided sequent \[A_1,\ldots,A_n \vdash B_1,\ldots,B_m\] This interpretation is that the truth of all of the \(A_i\) implies the truth of one of the \(B_j\) - it suggests reading any intuitionistic sequent proof as a classical sequent proof with exactly one conclusion. You should convince yourself that this interpretation is equivalent to the interpretation above (hint: it's just a mode of use of De Morgan's laws).

Second aside: your rules may differ. I'm using a style of presentation where every connective is broken down by a unique rule and, from the perspective of bottom-up proof search, it's never a mistake to apply any rule, because the conclusion implies all of the premises (a property called invertibility). The "true" (or left) rule for conjunction (that is, "and" or \(\wedge\)) and the "false" (or right) rule for disjunction (that is, "or" or \(\vee\)) both have a different, non-invertible presentation. In the case of conjunction, it's this pair of rules: \[ \infer {\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta} {\Gamma, ~ A~\mathit{true} \vdash \Delta} \qquad \infer {\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta} {\Gamma, ~ B~\mathit{true} \vdash \Delta} \] You could "make a mistake" applying these rules in bottom-up proof search: just because there is a proof of \(\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta\) does not mean that there is a proof of \(\Gamma, ~ A~\mathit{true} \vdash \Delta\).

One-sided judgmental sequent calculi

Of course, hypotheses are just hypotheses; there's no a priori reason why we need to separate the true ones and the false ones into separate contexts. Let's use a unified context and call it \(\Psi\). \[\Psi ::= \cdot \mid \Psi, A~\mathit{true} \mid \Psi, A~\mathit{false}\] Then, we can have the sequent form \(\Psi \vdash \#\), which we read as "all the assumptions in \(\Psi\) together imply a contradiction" - we pronounce \(\#\) as "contradiction." We'll need to rewrite all of our rules:

\[ \infer {\Psi, ~ P~\mathit{true}, ~ P~\mathit{false} \vdash \#} {} \] \[ \infer {\Psi, ~ \neg A~\mathit{false} \vdash \#} {\Psi, ~ A~\mathit{true} \vdash \#} \qquad \infer {\Psi, ~ \neg A~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{false} \vdash \#} \] \[ \infer {\Psi, ~ A \wedge B~\mathit{false} \vdash \#} {\Psi, ~ A~\mathit{false} \vdash \# &\Psi, ~ B~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A \wedge B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \#} \] \[ \infer {\Psi, ~ A \vee B~\mathit{false} \vdash \#} {\Psi, ~ A~\mathit{false}, ~ B~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A \vee B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true} \vdash \# &\Psi, ~ B~\mathit{true} \vdash \#} \]

Hopefully you'll agree that this is "obviously the same" as the first presentation.

One-sided, truth-oriented sequent calculi

But wait! The "false" rule for conjunction looks just like the "true" rule for disjunction, and the "true" rule for conjunction looks just like the "false" rule for disjunction. Can we simplify these rules?

The usual answer is that you can, indeed, do with fewer rules and without a false judgment at all. However, we need two twists to deal with the rules that involved both the true and false judgments. First, we need to let every atomic proposition come in two flavors, the "regular" flavor \(P\) and the "negated" flavor \(\overline{P}\). Then, the rule dealing with atomic propositions looks like this: \[ \infer {\Psi, ~ P~\mathit{true}, ~ \overline{P}~\mathit{true} \vdash \#} {} \] Second, instead of negation being a proposition \(\neg A\), we define a negation function, which I will write as \((A)^\bot\) to distinguish it from the propositional negation \(\neg A\). The negation function is defined as follows: \[ \begin{align} {(P)^\bot} = & \overline{P}\\ {(\overline{P})^\bot} = & P\\ {(A \wedge B)^\bot} = & {(A)^\bot} \vee {(B)^\bot}\\ {(A \vee B)^\bot} = & {(A)^\bot} \wedge {(B)^\bot}\\ \end{align} \] With this definition, we can eliminate the negation proposition altogether - the negation function just applies De Morgan laws all the way down to atomic propositions. We now get our sequent calculus for "half off" - there's no more official negation, and we don't need the false judgment at all anymore. We only need two more rules (for a total of three)!

\[ \infer {\Psi, ~ A \wedge B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \#} \qquad \infer {\Psi, ~ A \vee B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true} \vdash \# &\Psi, ~ B~\mathit{true} \vdash \#} \]
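
To see the negation function at work, here it is pushing De Morgan duality down to the atoms of a small (arbitrarily chosen) example:

\[ \begin{align} {(P \wedge (\overline{Q} \vee R))^\bot} = & {(P)^\bot} \vee {(\overline{Q} \vee R)^\bot}\\ = & \overline{P} \vee ({(\overline{Q})^\bot} \wedge {(R)^\bot})\\ = & \overline{P} \vee (Q \wedge \overline{R}) \end{align} \]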

It would also be possible to play this game the other way around: gather everything on the right-hand side, bias the whole thing towards the "false" judgment, and basically get the "other half" of the two-sided sequent calculi. This ability to play the game equally well either way is part of what people mean when they say that classical logic is "very symmetric."

However, given that it's all the same, why not reason about truth rather than falsehood? I've never understood why classical linear logic (in particular) always seems to bias itself towards one-sided sequent calculi on the right. There are important differences in what it means to think like a classical linear logician and what it means to think like an intuitionistic linear logician, but I really think that it unnecessarily exacerbates this divide when we have to turn all of our \(\oplus\)es to \(\&\)s and \(\otimes\)es to pars in order to talk to one another.

Polarized presentations of classical logic

Now for the real purpose of this note: writing out the review of classical logic that Noam gives in "Polarity and the Logic of Delimited Continuations." This discussion is a synthesis of that presentation and a little bit of "On the unity of duality."

Two for the price of two

Fundamentally, the observation Noam is making is that the one-sided truth-oriented sequent calculus goes too far - really, there are two kinds of disjunction, and two kinds of conjunction, which is why the original calculus seemed to have redundancies. The third system above (the one-sided, truth-oriented sequent calculus) made it look like we were getting our logic for "half-off" - but really that's because the first two judgmental presentations were defining twice as many connectives as appeared to the naked eye. (As an aside, if you study classical linear logic, you're forced into the same conclusion for different reasons.)

Jason Reed taught me that, if you have two different judgments in a logic, it's worth seeing what happens if you syntactically differentiate the things you judge to be true and the things you judge to be false. Let's go ahead and "guess the right answer" - I'm going to call the things we judge to be true positive, and the things we judge to be false negative. There's more than one of everything! \[ \begin{align} A^- = & \neg^- A^+ \mid P^- \mid A^- \wedge^- B^- \mid A^- \vee^- B^-\\ A^+ = & \neg^+ A^- \mid P^+ \mid A^+ \wedge^+ B^+ \mid A^+ \vee^+ B^+ \end{align} \] Here are a bunch of rules: note that the two negations change the polarity of the propositions; the rules make it evident that this is the right thing to do, because we have (for example) \(\neg^+ A^- ~\mathit{true}\) but \(A^- ~\mathit{false}\):

\[ \infer {\Psi, ~ \neg^- A^+~\mathit{false} \vdash \#} {\Psi, ~ A^+~\mathit{true} \vdash \#} \qquad \infer {\Psi, ~ \neg^+ A^-~\mathit{true} \vdash \#} {\Psi, ~ A^-~\mathit{false} \vdash \#} \] \[ \infer {\Psi, ~ A^- \wedge^- B^-~\mathit{false} \vdash \#} {\Psi, ~ A^-~\mathit{false} \vdash \# &\Psi, ~ B^-~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A^+ \wedge^+ B^+~\mathit{true} \vdash \#} {\Psi, ~ A^+~\mathit{true}, ~ B^+~\mathit{true} \vdash \#} \] \[ \infer {\Psi, ~ A^- \vee^- B^-~\mathit{false} \vdash \#} {\Psi, ~ A^-~\mathit{false}, ~ B^-~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A^+ \vee^+ B^+~\mathit{true} \vdash \#} {\Psi, ~ A^+~\mathit{true} \vdash \# &\Psi, ~ B^+~\mathit{true} \vdash \#} \]

So, are we good? Well, no, not really. The problem is that the "+" or "-" stuck to an atomic proposition isn't an annotation or modifier the way the overbar was in the one-sided, truth-oriented sequent calculus above. \(P^+\) and \(P^-\) are different atomic propositions, and it wouldn't be right to give an inference rule that had, as its conclusion, \(\Psi, ~ P^+~\mathit{true}, ~ P^-~\mathit{false} \vdash \#\). Why? Well, for now let's go with "because I said so." The argument I have for this point isn't bulletproof, and it has to do with the role of atomic propositions as stand-ins for other propositions.

However, if you accept my argument from authority, we are left with no way to prove, or even to state, anything equivalent to the classical \(P \vee \neg P\) in polarized logic, since any way we try to polarize this formula will lead to \(P\) needing to be both positive and negative. We're going to need some way, different from negation, of including positive propositions in negative ones.

These inclusions of positive propositions into negative ones (and vice versa) are called shifts - \({\downarrow}A^-\) is a positive proposition and \({\uparrow}A^+\) is a negative proposition. We could just add these two rules and call it a day... \[ \infer {\Psi, ~ P^+~\mathit{true}, ~ {\uparrow}P^+~\mathit{false} \vdash \#} {} \qquad \infer {\Psi, ~ P^-~\mathit{false}, ~ {\downarrow}P^-~\mathit{true} \vdash \#} {} \] ...but this is hardly general: the rules above should be derivable, and this rule should be derivable as well: \[ \infer {\Psi, ~ P^+~\mathit{true}, ~{\uparrow}(P^+ \vee^+ Q^+)~\mathit{false} \vdash \#} {} \] All three of these derivable rules share a common property: in an on-paper proof, we would say that the contradiction is "trivial." The hypothesis \({\uparrow}P^+~\mathit{false}\) is trivially contradictory due to the fact that \(P^+\) is true by a different hypothesis, and because the truth of \(P^+\) allows us to trivially conclude that \(P^+ \vee^+ Q^+\) is true, \({\uparrow}(P^+ \vee^+ Q^+)~\mathit{false}\) is trivially contradictory as well.

This idea is encoded in two rules which capture proof-by-contradiction. One way we establish a contradiction is by showing that \(A^+\) is both false (by assumption) and trivial (by direct proof). The other way we establish a contradiction is by showing that \(A^-\) is both true (by assumption) and false (by direct proof of absurdity). These are embodied in the following two rules:

\[ \infer {\Psi \vdash \#} {{\uparrow}A^+~\mathit{false} \in \Psi &\Psi \vdash A^+~\mathit{trivial}} \qquad \infer {\Psi \vdash \#} {{\downarrow}A^-~\mathit{true} \in \Psi &\Psi \vdash A^-~\mathit{absurd}} \]

Now, of course, we need to give a bunch more rules to describe how to prove positive propositions trivial and negative propositions absurd.

\[ \infer {\Psi, ~ P^+~\mathit{true} \vdash P^+~\mathit{trivial}} {} \qquad \infer {\Psi, ~ P^-~\mathit{false} \vdash P^-~\mathit{absurd}} {} \] \[ \infer {\Psi \vdash \neg^+ A^-~\mathit{trivial}} {\Psi \vdash A^-~\mathit{absurd}} \qquad \infer {\Psi \vdash \neg^- A^+~\mathit{absurd}} {\Psi \vdash A^+~\mathit{trivial}} \] \[ \infer {\Psi \vdash A^+ \wedge^+ B^+~\mathit{trivial}} {\Psi \vdash A^+~\mathit{trivial} &\Psi \vdash B^+~\mathit{trivial}} \qquad \infer {\Psi \vdash A^+ \vee^+ B^+~\mathit{trivial}} {\Psi \vdash A^+~\mathit{trivial}} \qquad \infer {\Psi \vdash A^+ \vee^+ B^+~\mathit{trivial}} {\Psi \vdash B^+~\mathit{trivial}} \] \[ \infer {\Psi \vdash A^- \wedge^- B^-~\mathit{absurd}} {\Psi \vdash A^-~\mathit{absurd}} \qquad \infer {\Psi \vdash A^- \wedge^- B^-~\mathit{absurd}} {\Psi \vdash B^-~\mathit{absurd}} \qquad \infer {\Psi \vdash A^- \vee^- B^-~\mathit{absurd}} {\Psi \vdash A^-~\mathit{absurd} &\Psi \vdash B^-~\mathit{absurd}} \]

Even yet, we are not done! We need to deal with the shifts, which embody another form of proof-by-contradiction: to prove that \(A^-\) holds trivially, assume it's false and derive a contradiction.

\[ \infer {\Psi \vdash {\downarrow}A^-~\mathit{trivial}} {\Psi, A^-~\mathit{false} \vdash \#} \qquad \infer {\Psi \vdash {\uparrow}A^+~\mathit{absurd}} {\Psi, A^+~\mathit{true} \vdash \#} \]
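
As a sanity check on all this machinery, here's how the shifts rescue the excluded middle. One reasonable polarization of \(P \vee \neg P\) is the positive proposition \(C^+ = P^+ \vee^+ {\downarrow}\neg^- P^+\), and the sequent \({\uparrow}C^+~\mathit{false} \vdash \#\) is now derivable:

\[ \infer {{\uparrow}C^+~\mathit{false} \vdash \#} {\infer {{\uparrow}C^+~\mathit{false} \vdash C^+~\mathit{trivial}} {\infer {{\uparrow}C^+~\mathit{false} \vdash {\downarrow}\neg^- P^+~\mathit{trivial}} {\infer {{\uparrow}C^+~\mathit{false}, ~ \neg^- P^+~\mathit{false} \vdash \#} {\infer {{\uparrow}C^+~\mathit{false}, ~ P^+~\mathit{true} \vdash \#} {\infer {{\uparrow}C^+~\mathit{false}, ~ P^+~\mathit{true} \vdash C^+~\mathit{trivial}} {\infer {{\uparrow}C^+~\mathit{false}, ~ P^+~\mathit{true} \vdash P^+~\mathit{trivial}} {}}}}}}} \]

Reading bottom-up: we focus on the assumption \({\uparrow}C^+~\mathit{false}\) and choose the right disjunct, the shift throws us back into proof-by-contradiction, the negation is inverted, and then we focus on the very same assumption a second time, this time choosing the left disjunct.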

The thing that we've come up with by this process is what I've been calling a "weakly focused" version of classical logic. If we wanted to turn it into a "fully focused" presentation of classical logic, we'd only need to make one change: the first "proof by contradiction" rules, which we call "focusing" rules, would need to require that the context \(\Psi\) includes only judgments of the form \(P^+~\mathit{true}\), \({\downarrow}A^-~\mathit{true}\), \(P^-~\mathit{false}\), and \({\uparrow}A^+~\mathit{false}\). A context including only judgments of these four forms is called stable. To get full focusing, we would modify the "trivial focus" rule like this (a similar modification would be made to the "absurd focus" rule):

\[ \infer {\Psi \vdash \#} {{\uparrow}A^+~\mathit{false} \in \Psi &\Psi \vdash A^+~\mathit{trivial} &\Psi~\mathit{stable}} \]

Thinking about the sequent calculus as a bottom-up proof search procedure, if we are looking for a proof of a non-stable sequent, we can use our original, invertible rules to break down the connectives in the contexts until we have only stable sequents, at which point we can apply a focusing rule.

Until next time...

I haven't quite had time to do the thing I originally set out to do, which was to work through the notation in "Polarity and the logic of delimited continuations" better. But I will save that for another time. The motivation is the same as the one from before: it seems like we're almost certainly duplicating work. Is it possible to give the presentation of polarized classical logic from the previous section using about half as many rules?

Saturday, November 12, 2011

Another take on polarized natural deduction

This has been sitting on my office whiteboard for a few days, where it doesn't do anybody (well, except for me and my officemates) any good. It's a canonical-forms presentation of natural deduction for polarized logic that corresponds to the focused sequent calculus I presented and analyzed in the (recently-updated) Structural focalization draft (PDF warning). The polarized sequent calculus in that draft isn't new: it's a tweaked version of Liang and Miller's authoritative LJF.1 This canonical-forms presentation, however, is not something I've seen, so I'd be interested to know if it's been seen elsewhere: I know this is an area where a lot of other people have been working.

A bidirectional type system for polarized logic

There is, in my mind at least, no argument about what the propositions of polarized intuitionistic logic are; the following is straight out of the aforementioned draft, but the basic idea dates back to Girard's 1991 post to the LINEAR mailing list, creatively titled "On the sex of angels".

\( {\bf Positive~propositions:} ~~ A^+, B^+, C^+ ~~ ::= ~~ p^+ \mid {\downarrow}A^- \mid \bot \mid A^+ \vee B^+ \mid \top^+ \mid A^+ \wedge^+ B^+ \)
\( {\bf Negative~propositions:} ~~ A^-, B^-, C^- ~~ ::= ~~ p^- \mid {\uparrow}A^+ \mid A^+ \supset B^- \mid \top^- \mid A^- \wedge^- B^- \)

What makes a proposition positive or negative? Good question! I won't address it here. (I address it a bit in the draft.)

Following tradition and best practices, we will structure the canonical forms presentation as a bidirectional type system. There are three judgments to worry about, as compared to the two judgments in other canonical forms presentations of logic. These judgments include contexts \(\Gamma\), which are sets of negative variables \(x\) associated with negative propositions (\(x{:}A^-\)) and positive variables \(z\) associated with atomic positive propositions (\(z{:}p^+\)).

  • \( \Gamma \vdash R \Rightarrow A^- \) - this is the familiar synthesis judgment from canonical forms presentations; it expresses that the atomic term \(R\) synthesizes \(A^-\). The word "synthesis" is used because it is possible to think of the type \(A^-\) as an output to the judgment, whereas \(\Gamma\) and \(R\) are inputs. In the other two judgments, everything will be treated as an input.

    \( R ::= x \mid R~V \mid \pi_1 R \mid \pi_2 R \)

  • \( \Gamma \vdash V \Leftarrow A^+ \) - this is the new judgment corresponding to right focus in the focused sequent calculus; we say that the value \(V\) checks against \(A^+\).

    \( V ::= z \mid N \mid {\sf inl}~V \mid {\sf inr}~V \mid \langle\rangle^+ \mid \langle V_1, V_2 \rangle^+ \)

  • \( \Gamma \vdash N \Leftarrow [\Omega] A^-\) - this is a modification of the familiar checking judgment from canonical forms presentations, which usually lack the bit about \([\Omega]\), an ordered list of positive propositions. The reason we need \([\Omega]\) is precisely because we're dealing with positive propositions, which most canonical forms presentations lack or deal with in an unsatisfactory (in my humble opinion) manner. (I'll return to this point in the discussion at the end.) We say that this judgment expresses that the normal term \(N\) decomposes \(\Omega\) and verifies \(A^-\).

    \( N ::= R \mid z.N \mid V \mid {\sf let}~R~{\sf in}~N \mid x.N \mid \lambda N \mid \langle\rangle^- \mid \langle N_1, N_2 \rangle^- \mid {\sf abort} \mid [ N_1, N_2 ] \mid \langle\rangle.N \)
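
If it helps to see this syntax as data, here is a sketch of the three categories as mutually recursive OCaml datatypes (the constructor names are my own; variables are strings for simplicity):

   type atomic =                     (* R: synthesizes a negative type *)
     | Var of string                 (* x *)
     | App of atomic * value         (* R V *)
     | Pi1 of atomic                 (* π1 R *)
     | Pi2 of atomic                 (* π2 R *)

   and value =                       (* V: checks against a positive type *)
     | PosVar of string              (* z *)
     | Thunk of normal               (* N, at type ↓A⁻ *)
     | Inl of value                  (* inl V *)
     | Inr of value                  (* inr V *)
     | UnitP                         (* ⟨⟩⁺ *)
     | PairP of value * value        (* ⟨V₁, V₂⟩⁺ *)

   and normal =                      (* N: decomposes Ω and verifies A⁻ *)
     | Atomic of atomic              (* R *)
     | BindPos of string * normal    (* z.N *)
     | Val of value                  (* V *)
     | Let of atomic * normal        (* let R in N *)
     | BindNeg of string * normal    (* x.N *)
     | Lam of normal                 (* λN *)
     | UnitN                         (* ⟨⟩⁻ *)
     | PairN of normal * normal      (* ⟨N₁, N₂⟩⁻ *)
     | Abort                         (* abort *)
     | Case of normal * normal       (* [N₁, N₂] *)
     | UnitBind of normal            (* ⟨⟩.N *)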

Except for the first four rules, everything is patterned in the usual style of presentation for a natural deduction system: for each connective, we give first the introduction rules and then the elimination rules.

Hypothesis and atomic propositions

\[ \infer {\Gamma \vdash R \Leftarrow []p^-} {\Gamma \vdash R \Rightarrow p'^- & p^- = p'^-} \qquad \infer {\Gamma \vdash x \Rightarrow A^-} {x{:}A^- \in \Gamma} \qquad \infer {\Gamma \vdash z \Leftarrow p^+} {z{:}p^+ \in \Gamma} \qquad \infer {\Gamma \vdash z.N \Leftarrow [p^+, \Omega] C^-} {\Gamma, z{:}p^+ \vdash N \Leftarrow [\Omega] C^-} \]

Shifts

\[ \infer {{\uparrow}A^+~{\it stable}} {} \qquad \infer {p^-~{\it stable}} {} \] \[ \infer {\Gamma \vdash V \Leftarrow []{\uparrow}A^+} {\Gamma \vdash V \Leftarrow A^+} \qquad \infer {\Gamma \vdash {\sf let}~R~{\sf in}~N \Leftarrow []C^-} {\Gamma \vdash R \Rightarrow {\uparrow}A^+ & \Gamma \vdash N \Leftarrow [A^+]C^- & C^-~{\it stable}} \] \[ \infer {\Gamma \vdash N \Leftarrow {\downarrow}A^-} {\Gamma \vdash N \Leftarrow []A^-} \qquad \infer {\Gamma \vdash x.N \Leftarrow [{\downarrow}A^-, \Omega]C^-} {\Gamma, x{:}A^- \vdash N \Leftarrow [\Omega]C^-} \]

Connectives

\[ \infer {\Gamma \vdash \lambda N \Leftarrow []A^+ \supset B^-} {\Gamma \vdash N \Leftarrow [A^+]B^-} \qquad \infer {\Gamma \vdash R~V \Rightarrow B^-} {\Gamma \vdash R \Rightarrow A^+ \supset B^- & \Gamma \vdash V \Leftarrow A^+} \] \[ \infer {\Gamma \vdash \langle\rangle^- \Leftarrow \top^-} {} \qquad {\it (no~elim~rule~for~\top^-)} \] \[ \infer {\Gamma \vdash \langle N_1, N_2 \rangle^- \Leftarrow A^- \wedge^- B^-} {\Gamma \vdash N_1 \Leftarrow A^- & \Gamma \vdash N_2 \Leftarrow B^-} \qquad \infer {\Gamma \vdash \pi_1 R \Rightarrow A^-} {\Gamma \vdash R \Rightarrow A^- \wedge^- B^-} \qquad \infer {\Gamma \vdash \pi_2 R \Rightarrow B^-} {\Gamma \vdash R \Rightarrow A^- \wedge^- B^-} \] \[ {\it (no~intro~rule~for~\bot)} \qquad \infer {\Gamma \vdash {\sf abort} \Leftarrow [\bot, \Omega]C^-} {} \] \[ \infer {\Gamma \vdash {\sf inl}~V \Leftarrow A^+ \vee B^+} {\Gamma \vdash V \Leftarrow A^+} \qquad \infer {\Gamma \vdash {\sf inr}~V \Leftarrow A^+ \vee B^+} {\Gamma \vdash V \Leftarrow B^+} \qquad \infer {\Gamma \vdash [N_1, N_2] \Leftarrow [A^+ \vee B^+, \Omega] C^-} {\Gamma \vdash N_1 \Leftarrow [A^+, \Omega] C^- & \Gamma \vdash N_2 \Leftarrow [B^+, \Omega] C^-} \] \[ \infer {\Gamma \vdash \langle\rangle^+ \Leftarrow \top^+} {} \qquad \infer {\Gamma \vdash \langle\rangle.N \Leftarrow [\top^+, \Omega] C^-} {\Gamma \vdash N \Leftarrow [\Omega] C^-} \] \[ \infer {\Gamma \vdash \langle V_1, V_2 \rangle^+ \Leftarrow A^+ \wedge^+ B^+} {\Gamma \vdash V_1 \Leftarrow A^+ & \Gamma \vdash V_2 \Leftarrow B^+} \qquad \infer {\Gamma \vdash N \Leftarrow [A^+ \wedge^+ B^+, \Omega] C^-} {\Gamma \vdash N \Leftarrow [A^+, B^+, \Omega] C^-} \]
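To see the three judgments cooperating, here's a small worked example of my own (not from the draft, so caveat lector): the polarized identity function at type \({\downarrow}p^- \supset {\uparrow}{\downarrow}p^-\), whose proof term is \(\lambda x.x\) (that is, \(\lambda N\) where \(N\) is \(x.x\)). Note that the atomic term \(x\) can only be checked against an atomic negative proposition, so both shift rules have to fire along the way.

\[ \infer {\Gamma \vdash \lambda x.x \Leftarrow [] {\downarrow}p^- \supset {\uparrow}{\downarrow}p^-} {\infer {\Gamma \vdash x.x \Leftarrow [{\downarrow}p^-] {\uparrow}{\downarrow}p^-} {\infer {\Gamma, x{:}p^- \vdash x \Leftarrow [] {\uparrow}{\downarrow}p^-} {\infer {\Gamma, x{:}p^- \vdash x \Leftarrow {\downarrow}p^-} {\infer {\Gamma, x{:}p^- \vdash x \Leftarrow [] p^-} {\infer {\Gamma, x{:}p^- \vdash x \Rightarrow p^-} {x{:}p^- \in \Gamma, x{:}p^-} & p^- = p^-}}}}} \]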

Discussion

There are two questions I want to address about the system from the previous section.

What's with those positive "elimination" rules?

It would be possible to complain that the system above is not very "natural deduction-ey" after all - for all the positive connectives, I basically give sequent calculus left rules instead of natural deduction elimination rules. What happened to the usual "large elimination"-style elimination rules, for instance the usual disjunction-elimination rule whose proof term is a case analysis? \[ \infer {\Gamma \vdash ({\sf case}~R~{\sf of}~{\sf inl}~x \rightarrow N_1 \mid {\sf inr}~y \rightarrow N_2) \Leftarrow C} {\Gamma \vdash R \Rightarrow A \vee B & \Gamma, x{:}A \vdash N_1 \Leftarrow C & \Gamma, y{:}B \vdash N_2 \Leftarrow C} \]

I think that the answer can be given by looking at the shifts. Essentially, every large elimination as we know and love it follows from the structure of the \({\uparrow}\) elimination rule, which all on its own looks an awful lot like a cut. You should verify for yourself that, if you let \({\sf case}~R~{\sf of}~{\sf inl}~x \Rightarrow N_1 \mid {\sf inr}~y \Rightarrow N_2\) be defined as syntactic sugar for \({\sf let}~R~{\sf in}~[ x.N_1, y.N_2]\), then the following rule is derivable whenever \(C^-~{\it stable}\) holds.2 \[ \infer {\Gamma \vdash {\sf case}~R~{\sf of}~{\sf inl}~x \Rightarrow N_1 \mid {\sf inr}~y \Rightarrow N_2 \Leftarrow C^-} {\Gamma \vdash R \Rightarrow {\uparrow}({\downarrow}A^- \vee {\downarrow}B^-) & \Gamma, x{:}A^- \vdash N_1 \Leftarrow C^- & \Gamma, y{:}B^- \vdash N_2 \Leftarrow C^-} \]
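If you don't feel like doing the verification yourself, here is the derivation I have in mind (a sketch; I'm writing \(\Leftarrow []C^-\) explicitly for the premises the derived rule abbreviates as \(\Leftarrow C^-\)):

\[ \infer {\Gamma \vdash {\sf let}~R~{\sf in}~[x.N_1, y.N_2] \Leftarrow []C^-} {\Gamma \vdash R \Rightarrow {\uparrow}({\downarrow}A^- \vee {\downarrow}B^-) & \infer {\Gamma \vdash [x.N_1, y.N_2] \Leftarrow [{\downarrow}A^- \vee {\downarrow}B^-] C^-} {\infer {\Gamma \vdash x.N_1 \Leftarrow [{\downarrow}A^-] C^-} {\Gamma, x{:}A^- \vdash N_1 \Leftarrow [] C^-} & \infer {\Gamma \vdash y.N_2 \Leftarrow [{\downarrow}B^-] C^-} {\Gamma, y{:}B^- \vdash N_2 \Leftarrow [] C^-}} & C^-~{\it stable}} \]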

Pay attention to those two appearances of the downshift \({\downarrow}\) - they tell you something important about the structure of the usual elimination rules, which is that they "lose focus" while decomposing the disjunction. The usual way of thinking of normal natural deduction doesn't require, when you decompose \(A \vee B\) in an elimination, that you continue decomposing \(A\) and \(B\), which is represented here by the fact that, to match the structure of the usual elimination rule, you have to put the downshifts in explicitly. Jacob Howe, in his thesis and in his excellent paper "Proof search in lax logic," demonstrates this by making a focused sequent calculus that corresponds to the usual (constantly-interrupted) notion of decomposing positive propositions that you get if you follow your intuitions from natural deduction too closely.

By gathering all the large eliminations together in the \({\uparrow}\) elimination rule, we allow for the usual large eliminations to be defined, but also allow for the possibility that we might want to "chain" large eliminations in a well-defined way. (As an exercise, consider the structure of the elimination rule for \({\uparrow}(({\downarrow}A^- \wedge^+ {\downarrow}B^-) \vee (p^+ \wedge^+ \top^+))\).) This is why I claim that this is a natural deduction system that corresponds to the focused sequent calculus, instead of Howe's system where it's the other way around.3

Where are all the patterns?

Patterns have been associated with focused and/or canonical forms presentations of logic ever since... well, since Neel wrote the paper "Focusing on pattern matching"... or maybe since Noam wrote "Focusing and higher-order abstract syntax"... well, really at least since the CLF tech report. A lot of these, notably Noam's systems, have presented the rules of logic using pattern judgments, devices which abstractly represent the way in which values of a particular (positive) type are constructed or the way atomic terms of a particular (negative) type are eliminated.

There's this picture that isn't fully formed in my head, but that I've been thinking about for some time. On the left side of this picture, I think, you have the (pattern-free) presentation of natural deduction that I have given here at the top, and the (pattern-free) focused sequent calculus from "Structural focalization" at the bottom. Then, in the middle, you have (at the top) a natural deduction system that uses Noam's pattern judgments to introduce negative propositions and eliminate positive propositions - this is precisely (or at least very nearly) Taus Brock-Nannestad and Carsten Schürmann's system from "Focused Natural Deduction." Below it, there is a sequent calculus system that uses Noam's pattern judgments to eliminate negative propositions and introduce positive propositions. Kevin Watkins and Frank Pfenning came up with this idea and named it the "skeleton calculus" (a reference to the "spine calculus" of Cervesato and Pfenning), but it hasn't been written up that I know of. The skeleton calculus was what I was thinking about this morning when I decided to write this post. Then, on the far right, you have Noam's system, which is entirely pattern-based: patterns are used to both introduce and eliminate all connectives, so that the logic itself basically doesn't "know" about any connectives at all. This hazy picture is why, in the structural focalization draft, I mentioned that I thought Noam's system was a "natural synthesis of natural deduction and sequent calculus presentations".

But why should the picture look like the one I sketched above? Why not have a natural deduction system that uses patterns to introduce positives and eliminate negatives, or a sequent calculus that uses patterns to eliminate positives and introduce negatives? There's also the elephant in the room: CLF, which has both natural deduction and sequent calculus presentations, but which, in both instances, uses patterns only in the elimination of positive connectives. What are all these options doing here, and what are we to make of them? I don't know (yet).


1 I should add that, while the adaptation of LJF isn't particularly interesting, the proof term assignment I give is different from any others I've seen, and I'm pretty happy with it; that's another case where I'd be interested to know if others have done anything similar.
2 This additional requirement of stability just reflects that it's always possible to restrict large eliminations in a canonical forms presentation of natural deduction to situations where the conclusion is stable; this isn't always required in canonical forms presentations of natural deduction, but it is an important part of making sure the sequent calculus presented in "Structural focalization" corresponds correctly to the natural deduction presentation.
3 I specifically suspect that this is a natural deduction system isomorphic to the focused sequent calculus from Structural focalization, but I don't want to make that claim until I've proved it.

Monday, October 10, 2011

Feeble Typing (a thought on Dart)


Update: A Word About The Title. If you came here from Twitter, the line I used - "not unsound, incomplete" - referred to the original title of the article, "Incomplete by design," which was based on my misunderstanding of the dominant terminology in static analysis (see updates, and thanks David and Sam in the comments for pointing out my wrongness). When I realized this, I renamed the article "Deliberate decisions," but that wasn't an interesting title. When I read Rafaël Garcia-Suarez's take on Dart this morning, Why Dart is not the language of the future, I decided to support his proposal of calling types that serve as a "lint-type development aid, not a language feature" feeble typing, and renamed the article again. You should read Rafaël's post and this one; it is possibly the most agreement you will ever find between a Perl blogger and a Carnegie Mellon University programming languages graduate student.

...anyway...


There's a lot of feedback, and a non-trivial amount of snark, going around the internet based on the release of Dart, a proposed Javascript-killer by Google. My primary experience with Javascript is that people tell me it's the worst compiler target language that is widely used as a compiler target language, so I basically have nothing invested in the language, but was interested by the discussions it brought up.

The snark about Dart has centered around the following line on Page 72 of the language spec, which I believe was pointed out to the Twitterverse by Debasish Ghosh.
The type system is unsound, due to the covariance of generic types. This is a deliberate choice (and undoubtedly controversial). Experience has shown that sound type rules for generics fly in the face of programmer intuition. It is easy for tools to provide a sound type analysis if they choose, which may be useful for tasks like refactoring.
But what does it mean for a type system to be unsound? I really think that the most illustrative snippet about Dart types was not the one that Debasish retweeted, but one that came on the next page of the language spec:
A Dart implementation must provide a static checker that detects and reports exactly those situations this specification identifies as static warnings. However:
  • Running the static checker on a program P is not required for compiling and running P.
  • Running the static checker on a program P must not prevent successful compilation of P nor may it prevent the execution of P, regardless of whether any static warnings occur
This, for me, clarified what was going on substantially. Let me tell you a parable.

How to anger students in an undergraduate PL course


In four easy steps!

Step 1


Tell students to implement the following dynamic semantics of a programming language. Here's an example of a very simple language:

\[\infer {{\tt if}~e~{\tt then}~e_1~{\tt else}~e_2 \mapsto {\tt if}~e'~{\tt then}~e_1~{\tt else}~e_2 } {e \mapsto e'} \] \[\infer {{\tt if}~{\tt true}~{\tt then}~e_1~{\tt else}~e_2 \mapsto e_1} {} \qquad \infer {{\tt if}~{\tt false}~{\tt then}~e_1~{\tt else}~e_2 \mapsto e_2} {} \] \[ \infer {e_1 + e_2 \mapsto e_1' + e_2} {e_1 \mapsto e_1'} \qquad \infer {{\tt num}(n_1) + e_2 \mapsto {\tt num}(n_1) + e_2'} {e_2 \mapsto e_2'} \] \[ \infer {{\tt num}(n_1) + {\tt num}(n_2) \mapsto {\tt num}(n_1 + n_2)} {} \]

The implementation is to be an ML function step with type expr -> expr option, and the specification is that step e = SOME e' if there exists an e' such that e \(\mapsto\) e', and that step e = NONE otherwise (for instance, \(\tt true\) obviously can't take a step according to these rules).
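Concretely, here's a sketch of the sort of implementation we have in mind, in Standard ML. (The constructor names other than Num are my own invention; any sensible datatype declaration for the syntax above will do.)

   (* hypothetical syntax; only Num is pinned down by the text below *)
   datatype expr = True | False | Num of int
                 | If of expr * expr * expr
                 | Plus of expr * expr

   (* step e = SOME e' if e |-> e', and step e = NONE otherwise *)
   fun step (If (True, e1, e2)) = SOME e1
     | step (If (False, e1, e2)) = SOME e2
     | step (If (e, e1, e2)) =
         (case step e of SOME e' => SOME (If (e', e1, e2)) | NONE => NONE)
     | step (Plus (Num n1, Num n2)) = SOME (Num (n1 + n2))
     | step (Plus (Num n1, e2)) =
         (case step e2 of SOME e2' => SOME (Plus (Num n1, e2')) | NONE => NONE)
     | step (Plus (e1, e2)) =
         (case step e1 of SOME e1' => SOME (Plus (e1', e2)) | NONE => NONE)
     | step _ = NONE  (* true, false, and num(n) don't step *)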

Step 2


Describe how they can type-check the language by defining a type system like this. Have them implement this type checker as an ML function with type expr -> typ option, same idea.

\[ \infer {{\tt true} : {\tt bool}} {} \qquad \infer {{\tt false} : {\tt bool}} {} \qquad \infer {{\tt if}~e~{\tt then}~e_1~{\tt else}~e_2 : \tau} {e : {\tt bool} & e_1 : \tau & e_2 : \tau} \] \[ \infer {{\tt num}(n) : {\tt number}} {} \qquad \infer {e_1 + e_2 : {\tt number}} {e_1 : {\tt number} & e_2 : {\tt number}} \]
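Continuing the Standard ML sketch from Step 1 (same assumed datatype for expr):

   datatype typ = Bool | Number

   (* typecheck e = SOME t if e : t, and typecheck e = NONE otherwise *)
   fun typecheck True = SOME Bool
     | typecheck False = SOME Bool
     | typecheck (Num _) = SOME Number
     | typecheck (If (e, e1, e2)) =
         (case (typecheck e, typecheck e1, typecheck e2) of
             (SOME Bool, SOME t1, SOME t2) =>
               if t1 = t2 then SOME t1 else NONE
           | _ => NONE)
     | typecheck (Plus (e1, e2)) =
         (case (typecheck e1, typecheck e2) of
             (SOME Number, SOME Number) => SOME Number
           | _ => NONE)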

Step 3


Have students prove the theorem that this type system does something. The theorem statement goes as follows, and the proof is by the by-now standard technique of safety-via-progress-and-preservation.
Theorem (Safety): If \(e : \tau\) and \(e \mapsto \ldots \mapsto e'\), then \(e' : \tau\) and also there either exists some \(e''\) such that \(e' \mapsto e''\) or else \(e'\) is of the form \(\tt true\), \(\tt false\), or \({\tt num}(n)\).
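To read the theorem operationally, consider the obvious driver loop over step (again just a sketch):

   (* run e to an irreducible expression by iterating step *)
   fun eval e = case step e of SOME e' => eval e' | NONE => e

The safety theorem says that if \(e : \tau\), then eval e returns \(\tt true\), \(\tt false\), or \({\tt num}(n)\) - a well-typed expression never gets stuck partway.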

Step 4


Test their ML code from Step 1 on expressions like if 4 then true else 9, breaking many of the students' implementations of the dynamic semantics, which were prepared only to handle well-typed inputs.
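In terms of the sketch above, the test looks like this; the Step 1 specification demands NONE, since \({\tt num}(4)\) doesn't step and no rule applies to the whole expression, but implementations written with only well-typed inputs in mind might crash or return SOME (Num 9) instead.

   val result = step (If (Num 4, True, Num 9))  (* spec says: NONE *)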

Analysis: is this fair?


Think about the perspective of the student who complained about the fact that their interpreter either crashed (or maybe returned SOME(Num 9)!) after being handed if 4 then true else 9. On one hand, they clearly violated the spirit of the assignment: Step 1 was a perfectly well-defined assignment all on its own, and they didn't fulfill the specification of that particular assignment. But on the other hand, they proved the theorem in Step 3, and perhaps feel as if they should get something out of the fact that they proved that theorem - namely, the ability to reason only about the behavior of well-typed programs. Why should it be surprising that garbage in produced garbage out?

Compiler writers actually get to think like that; indeed it's almost essential that they be allowed to. On a 32-bit machine, most values are compiled to plain ol' 32-bit words, and so \(\tt false\) might have the same in-memory representation as, say, \({\tt num}(0)\). Or it might have the same in-memory representation as \({\tt num}(1)\)! It doesn't matter, because, for the compiler writer, the safety theorem has already given a guarantee that the language has canonical forms - that if \(e : {\tt bool}\) and \(e\) eventually steps to an irreducible expression, then that irreducible expression must either be \(\tt true\) or \(\tt false\).

This means that the compiler writer need not worry about how if 1 then true else 9 and if 0 then true else 9 might behave - they may raise an exception, return (the memory representation of) true, or return (the memory representation of) false. The only programs upon which the compiler promises to behave the same way as the language definition are those that pass the typechecker, and the type safety theorem is a critical component of that promise.

In this way of looking at the world, the representation independence given by a type system is really quite important, and it means that an unsound type system could cause very very bad things to happen: if you're allowed to mess around with the representation of, say, a pointer, by adding things to it, then you have introduced buffer overflow errors into a language, which would be pretty awful if you allowed code written in this language to execute in a privileged way in your browser. That (hopefully!) is not at all what Dart means when they say their programming language is unsound.

Theorems versus bugs


I think the problem here is that, as a broad overgeneralization, there are two ways to look at what people are doing with types in the first place. On one hand, there is the view that types are a tool to provably preclude certain classes of errors - like the possibility that you might end up with the expression if 1 then true else 9, which is "stuck" according to the defined operational semantics. On the other hand, there is the idea that types are a language feature aimed at helping document code and reduce the number of bugs in a program without necessarily precluding any particular errors. In the academic circles I travel in, a type system is understood as a technical definition about the former: if you can't prove a theorem about what kind of errors your type system precludes, then it is something else besides a type system. I think it's quite fair to both type systems and static analysis researchers to call the latter notion of types a simple form of static analysis.

[Updated] There's nothing wrong, per se, with such a static analysis, though I think it's fair to call it an unsound static analysis instead of an unsound type system. To use the language of Ciera Jaspan's recent thesis, start with a particular class of error (gets-stuck, raises-a-certain-exception, divides by zero, whatever). An analysis is sound if it never passes a program with a particular sort of error (permitting a safety theorem about that analysis!) and complete if it fails only programs that will actually manifest the error at runtime.1 A sound but incomplete analysis is called conservative; the type checkers of ML and Java represent such analyses. An analysis that is neither sound nor complete is called pragmatic by Jaspan, as there aren't any theorems to be proved about such an analysis: they can be justified only by their utility in practice.

I can certainly think of situations where I'd want a pragmatic analysis. In the past, I have had occasion to write Python, which I will admit I have a certain fondness for. However, I have also complained about how my complicated Python program ran for half an hour and then failed with some cast exception that, upon inspection of the code, was always, statically, evidently going to happen no matter what the complicated bits of the code did - and why couldn't it have warned me that it was going to do that before running for half an hour? Even if I implemented an analysis to generate such a warning, Python is and would remain a (hopefully) safe, strongly-typed programming language with exactly one type - the type of tagged data that causes an exception if you try to use an object as an integer (or whatever). The static analysis is trying to prove a different kind of theorem - one that says "you have indicated that raising ClassCastException is undesirable, and here's a proof that your current program will raise ClassCastException". If the static analysis can't prove that theorem (thus demonstrating a bug), I'm no worse off than I was when I used Python without that static analysis. A type safety theorem, however, would have the form "if the program passes the typechecker, then ClassCastException will not be raised."

And with my current understanding, the "unsound type system" of Dart is just such a "pragmatic analysis" as described by Jaspan. I hope my examples explain what might still be wrong with such a language - if you can't statically preclude certain classes of errors, you must either allow "unspecified behavior" (and that way lies buffer overruns and security violations) or else you must be able and willing to check, at execution time, for the occurrence of those errors, which is not efficient (and for some classes of errors may be impossible). You're back in the world of our hypothetical angry student: you've got to be able to handle all the ill-formed programs and obey the language definition on a wider class of programs.

[Updated] ...you could argue, of course, that you're no worse off than you were when you finished Step 1. On some level, you're certainly right; my hypothetical Python+tool-to-catch-a-couple-of-errors is better, in my humble opinion, than Python without (and this view has the virtue of honesty). If you want to make that argument, however, I encourage you to read Chung-chieh Shan's related blog post about covariant generics, which argues from a less practical-compiler-optimizations and more philosophical point of view that I also find quite convincing. The point he makes is that the traditional view of types is important because types should mean things - unless you want to choke on pencils!

Conclusion


In summary: type systems are useful because of type safety theorems; a type safety theorem means that certain things can't happen. One reason this is nice is that the compiler writer, the programmer, and the language designer needn't worry about what happens in the case that such an impossible thing happens.

Types are also a form of documentation, and they're a regular enough form of documentation that a compiler can use them to generate warnings or errors about certain classes of bugs without actually promising to preclude those bugs. A static analysis that uses type information to catch some (but not all) errors of a particular kind is probably better termed a "pragmatic type-based analysis" than an "unsound type system." Garcia-Suarez called it "feeble typing" and I renamed my post accordingly. It's a bit more pejorative than the tone I was originally trying to take in the article, but I like it anyway.

I think that's a fair way of looking at things, and it puts decidable type systems, on a practical level, as a member of a larger class of type-based static analyses.2 For a type system, we must be able to ask what sort of theorem is proved about well-typed programs; if there's no such theorem, then the analysis is still within the class of type-based static analyses, but isn't so much a type system. At the end of the day, of course, English has no central authority, so asking people to distinguish "type-based analyses" from "type systems" may be a fool's errand,3 but I think it's worthwhile to delineate the difference, and I don't think my delineation significantly departs from current usage (apart from "feeble typing" which was, I think, coined the day after I originally posted this).

Something I started off wanting to talk about before this post got too long was why it is the case that "sound type rules for generics fly in the face of programmer intuition" (which is pretty clearly, in my opinion, missing the addendum "in languages with subtyping"), because two of the reasons why I think this is the case are quite interesting on their own. One of them has to do with polynomial data types and persistent data, and the other has to do with base types and refinement types as explored by William Lovas in his thesis. Neither of these ideas is adequately represented in existing programming languages, though they are more-or-less theoretically understood at this point. Another day, perhaps.

Last word


I noticed that one of the principal language designers was quoted as follows:
You have to understand that the types are interface types, not implementation types – so the runtime cannot use the type information for anything. The types are simply thrown away during compilation.
That is to say, insofar as this article goes, I think I've only said factual things that the language designers would essentially agree with; in particular, they seem to recognize that their use of the word "types" seems bound to confuse (or troll) others. But calling a "feeble type" an interface type and a "type" an implementation type seems to just be making up words. And, as made-up words go, I really dislike "interface types" as a neologism (certainly it has nothing to do with Java interface or anything that comes up when I search for "interface type"). The theorist's critique of Dart is precisely that things you call "type" should define inviolate interfaces and not mere suggestions that are neither enforced nor checked. Calling them interface types makes them sound like contracts, which are not thrown away by the compiler. "Suggestion type" might be a less pejorative version of "feeble type," perhaps? And "implementation types" is a terrible term to use to describe types in a type system, types that (by way of a theorem about canonical forms) can be relied upon by both the programmer and the implementation.

1 Godefroid, Nori, Rajamani, and Tetali call the sound analyses may analyses (though it should perhaps be may not, as a sound analysis precludes a certain behavior) and call the complete analyses must analyses (the error must happen) in the paper "Compositional may-must program analysis."
2 This view really isn't fair to the reason that we are interested in type systems and believe they're supposed to be useful, but that has to do with Curry-Howard and the unreasonable effectiveness of mathematics and with other things Shan talked about in his linked post.
3 Honestly, it probably won't help the fool's errand if I try to call the other group's type based analyses "feeble typing," but here I am, doing it anyway.