Request for Logic: 2011

Saturday, December 17, 2011

Notes on classical sequent calculi (1/2)

These are some notes I made to try to help me understand Noam's focused presentation of classical logic in Polarity and the Logic of Delimited Continuations. I hope these notes coud be useful to others.

Sequent presentations of classical logic

I write, inevitably, from the perspective of an intuitionistic proof theorist, so let's start there. In intuitionistic logics, sequents have the form \(\Gamma \vdash A~\mathit{true}\), where the hypothetical context \(\Gamma\) has the form \(A_1~\mathit{true}, \ldots, A_n~\mathit{true}\). The whole sequent \(\Gamma \vdash A~\mathit{true}\) is read as "assuming the truth of all the things in \(\Gamma\), we know \(A~\mathit{true}\)." Let's look at a couple of ways of presenting sequent calculi for classical logic.

Two-sided judgmental classical sequent calculi

One way of presenting a classical sequent calculus is to give a two-sided sequent, \(\Gamma \vdash \Delta\). As before, \(\Gamma\) has the form \(A_1~\mathit{true}, \ldots, A_n~\mathit{true}\), but \(\Delta\) has the form \(B_1~\mathit{false}, \ldots, A_m~\mathit{false}\), and the whole sequent is read as "taken together, if all the things judged true in \(\Gamma\) are true and all the things judged false in \(\Delta\) are false, then there is a contradiction."

The language of propositions is \(A ::= P \mid \neg A \mid A \wedge B \mid A \vee B\) - uninterpreted atomic propositions \(P\), negation, conjunction ("and"), and disjunction ("or"). I'm leaving out truth \(\top\) and falsehood \(\bot\) because they're boring. The rules for these two-sided classical sequent calculi look like this:

\[ \infer {\Gamma, ~ P~\mathit{true} \vdash \Delta, ~ P~\mathit{false}} {} \] \[ \infer {\Gamma \vdash \Delta, ~ \neg A~\mathit{false}} {\Gamma, ~ A~\mathit{true} \vdash \Delta} \qquad \infer {\Gamma, ~ \neg A~\mathit{true} \vdash \Delta} {\Gamma \vdash \Delta, ~ A~\mathit{false}} \] \[ \infer {\Gamma \vdash \Delta, ~ A \wedge B~\mathit{false}} {\Gamma \vdash \Delta, ~ A~\mathit{false} &\Gamma \vdash \Delta, ~ B~\mathit{false}} \qquad \infer {\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta} {\Gamma, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \Delta} \] \[ \infer {\Gamma \vdash \Delta, ~ A \vee B~\mathit{false}} {\Gamma \vdash \Delta, ~ A~\mathit{false}, ~ B~\mathit{false}} \qquad \infer {\Gamma, ~ A \vee B~\mathit{true} \vdash \Delta} {\Gamma, ~ A~\mathit{true} \vdash \Delta &\Gamma, ~ B~\mathit{true} \vdash \Delta} \]

Two asides. First, in presentations that do not emphasize the fact that \(A_i~\mathit{false}\) and \(B_j~\mathit{true}\) are judgments and not propositions, there is another reading of the two-sided sequent \[A_1,\ldots,A_n \vdash B_1,\ldots,B_m\] This interpretation is that the truth of all of the \(A_i\) implies the truth of one of the \(B_j\) - this reading suggests a reading of any intuitionistic sequent proof as a classical sequent proof with one conclusion. You should convince yourself that this interpretation is equivalent to the interpretation above (hint: it's just a mode of use of De Morgan's laws).

Second aside: your rules may differ. I'm using a style of presentation where every connective is broken down by a unique connective and, from the perspective of bottom-up proof search, it's never a mistake to apply any rule, because the conclusion implies all of the premises (a property called invertibility). The "true" (or left) rule for conjunction (that is, "and" or \(\wedge\)) and the "false" (or right) rule for disjunction (that is, "or" or \(\vee\)) both have a different, non-invertible presentation. In the case of conjunction, it's this pair of rules: \[ \infer {\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta} {\Gamma, ~ A~\mathit{true} \vdash \Delta} \qquad \infer {\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta} {\Gamma, ~ B~\mathit{true} \vdash \Delta} \] You could "make a mistake" applying these rules in bottom-up proof search: just because there is a proof of \(\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta\) does not mean that there is a proof of \(\Gamma, ~ A~\mathit{true} \vdash \Delta\).

One-sided judgmental sequent sequent calculi

Of course, hypotheses are just hypotheses, there's no a priori reason why we need to separate the true ones and the false ones into separate contexts. Let's use a unified context and call it \(\Psi\). \[\Psi ::= \cdot \mid \Psi, A~\mathit{true} \mid \Psi, A~\mathit{false}\] Then, we can have the sequent form \(\Psi \vdash \#\), which we read as "all the assumptions in \(\Psi\) together imply a contradiction" - we pronounce \(\#\) as "contradiction." We'll need rewrite all of our rules:

\[ \infer {\Psi, ~ P~\mathit{true}, ~ P~\mathit{false} \vdash \#} {} \] \[ \infer {\Psi, ~ \neg A~\mathit{false} \vdash \#} {\Psi, ~ A~\mathit{true} \vdash \#} \qquad \infer {\Psi, ~ \neg A~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{false} \vdash \#} \] \[ \infer {\Psi, ~ A \wedge B~\mathit{false} \vdash \#} {\Psi, ~ A~\mathit{false} \vdash \# &\Psi, ~ B~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A \wedge B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \#} \] \[ \infer {\Psi, ~ A \vee B~\mathit{false} \vdash \#} {\Psi, ~ A~\mathit{false}, ~ B~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A \vee B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true} \vdash \# &\Psi, ~ B~\mathit{true} \vdash \#} \]

Hopefully you'll agree that this is "obviously the same" as the first presentation.

One-sided, truth-oriented sequent calculi

But wait! The "false" rule for conjunction looks just like the "true" rule for disjunction, and the "true" rule for conjunction looks just like the "false" rules for disjunction. Can we simplify these rules?

The usual answer is that you can, indeed, do with fewer rules and without a false judgment at all. However, we need two twists to deal with the rules that involved both the true and false judgments. First, we need to let every atomic proposition come in two flavors, the "regular" flavor \(P\) and the "negated" flavor \(\overline{P}\). Then, the rule dealing with atomic propositions looks like this: \[ \infer {\Gamma, ~ P~\mathit{true}, ~ \overline{P}~\mathit{true} \vdash \#} {} \] Second, instead of negation being a proposition \(\neg A\), we define a negation function, which I will write as \((A)^\bot\) to distinguish it from the propositional negation \(\neg A\). W. The negation function is defined as follows: \[ \begin{align} {(P)^\bot} = & \overline{P}\\ {(\overline{P})^\bot} = & P\\ {(A \wedge B)^\bot} = & {(A)^\bot} \vee {(B)^\bot}\\ {(A \vee B)^\bot} = & {(A)^\bot} \wedge {(B)^\bot}\\ \end{align} \] With this definition, we can eliminate the negation proposition altogether - the negation function just applies De Morgan laws all the way down to atomic propositions. We now get our sequent calculus for "half off" - there's no more official negation, and we don't need the false judgment at all anymore. We only need two more rules (for a total of three)!

\[ \infer {\Psi, ~ A \wedge B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \#} \qquad \infer {\Psi, ~ A \vee B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true} \vdash \# &\Psi, ~ B~\mathit{true} \vdash \#} \]

It would also be possible to play this game the other way around: gather everything on the right-hand side, bias the whole thing towards the "false" judgment, and basically get the "other half" of the two-sided sequent calculi. This ability to play the game equally well either way is part of what people mean when they say that classical logic is "very symmetric."

However, given that it's all the same, why not reason about truth and not falsehood? I've never understand why classical linear logic (in particular) always seems to bias itself towards one-sided sequent calculi on the right. There are important differences in what it means to think like a classical linear logician and what it means to think like an intuitionistic linear logician, but I really think that it unnecessarily exacerbates this divide when we have to turn all of our \(\oplus\)es to \(\&\)s and \(\otimes\)es to pars in order to talk to one another.

Polarized presentations of classical logic

Now for the real purpose of this note: writing out the review of classical logic that Noam gives in "Polarity and the Logic of Delimited Continuations." This discussion is a synthesis of that presentation and a little bit of "On the unity of duality."

Two for the price of two

Fundamentally, the observation Noam is making is that the one-sided truth-oriented sequent calculus goes too far - really, there are two kinds of disjunction, and two kinds of conjunction, which is why it seemed like the original calculus seemed to have redundancies. The third system above (the one-sided, truth-oriented sequent calculus) made it look like we were getting our logic for "half-off" - but really that's because the first two judgmental presentations were defining twice as many connectives as appeared to the naked eye. (As an aside, if you study classical linear logic, you're forced into the same conclusion for different reasons.)

Jason Reed taught me that, if you have two different judgments in a logic, it's worth seeing what happens if you syntactically differentiate the things you judge to be true and the things you judge to be false. Let's go ahead and "guess the right answer" - I'm going to call the things we judge to be true positive, and that the things we judge to be false negative. There's more than one of everything! \[ \begin{align} A^- = & \neg^- A^+ \mid P^- \mid A^- \wedge^- B^- \mid A^- \vee^- B^-\\ A^+ = & \neg^+ A^- \mid P^+ \mid A^+ \wedge^+ B^+ \mid A^+ \vee^+ B^+ \end{align} \] Here are a bunch of rules: note that the fact that the two negations change the polarity of the propositions; the rules make it evident that this is the right thing to do, because we have (for example) \(\neg^+ A^- ~\mathit{true}\) but \(A^- ~\mathit{false}\):

\[ \infer {\Psi, ~ \neg^- A^+~\mathit{false} \vdash \#} {\Psi, ~ A^+~\mathit{true} \vdash \#} \qquad \infer {\Psi, ~ \neg^+ A^-~\mathit{true} \vdash \#} {\Psi, ~ A^-~\mathit{false} \vdash \#} \] \[ \infer {\Psi, ~ A^- \wedge^- B^-~\mathit{false} \vdash \#} {\Psi, ~ A^-~\mathit{false} \vdash \# &\Psi, ~ B^-~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A^+ \wedge^+ B^+~\mathit{true} \vdash \#} {\Psi, ~ A^+~\mathit{true}, ~ B^+~\mathit{true} \vdash \#} \] \[ \infer {\Psi, ~ A^- \vee^- B^-~\mathit{false} \vdash \#} {\Psi, ~ A^-~\mathit{false}, ~ B^-~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A^+ \vee^+ B^+~\mathit{true} \vdash \#} {\Psi, ~ A^+~\mathit{true} \vdash \# &\Psi, ~ B^+~\mathit{true} \vdash \#} \]

So, are we good? Well, no, not really. The problem is that the "+" or "-" stuck to an atomic proposition isn't an annotation or modifier the way the overbar was in the one-sided, truth-oriented sequent calculus above. \(P^+\) and \(P^-\) are different atomic propositions, and it wouldn't be right to given an inference rule that had, as its conclusion, \(\Psi, ~ P^+~\mathit{true}, ~ P^-\mathit{false}\). Why? Well, for now let's go with "because I said so." The argument I have for this point isn't bulletproof, and it has to do with the role of atomic propositions as stand-ins for other propositions.

However, if you accept my argument from authority, we are left no way to prove, or even to state, anything equivalent to the classical \(P \vee \neg P\) into polarized logic, since any way we try to polarize this formula will lead to \(P\) needing to be both positive and negative. We're going to need some way, different from negation, of including positive propositions in negative ones.

These inclusions of positive propositions into negative ones (and vice versa) are called shifts - \({\downarrow}A^-\) is a positive proposition and \({\uparrow}A^+\) is a negative proposition. We could just add these two rules and call it a day... \[ \infer {\Psi, ~ P^+~\mathit{true}, ~ {\uparrow}P^+~\mathit{false} \vdash \#} {} \qquad \infer {\Psi, ~ P^-~\mathit{false}, ~ {\downarrow}P^-~\mathit{true} \vdash \#} {} \] ...but this is hardly general: the rules above should be derivable; this rule should be derivable as well: \[ \infer {\Psi, ~ P^+~\mathit{true}, ~{\uparrow}(P^+ \vee^+ Q^+)~\mathit{false} \vdash \#} {} \] All three of these derivable rules share a common property: in an on-paper proof, we would say that the contradiction is "trivial." The hypothesis \({\uparrow}P^+~\mathit{false}\) is trivial due to the fact that \(P^+\) is true by a different hypothesis, and because the truth of \(P^+\) allows us to trivially conclude that \(P^+ \vee^+ Q^+\) is true, \(P^+ \vee^+ Q^+\) is trivially contradictory as well.

This idea is encoded in two rules which capture proof-by-contradiction. One way we establish a contradiction is by showing that \(A^+\) is both false (by assumption) and trivial (by direct proof). The other way we establish a contradiction is by showing that \(A^-\) is both true (by assumption) and false (by direct proof of absurdity). These are embodied in the following two rules:

\[ \infer {\Psi \vdash \#} {{\uparrow}A^+~\mathit{false} \in \Psi &\Psi \vdash A^+~\mathit{trivial}} \qquad \infer {\Psi \vdash \#} {{\downarrow}A^-~\mathit{true} \in \Psi &\Psi \vdash A^-~\mathit{absurd}} \]

Now, of course, we need to give a bunch more rules to describe how to prove positive propositions trivial and negative propositions absurd.

\[ \infer {\Psi, ~ P^+~\mathit{true} \vdash P^+~\mathit{trivial}} {} \qquad \infer {\Psi, ~ P^-~\mathit{false} \vdash P^-~\mathit{absurd}} {} \] \[ \infer {\Psi \vdash \neg^+ A^-~\mathit{trivial}} {\Psi \vdash A^-~\mathit{absurd}} \qquad \infer {\Psi \vdash \neg^- A^+~\mathit{absurd}} {\Psi \vdash A^+~\mathit{trivial}} \] \[ \infer {\Psi \vdash A^+ \wedge^+ B^+~\mathit{trivial}} {\Psi \vdash A^+~\mathit{trivial} &\Psi \vdash B^+~\mathit{trivial}} \qquad \infer {\Psi \vdash A^+ \vee^+ B^+~\mathit{trivial}} {\Psi \vdash A^+~\mathit{trivial}} \qquad \infer {\Psi \vdash A^+ \vee^+ B^+~\mathit{trivial}} {\Psi \vdash B^+~\mathit{trivial}} \] \[ \infer {\Psi \vdash A^- \wedge^- B^-~\mathit{absurd}} {\Psi \vdash A^-~\mathit{absurd}} \qquad \infer {\Psi \vdash A^- \wedge^- B^-~\mathit{absurd}} {\Psi \vdash B^-~\mathit{absurd}} \qquad \infer {\Psi \vdash A^- \vee^- B^-~\mathit{absurd}} {\Psi \vdash A^-~\mathit{absurd} &\Psi \vdash B^-~\mathit{absurd}} \]

Even yet, we are not done! We need to deal with the shifts, which embody another form of proof-by-contradiction: to prove that \(A^-\) holds trivially, assume it's false and derive a contradiction.

\[ \infer {\Psi \vdash {\downarrow}A^-~\mathit{trivial}} {\Psi, A^-~\mathit{false} \vdash \#} \qquad \infer {\Psi \vdash {\uparrow}A^+~\mathit{absurd}} {\Psi, A^+~\mathit{true} \vdash \#} \]

The thing that we've come up with by this process is what I've been calling a "weakly focused" version of classical logic. If we wanted to turn it into a "fully focused" presentation of classical logic, we'd only need to make one change: the first "proof by contradiction" rules, which we call "focusing" rules, would need to require that the context \(\Psi\) includes only judgments of the form \(P^+~\mathit{true}\), \({\downarrow}A^-~\mathit{true}\), \(P^-~\mathit{false}\), and \({\uparrow}A^+~\mathit{false}\). A context including only judgments of these four forms is called stable. To get full focusing, we would modify the "trivial focus" rule like this (a similar modification would be made to the "absurd focus" rule):

\[ \infer {\Psi \vdash \#} {{\uparrow}A^+~\mathit{false} \in \Psi &\Psi \vdash A^+~\mathit{trivial} &\Psi~\mathit{stable}} \]

Thinking about the sequent calculus as a bottom-up proof search procedure, if we are looking for a proof of a non-stable sequent, we can use our original, invertible rules to break down the connectives in the contexts until we have only stable sequents, at which point we can apply a focusing rule.

Until next time...

I haven't quite had time to do the thing I originally set out to do, which was to work through the notation in "Polarity and the logic of delimited continuations" better. But I will save that for another time. The motivation is the same as the one from before: it seems like we're almost certainly duplicating work. Is it possible to give the presentation of polarized classical logic from the previous section using about half as many rules?

Saturday, November 12, 2011

Another take on polarized natural deduction

This has been sitting on my office whiteboard for a few days, where it doesn't do anybody (well, except for me and my officemates) any good. It's a canonical-forms presentation of natural deduction for polarized logic that corresponds to the focused sequent calculus I presented and analyzed in the (recently-updated) Structural focalization draft (PDF warning). The polarized sequent calculus in that draft isn't new: it's a tweaked version of Liang and Miller's authoritative LJF.¹ This canonical-forms presentation, however, is not something I've seen, so I'd be interested to know if it's been seen elsewhere: I know this is an area where a lot of other people have been working.

A bidirectional type system for polarized logic

There is, in my mind at least, no argument about what the propositions of polarized intuitionstic logic are; the following is straight of the aforementioned draft, but the basic idea dates back to Girard's 1991 post to the LINEAR mailing list, creatively titled "On the sex of angles".

\( {\bf Positive~propositions:} ~~ A^+, B^+, C^+ ~~ ::= ~~ p^+ \mid {\downarrow}A^- \mid \bot \mid A^+ \vee B^+ \mid \top^+ \mid A^+ \wedge^+ B^+ \)
\( {\bf Negative~propositions:} ~~ A^-, B^-, C^- ~~ ::= ~~ p^- \mid {\uparrow}A^+ \mid A^+ \supset B^- \mid \top^- \mid A^- \wedge^- B^- \)

What makes a proposition positive or negative? Good question! I won't address it here. (I address it a bit in the draft.)

Following tradition and best practices, we will structure the canonical forms presentation as a bidirectional type system. There are three judgments to worry about, as compared to the two judgments in other canonical forms presentations of logic. These judgments include contexts \(\Gamma\), which are sets of negative variables \(x\) associated with negative propositions (\(x{:}A^-\)) and positive variables \(z\) associated with atomic positive propositions (\(z{:}p^+\)).

\( \Gamma \vdash R \Rightarrow A^- \) - this is the familiar synthesis judgment from canonical forms presentations; it expresses that the atomic term \(R\) synthesizes \(A^-\). The word "synthesis" is used because it is possible to think of the type \(A^-\) as an output to the judgment, whereas \(\Gamma\) and \(R\) are inputs. In the other two judgments, everything will be treated as an input.

\( R ::= x \mid R~V \mid \pi_1 R \mid \pi_2 R \)
\( \Gamma \vdash V \Leftarrow A^+ \) - this is the new judgment corresponding to right focus in the focused sequent calculus; we say that the value \(V\) checks against \(A^+\).

\( V ::= z \mid N \mid {\sf inl}~V \mid {\sf inr}~V \mid \langle\rangle^+ \)
\( \Gamma \vdash N \Leftarrow [\Omega] A^-\) - this is a modification of the familiar checking judgment from canonical forms presentations, which usually lack the bit about \([\Omega]\), which is an ordered list of positive propositions. The reason we need \([\Omega]\) is precisely because we're dealing with positive propositions, which most canonical forms presentations lack or deal with in an unsatisfactory (in my humble opinion) manner. (I'll return to this point in the discussion at the end.)We say that thus judgment expresses that the normal term \(N\) decomposes \(\Omega\) and verifies \(A^-\).

\( N ::= R \mid z.N \mid V \mid {\sf let}~R~{\sf in}~N \mid x.N \mid \lambda N \mid \langle\rangle^- \mid \langle N_1, N_2 \rangle^- \mid {\sf abort} \mid [ N_1, N_2 ]\)

Except for the first four rules, everything is patterned in the usual style of presentation for a natural deduction system: for each connective, we give first the introduction rules and then the elimination rules.

Hypothesis and atomic propositions

\[ \infer {\Gamma \vdash R \Leftarrow []p^-} {\Gamma \vdash R \Rightarrow p'^- & p^- = p'^-} \qquad \infer {\Gamma \vdash x \Rightarrow A^-} {x{:}A^- \in \Gamma} \qquad \infer {\Gamma \vdash z \Leftarrow p^+} {z{:}p^+ \in \Gamma} \qquad \infer {\Gamma \vdash z.N \Leftarrow [p^+, \Omega] C^-} {\Gamma, z{:}p&+ \vdash N \Leftarrow [\Omega] C^-} \]

Shifts

\[ \infer {{\uparrow}A^+~{\it stable}} {} \qquad \infer {p^-~{\it stable}} {} \] \[ \infer {\Gamma \vdash V \Leftarrow []{\uparrow}A^+} {\Gamma \vdash V \Leftarrow A^+} \qquad \infer {\Gamma \vdash {\sf let}~R~{\sf in}~N \Leftarrow []C^-} {\Gamma \vdash R \Rightarrow {\uparrow}A^+ & \Gamma \vdash N \Leftarrow [A^+]C^- & C^-~{\it stable}} \] \[ \infer {\Gamma \vdash N \Leftarrow {\downarrow}A^-} {\Gamma \vdash N \Leftarrow []A^-} \qquad \infer {\Gamma \vdash x.N \Leftarrow [{\downarrow}A^-, \Omega]C^-} {\Gamma, x{:}A^- \vdash N \Leftarrow [\Omega]C^-} \]

Connectives

\[ \infer {\Gamma \vdash \lambda N \Leftarrow []A^+ \supset B^-} {\Gamma \vdash N \Leftarrow [A^+]B^-} \qquad \infer {\Gamma \vdash R~V \Rightarrow B^-} {\Gamma \vdash R \Rightarrow A^+ \supset B^- & \Gamma \vdash V \Leftarrow A^+} \] \[ \infer {\Gamma \vdash \langle\rangle^- \Leftarrow \top^-} {} \qquad {\it (no~elim~rule~for~\top^-)} \] \[ \infer {\Gamma \vdash \langle N_1, N_2 \rangle^- \Leftarrow A^- \wedge^- B^-} {\Gamma \vdash N_1 \Leftarrow A^- & \Gamma \vdash N_2 \Leftarrow B^-} \qquad \infer {\Gamma \vdash \pi_1 R \Rightarrow A^-} {\Gamma \vdash R \Rightarrow A^- \wedge^- B^-} \qquad \infer {\Gamma \vdash \pi_2 R \Rightarrow B^-} {\Gamma \vdash R \Rightarrow A^- \wedge^- B^-} \] \[ {\it (no~intro~rule~for~\bot)} \qquad \infer {\Gamma \vdash {\sf abort} \Leftarrow [\bot, \Omega]C^-} {} \] \[ \infer {\Gamma \vdash {\sf inl}~V \Leftarrow A^+ \vee B^+} {\Gamma \vdash V \Leftarrow A^+} \qquad \infer {\Gamma \vdash {\sf inr}~V \Leftarrow A^+ \vee B^+} {\Gamma \vdash V \Leftarrow B^+} \qquad \infer {\Gamma \vdash [N_1, N_2] \Leftarrow [A^+ \vee B^+, \Omega] C^-} {\Gamma \vdash N_1 \Leftarrow [A^+, \Omega] C^- & \Gamma \vdash N_2 \Leftarrow [B^+, \Omega] C^-} \] \[ \infer {\Gamma \vdash \langle\rangle^+ \Leftarrow \top^+} {} \qquad \infer {\Gamma \vdash \langle\rangle.N \Leftarrow [\top^+, \Omega] C^-} {\Gamma \vdash N \Leftarrow [\Omega] C^-} \] \[ \infer {\Gamma \vdash \langle V_1^+, V_2^+ \rangle \Leftarrow A^+ \wedge^+ B^+} {\Gamma \vdash V_1^+ \Leftarrow A^+ & \Gamma \vdash V_2^+ \Leftarrow B^+} \qquad \infer {\Gamma \vdash N \Leftarrow [A^+ \wedge^+ B^+, \Omega] C^-} {\Gamma \vdash N \Leftarrow [A^+, B^+, \Omega] C^-} \]

Discussion

There are two possible questions I want to address about this system in the previous section.

What's with those positive "elimination" rules?

It would be possible to complain that the system above is not very "natural deduction-ey" after all - for all the positive connectives, I basically give sequent calculus left rules instead of natural deduction elimination rules. What happened to the usual "large elimination"-style elimination rules, for instance the usual disjunction-elimination rule whose proof term is a case analysis? \[ \infer {\Gamma \vdash ({\sf case}~R~{\sf of}~{\sf inl}~x \rightarrow N_1 \mid {\sf inr}~y \rightarrow N_2) \Leftarrow C} {\Gamma \vdash R \Rightarrow A \wedge B & \Gamma, x{:}A \vdash N_1 \Leftarrow C & \Gamma, y{:}B \vdash N_2 \Leftarrow C} \]

I think that the answer can be given by looking at the shifts. Essentially, every large elimination as we know and love it follows from the structure of the \({\uparrow}\) elimination rule, which all on its own looks an awful lot like a cut. You should verify for yourself that, if you let \({\sf case}~R~{\sf of}~{\sf inl}~x \Rightarrow N_1 \mid {\sf inr}~y \Rightarrow N_2\) be defined as syntactic sugar for \({\sf let}~R~{\sf in}~[ x.N_1, y.N_2]\), then the following rule is derivable whenever \(C^-~{\it stable}\) holds.² \[ \infer {\Gamma \vdash {\sf case}~R~{\sf of}~{\sf inl}~x \Rightarrow N_1 \mid {\sf inr}~y \Rightarrow N_2 \Leftarrow C^-} {\Gamma \vdash R \Rightarrow {\uparrow}({\downarrow}A^- \vee {\downarrow}B^-) & \Gamma, x{:}A^- \vdash N_1 \Leftarrow C^- & \Gamma, y{:}B^- \vdash N_2 \Leftarrow C^-} \]

Pay attention to those two appearances of the downshift \({\downarrow}\) - they tell you something important about the structure of the usual elimination rules, which is that they "lose focus" while decomposing the disjunction. The usual way of thinking of normal natural deduction doesn't require, when you decompose \(A \vee B\) in an elimination, that you continue decomposing \(A\) and \(B\), which is represented here by the fact that, to match the structure of the usual elimination rule, you have to put downshifts \in explicitly. Jacob Howe, in his thesis and in his excellent paper "Proof search in lax logic," demonstrates this by making a focused sequent calculus that corresponds to the usual (constantly-interrupted) notion of decomposing positive propositions that you get if you follow your intuitions from natural deduction too closely.

By gathering all the large eliminations together in the \({\uparrow}\) elimination rule, we allow for the usual large eliminations to be defined, but also allow for the possibility that we might want to "chain" large eliminations in a well-defined way. (As an exercise, consider the structure of the elimination rule for \({\uparrow}(({\downarrow}A^- \wedge^+ {\downarrow}B^-) \vee (p^+ \wedge^+ \top^+))\).) This is why I claim that this is a natural deduction system that corresponds to the focused sequent calculus, instead of Howe's system where it's the other way around.³

Where are all the patterns?

Patterns have been associated with focused and/or canonical forms presentations of logic ever since... well, since Neel wrote the paper "Focusing on pattern matching"... or maybe since Noam wrote "Focusing and higher-order abstract syntax"... well, really at least since the CLF tech report. A lot of these, notably Noam's systems, have presented the rules of logic using pattern judgments, devices which abstractly represent the way in which values of a particular (positive) type are constructed or the way atomic terms of a particular (negative) type are eliminated.

There's this picture that isn't fully formed in my head, but that I've been thinking about for some time. On the left side of this picture, I think, you have the (pattern-free) presentation of natural deduction that I have given here at the top, and the (pattern-free) focused sequent calculus from "Structural focalization" at the bottom. Then, in the middle, you have (at the top) a natural deduction system that uses Noam's pattern judgments to introduce negative propositions and eliminate positive propositions - this is precisely (or at least very nearly) Taus Brock-Nannestad and Carsten Schürmann's system from "Focused Natural Deduction." Below it, there is a sequent calculus system that uses Noam's pattern judgments to eliminate negative propositions and introduce positive propositions. Kevin Watkins and Frank Pfenning came up with this idea and named it the "skeleton calculus" (a reference to the "spine calculus" of Cervesato and Pfenning), but it hasn't been written up that I know of. The skeleton calculus was what I was thinking about this morning when I decided to write this post. Then, on the far right, you have Noam's system, which is entirely pattern-based: patterns are used to both introduce and eliminate all connectives, so that the logic itself basically doesn't "know" about any connectives at all. This hazy picture is why, in the structural focalization draft, I mentioned that I thought Noam's system was a "natural synthesis of natural deduction and sequent calculus presentations".

But why should the picture look like the one I sketched above? Why not have a natural deduction system that uses patterns to introduce positives and eliminate negatives, or a sequent calculus that uses patterns to eliminate positives and introduce negatives? There's also the elephant in the room: CLF, which has both natural deduction and sequent calculus presentations, but which, in both instances, uses patterns only in the elimination of positive connectives. What are all these options doing here, and what are we to make of them? I don't know (yet).

¹ I should add that, while the adaptation of LJF isn't particularly interesting, the proof term assignment I give is different than any others I've seen and I'm pretty happy with it; that's another case where I'd be interested to know if others have done anything similar.
² This additional requirement of stability just reflects that it's always possible to restrict large eliminations in a canonical forms presentation of natural deduction to situations where the; this isn't always required in canonical forms presentations of natural deduction, but is an important part of making sure the sequent calculus presented in "Structural focalization" corresponds correctly to the natural deduction presentation.
³ I specifically suspect that this is a natural deduction system isomorphic to the focused sequent calculus from Structural focalization, but I don't want to make that claim until I've proved it.

Monday, October 10, 2011

Feeble Typing (a thought on Dart)

Update: A Word About The Title. If you came here from Twitter, the line I used - "not unsound, incomplete" - referred to the original title of the article "Incomplete by design," which was based on my misunderstanding of the dominant terminology in static analysis (see updates, and thanks David and Sam in the comments for pointing out my wrongness). When I realized this, I renamed the article "Deliberate decisions," but that wasn't an interesting title. When I read Rafaël Garcia-Suarez's take on Dart this morning, Why Dart is not the language of the future, I decided to support his proposal of calling types as a "lint-type development aid, not a language feature" feeble typing, and renamed the article again. You should read Rafaël's post and this one; it is possibly the most agreement you will ever find between a Perl blogger and a Carnegie Mellon University programming languages graduate student.

...anyway...

There's a lot of feedback, and a non-trivial amount of snark, going around the internet based on the release of Dart, a proposed Javascript-killer by Google. My primary experience with Javascript is that people tell me it's the worst compiler target language that is widely used as a compiler target language, so I basically have nothing invested in the language, but was interested by the discussions it brought up.

The snark about Dart has centered around the following line on Page 72 of the language spec, which I believe was pointed out to the Twitterverse by Debasish Ghosh.

The type system is unsound, due to the covariance of generic types. This is a deliberate choice (and undoubtedly controversial). Experience has shown that sound type rules for generics fly in the face of programmer intuition. It is easy for tools to provide a sound type analysis if they choose, which may be useful for tasks like refactoring.

But what does it mean for a type system to be unsound? I really think that the most illustrative snippet about Dart types came on the following page of the language spec was not the one that Debasish retweeted, but one that came on the next page:

A Dart implementation must provide a static checker that detects and reports exactly those situations this specification identifies as static warnings. However:

Running the static checker on a program P is not required for compiling and running P.

Running the static checker on a program P must not prevent successful compilation of P nor may it prevent the execution of P, regardless of whether any static warnings occur

This, for me, clarified what was going on substantially. Let me tell you a parable.

How to anger students in an undergraduate PL course

In four easy steps!

Step 1

Tell students to implement the following dynamic semantics of a programming language. Here's an example of a very simple language:

\[\infer {{\tt if}~e~{\tt then}~e_1~{\tt else}~e_2 \mapsto {\tt if}~e'~{\tt then}~e_1~{\tt else}~e_2 } {e \mapsto e'} \] \[\infer {{\tt if}~{\tt true}~{\tt then}~e_1~{\tt else}~e_2 \mapsto e_1} {} \qquad \infer {{\tt if}~{\tt false}~{\tt then}~e_1~{\tt else}~e_2 \mapsto e_2} {} \] \[ \infer {e_1 + e_2 \mapsto e_1' + e_2} {e_1 \mapsto e_1'} \qquad \infer {{\tt num}(n_1) + e_2 \mapsto {\tt num}(n_1) + e_2'} {e_2 \mapsto e_2'} \] \[ \infer {{\tt num}(n_1) + {\tt num}(n_2) \mapsto {\tt num}(n_1 + n_2)} {} \]

The implementation was to be an ML function step with type expr -> expr option, and the specification was that step e = SOME e' if there existed an e' such that e \(\mapsto\) e', and that step e = NONE otherwise (for instance, \(\tt true\) obviously can't take a step according to these rules).

Step 2

Describe how they can type-checking the language, by defining a type system like this. Have them implement this type checker as an ML function with type expr -> typ option, same idea.

\[ \infer {{\tt true} : {\tt bool}} {} \qquad \infer {{\tt false} : {\tt bool}} {} \qquad \infer {{\tt if}~e~{\tt then}~e_1~{\tt else}~e_2 : \tau} {e : {\tt bool} & e_1 : \tau & e_2 : \tau} \] \[ \infer {{\tt num}(n) : {\tt number}} {} \qquad \infer {e_1 + e_2 : {\tt number}} {e_1 : {\tt number} & e_2 : {\tt number}} \]

Step 3

Have students prove the theorem that this type system does something. The theorem statement goes as follows, and the proof is by the by-now standard technique of safety-via-progress-and-preservation.
Theorem (Safety): If \(e : \tau\) and \(e \mapsto \ldots \mapsto e'\), then \(e' : \tau\) and also there either exists some \(e''\) such that \(e' \mapsto e''\) or else \(e'\) is of the form \(\tt true\), \(\tt false\), or \({\tt num}(n)\).

Step 4

Test their ML code from Step 1 on expressions like if 4 then true else 9, breaking many of the students implementations of the dynamic semantics which were prepared only to handle well-typed inputs.

Analysis: is this fair?

Think about the perspective of the student who complained about the fact that their interpreter either crashed (or maybe returned SOME(Num 9)!) after being handed if 4 then true else 9. On one hand, they clearly violated the spirit of the assignment: Step 1 was a perfectly well-defined assignment all on its own, and they didn't fulfill the specification of that particular assignment. But on the other hand, they proved the theorem in Step 3, and perhaps feel as if they should get something out of the fact that they proved that theorem: the ability to only have to reason about the behavior of well-typed programs: why should it be surprising that garbage-in produced garbage-out?

Compiler writers actually get to think like that; indeed it's almost essential that they be allowed to. On a 32-bit machine, most values are compiled to plain ol' 32-bit words, and so the representation of \(\tt false\) might have the same in-memory representation as, say, \({\tt num}(0)\). Or it might have the same in-memory representation as \({\tt num}(1)\)! It doesn't matter, because, for the compiler writer, the safety theorem has already given a guarantee that the language has canonical forms - that if \(e : {\tt bool}\) and \(e\) eventually steps to an irreducible expression, then that irreducible expression must either be \(\tt true\) or \(\tt false\).

This means that the compiler writer need not worry about how if 1 then true else 9 and if 0 then true else 9 might behave - they may raise an exception, return (the memory representation of) true, or return (the memory representation of) false. The only programs upon which the compiler promises to behave the same way as the language definition are those that pass the typechecker, and the type safety theorem is a critical component of that promise.

In this way of looking at the world, the representation independence given by a type system is really quite important, and it means that an unsound type system could cause very very bad things to happen: if you're allowed to mess around with the representation of, say, a pointer, by adding things to it, then you have introduced buffer overflow errors to a language, which would be pretty awful if you allowed code written in this language to execute in a privileged way in your browser. That (hopefully!) is not at all what Dart means when they mean their programming language is unsound.

Theorems versus bugs

I think the problem here is that, as a broad overgeneralization, there are two ways to look at what people are doing with types in the first place. On one hand, there is the view that types are a tool to provably preclude certain classes of errors - like the possibility that you might end up with the expresion if 1 then true else 9 which is "stuck" according to the defined operational semantics. On the other hand, there is the idea that types are a language feature that is aimed at helping document code and reduce the number of bugs in a program without necessarily precluding any particular errors. In the academic circles I travel in, a type system is understood as a technical definition about the former: if you can't prove a theorem about what kind of errors your type system precludes, then it is something else besides a type system. I think it's quite fair to both type systems and static analysis researchers to call the latter notion of types a simple form of static analysis.

[Updated] There's nothing wrong, per se, with such a static analysis, though I think it's fair to call it an unsound static analysis instead of an unsound type system. To use the language of Ciera Jaspan's recent thesis, start with a particular class of error (gets-stuck, raises-a-certain-exception, divides by zero, whatever). An analysis is sound if it never passes a program with a particular sort of error (permitting a safety theorem about that analysis!) and complete if it fails only programs that will actually manifest the error at runtime.¹ A sound but incomplete analysis is called conservative; the type checkers of ML and Java represent such analyses. An analysis that is neither sound nor complete is called pragmatic by Jaspan, as there aren't any theorems to be proved about such an analysis: they can be justified only by their utility in practice.

I can certainly think of situations where I'd want a pragmatic analysis. In the past, I have had occasion to write Python, which I will admit I have a certain fondness for. However, I have also complained about how my complicated Python program ran for half an hour and then failed with some cast exception that, upon inspection of the code, was always, statically, evidently going to happen no matter what the complicated bits of the code did and why couldn't it have warned me that it was going to do that before running for half an hour. Even if I implemented an analysis to generate such a warning, Python is and would remain a (hopefully) safe, strongly-typed programming language with exactly one type - the type of tagged data that causes an exception if you try to use an object as an integer (or whatever). The static analysis is trying to prove a different kind of theorem - one that says "you have indicated that raising ClassCastException is undesirable, and here's a proof that your current program will raise ClassCastException". If the static analysis can't prove that theorem (thus demonstrating a bug), I'm no worse off than I was when I used Python without that static analysis. A type safety theorem, however, would have the form "if the program passes the typechecker, then ClassCastException will not be raised."

And with my current understanding, the "unsound type system" of Dart is just such a "pragmatic analysis" as described by Jaspan. I hope my examples explain what might still be wrong with such a language - if you can't static preclude certain classes of errors, you must either allow "unspecified behavior" (and that way lies buffer overruns and security violations) or else you must be able and willing to check, at execution time, for the occurrence of those errors, which is not efficient (and for some classes of errors may be impossible). You're back in the world of our hypothetical angry student: you've got to be able to handle all the ill-formed programs and obey the language definition on a wider class of programs.

[Updated] ...you could argue, of course, that you're no worse off than you were when you finished Step 1. On some level, you're certainly right; my hypothetical Python+tool-to-catch-a-couple-of-errors is better, in my humble opinion, than Python without (and this view has the virtue of honesty). If you want to make that argument, however, I encourage you to read Chung-chieh Shan's related blog post about covariant generics, which argues from a less practical-compiler-optimizations and more philosophical point of view that I also find quite convincing. The point he makes is that the traditional view of types is important because types should mean things - unless you want to choke on pencils!

Conclusion

In summary: type systems are useful because of type safety theorems: a type safety theorem means that certain things can't happen. One reason this is nice is because the complier writer, the programmer, and the language designer needn't worry about what happens in the case that such an impossible thing happens.

Types are also a form of documentation, and they're a regular form of documentation that a complier can then take, generating warnings or errors about certain classes of bugs without actually promising to preclude those bugs. A static analysis that uses type information to preclude some (but not all) errors of a particular type is probably better termed an "pragmatic type-based analysis" than an "unsound type system." Garcia-Suarez called it "feeble typing" and I renamed my post accordingly. It's a bit more pejorative than the tone I was originally trying to take in the article, but I like it anyway.

I think that's a fair way of looking at things, and it puts decidable type systems, on a practical level, as a member of a larger class of type based-static analyses.² In a type system, we must be able to ask a question about what sort of theorem is proved about programs that are well-typed, and if there's no such theorem, then the analysis is still within the class of type-based static analyses, but isn't so much a type system. At the end of the day, of course, English has no central authority, so asking people to distinguish "type-based analyses" from "type systems" may be a fool's errand,³ but I think it's worthwhile to delineate the difference, and I don't think my delineation significantly departs from current usage (apart from "feeble typing" which was, I think, coined the day after I originally posted this).

Something I started off wanting to talk about before this post got too long was why it is the case that "sound type rules for generics fly in the face of programmer intuition" (which is pretty clearly, in my opinion, missing the addendum "in languages with subtyping"), because two of the reasons why I think this is the case are quite interesting on their own. One of them has to do with polynomial data types and persistent data, and the other has to do with base types and refinement types as explored by William Lovas in his thesis. Neither of these ideas are adequately represented in existing programming languages, though they are more-or-less theoretically understood at this point. Another day, perhaps.

Last word

I noticed that one of the principal language designers was quoted as follows

You have to understand that the types are interface types, not implementation types – so the runtime cannot use the type information for anything. The types are simply thrown away during compilation.

That is to say, insofar as this article goes, I think I've only said factual things that the language designers would essentially agree with; in particular, they seem to recognize that their use of the word "types" seems bound to confuse (or troll) others. But calling a "feeble type" an interface types and a "type" an implementation type seems to just be making up words. And, as made-up-words go, I really dislike "interface types" as a neologism (certainly it has nothing to do with Java interface or anything that comes up when I search for "interface type"). The theorist's critique of Dart is precisely that things you call "type" should define inviolate interfaces and not mere suggestions that are neither enforced nor checked. Calling them interface types makes them sound like contracts, which are not thrown away by the compiler. "Suggestion type" might be a less pejorative version of "feeble type," perhaps? And "implementation types" is a terrible term to use to describe types in a a type system, types that (by way of a theorem about canonical forms) can be relied upon by both the programmer and the implementation.

¹ Godefroid, Nori, Rajamani, and Tetal call the sound analyses may analyses (though it should perhaps be may not, as a sound analysis precludes a certain behavior) and call the complete analyses must analyses (the error must happen) in the paper "Compositional may-must analysis."
² This view really isn't fair to the reason that we are interested in type systems and believe they're supposed to be useful, but that has to do with Curry-Howard and the unreasonable effectiveness of mathematics and with other things Shan talked about in his linked post.
³ Honestly, it probably won't help the fool's errand if I try to call the other group's type based analyses "feeble typing," but here I am, doing it anyway.

Monday, October 3, 2011

Slicing and dicing validity: existing ideas

This is a story of modal logic and the judgmental methadology, a style of presenting and justifying logic that Pfenning and Davies [1] adapted from Martin Löf's Siena Lectures [2] and which was, in turn, adapted by Bernado Toninho and I [3]. My goal is to re-present some existing work: I changed the title from "old ideas" to "existing ideas" to remove any implicit negative connotation - my next step is to extend these existing ideas a bit, not to replace these old ideas with better ones.

The topic of this story is, at least at first, an account of modal logic that I have adapted from Pfenning and Davies's presentation [1]. I assume some familiarity with the background, this isn't a comprehensive introduction. Pfenning and Davies's topic was a presentation of modal logic as a logic where there are two categorial judgments, \(A~{\it true}\) and \(A~{\it valid}\); the intent is that validity captures unconditional truth, which is eventually related to modal necessity.

We collect the categorical judgments of the form \(A~{\it true}\) into true contexts which are written as \(\Gamma\) or \(\Psi\), and we collect the categorical judgments of the form \(A~{\it valid}\) into valid contexts which are written as \(\Delta\). It is by use of contexts that categorical judgments give rise to hypothetical judgments, and Pfenning-Davies modal logic has two hypothetical judgments: \(\seq{\Delta}{A~{\it valid}}\) and \(\seq{\Delta; \Gamma}{A~{\it true}}\)

In a system with hypothetical judgments, meaning is given by describing three kinds of principles: the weakening principles (expressing how the context works), the identity principles (expressing how assumptions are used), and the substitution principles (expressing how assumptions are discharged). We rely on these principles as we explain a logic, and we are required to tie the knot eventually, demonstrating that our logic's particular rules satisfy its defining principles. Here are the defining principles for Pfenning-Davies modal logic:

Weakening principles

If \(\Delta \subseteq \Delta'\) and \(\seq{\Delta}{A~{\it valid}}\), then \(\seq{\Delta'}{A~{\it valid}}\)
If \(\Delta \subseteq \Delta'\), \(\Gamma \subseteq \Gamma'\), and \(\seq{\Delta; \Gamma}{A~{\it true}}\), then \(\seq{\Delta'; \Gamma'}{A~{\it true}}\)

Identity/hypothesis principles

\(\seq{\Delta, A~{\it valid}}{A~{\it valid}}\)
\(\seq{\Delta; \Gamma, A~{\it true}}{A~{\it true}}\)

Substitution principles

If \(\seq{\Delta}{A~{\it valid}}\) and \(\seq{\Delta, A~{\it valid}}{C~{\it valid}}\), then \(\seq{\Delta}{C~{\it valid}}\).
If \(\seq{\Delta}{A~{\it valid}}\) and \(\seq{\Delta, A~{\it valid}; \Gamma}{C~{\it true}}\), then \(\seq{\Delta; \Gamma}{C~{\it true}}\).
If \(\seq{\Delta; \Gamma}{A~{\it true}}\) and \(\seq{\Delta; \Gamma, A~{\it true}}{C~{\it true}}\), then \(\seq{\Delta; \Gamma}{C~{\it true}}\).

Two judgmental rules define the fundamental relationship between the vaildity and truth judgments; I'll call the first of these rules valid introduction and the second valid elimination; they capture the notion that validity is defined relative to truth, and that validity is truth in all contexts.

The fundamental relationship between truth and validity is set up by two rules, which express that validity is unconditional truth. This means that \(A~{\it valid}\) should be entailed by \(A~{\it true}\) in the absence of any conditions, and that proving \(A~{\it valid}\) should entail \(A~{\it true}\) in any conditions, where "conditions" are represented by the context \(\Gamma\) of true hypotheses.The first rule is the introduction rule for validity: it shows how we verify the judgment \(A~{\it valid}\). The second rule is the elimination rule for validity: it shows that, given a proof of \(A~{\it valid}\), we can use it to show the truth of \(A\) in any context \(\Gamma\).

\[ \infer {\seq{\Delta}{A~{\it valid}}} {\seq{\Delta; \cdot}{A~{\it true}}} \qquad \infer {\seq{\Delta; \Gamma}{A~{\it true}}} {\seq{\Delta}{A~{\it valid}}} \]

The validity judgment is made interesting by its use in defining modal necessity \(\Box A\). Notice how similar the introduction rule for modal necessity is to the elimination rule for the validity judgment! The elimination rule, however, acts like a let- or case-expression; it is the kind of elimination rule known as a large elimination, which are associated with the positive connectives (if you're a fan of polarized logic).

\[ \infer {\seq{\Delta; \Gamma}{\Box A~{\it true}}} {\seq{\Delta}{A~{\it valid}}} \qquad \infer {\seq{\Delta; \Gamma}{C~{\it true}}} {\seq{\Delta; \Gamma}{\Box A~{\it true}} & \seq{\Delta, A~{\it valid}; \Gamma}{C~{\it true}}} \]

A truth-oriented justification of logic

This story above is (I believe) broadly compatible with the story from Pfenning and Davies' original paper; certainly what I called the introduction and elimination rules for validity are just the result of writing down the "Definition of Validity" at the beginning of Section 4 of their paper. However, Pfenning and Davies banish the second half of the definition of validity, what I called the elimination rule, from their actual formal system. In its place, they give a valid hypothesis rule, variously written as \(uhyp\) or \(hyp^*\).

\[ \infer {\seq{\Delta, A~{\it valid}; \Gamma}{A~{\it true}}} {} \]

With this change, the introduction rule for validity now only appears in the conclusion of the validity introduction rule and the premise of the \(\Box\) introduction rule, and the two can be collapsed together. The introduction rule for validity is what is called invertible - the conclusion implies the premise - so we don't fundamentally change the \(\Box\) introduction rule if we replace the premise \(\seq{\Delta}{A~{\it valid}}\) with the equivalent premise \(\seq{\Delta; \cdot}{A~{\it true}}\).

\[ \infer {\seq{\Delta; \Gamma}{\Box A~{\it true}}} {\seq{\Delta; \cdot}{A~{\it true}}} \]

These changes to the logical system, write Pfenning and Davies, are sound with respect to the initial definitions I gave (the \(uhyp\) rule, in particular, is derivable by a combination of the valid hypothesis principle and the elimination rule for validity). However, the resulting system is incomplete with respect to the categorical judgment \(A~{\it valid}\). This necessitates that we give up on the one hypothesis principle and one substitution principle that deal exclusively with validity, and also edit the substitution principle with a premise \(\seq{\Delta}{A~{\it valid}}\) to have the premise \(\seq{\Delta; \cdot}{A~{\it true}}\) instead.

These changes flow naturally out of the intentional goal of explaining the meaning of the logical connectives entirely through the lens of the categorical judgment \(A~{\it true}\). This is a perfectly fine way of explaining S4 modal logic, but I don't see it as a fundamental part of the character of judgmental presentations of logic. For instance, there are plenty of situations, (Hybrid LF, Simpson's IK, Constructive Provability Logic) where there is no single categorical judgement for truth, but rather a whole family of indexed judgments, and it's possible to prove things at each of the judgments. This suggests a different story, a different presentation of the same system above. This presentation is one that I've adapted from Jason Reed [4].

Another story

If one is perhaps less interested in the correspondence to modal logic, it's possible to take the judgmental setup that we started with and tease apart the notion of modal necessity just a bit. We do this by introducing two connectives, \(\unicode{x25F8}A\) and \(\unicode{x25FF}A\), with the intentional visual pun that \(\unicode{x25F8}\!\!\!\!\unicode{x25FF} A = \Box A\). The introduction and elimination rules for validity are now properly understood as introduction and elimination rules for \(\unicode{x25FF}\), whereas the introduction and elimination rules for \(\Box\) are now understood as introduction and elimination rules for \(\unicode{x25F8}\).

\[ \infer {\seq{\Delta}{\unicode{x25FF}A~{\it valid}}} {\seq{\Delta; \cdot}{A~{\it true}}} \qquad \infer {\seq{\Delta; \Gamma}{A~{\it true}}} {\seq{\Delta}{\unicode{x25FF}A~{\it valid}}} \] \[ \infer {\seq{\Delta; \Gamma}{\unicode{x25F8}A~{\it true}}} {\seq{\Delta}{A~{\it valid}}} \qquad \infer {\seq{\Delta; \Gamma}{C~{\it true}}} {\seq{\Delta; \Gamma}{\unicode{x25F8}A~{\it true}} & \seq{\Delta, A~{\it valid}; \Gamma}{C~{\it true}}} \]

Now, if these rules are really the only ones that deal with validity — if all the "regular" connectives like implication are defined the traditional way based only on truth...

\[ \infer {\seq{\Delta; \Gamma}{A \supset B~{\it true}}} {\seq{\Delta; \Gamma, A~{\it true}}{B~{\it true}}} \qquad \infer {\seq{\Delta; \Gamma}{B~{\it true}}} {\seq{\Delta; \Gamma}{A \supset B~{\it true}} & \seq{\Delta; \Gamma}{A~{\it true}}} \]

...then it is possible to observe that this way in which we have teased apart box into two parts actually syntactically differentiates the propositions judged by the two categorical judgments: if we're starting from truth, we'll only introduce propositions that live under a \(\unicode{x25F8}\), and we can only meaningfully use valid propositions that have \(\unicode{x25FF}\) as their outermost connective. This observation, which I picked up from by Jason Reed, makes it possible to talk about true propositions - those propositions judged as true \((A ::= \unicode{x25F8}P \mid A \supset B \mid a )\) separately from valid propositions - those propositions judged valid \((P ::= \unicode{x25FF}A)\) [4]. The rules that we have set up essentially enforce this syntactic separation already, however; there's no particular requirement that we enforce it on our own.

In fact, one thing we can do is put another version of implication "up at validity," like this:

\[ \infer {\seq{\Delta}{A \supset B~{\it valid}}} {\seq{\Delta, A~{\it valid}}{B~{\it valid}}} \qquad \infer {\seq{\Delta}{B~{\it valid}}} {\seq{\Delta}{A \supset B~{\it valid}} & \seq{\Delta}{A~{\it valid}}} \]

It's not terribly obvious why you'd want to do this, but as Neel points out to me in the comments, this was first done in the context of linear logic by Wadler and Benton, where it turns out to be quite useful for programming [5,6]. I was previously unaware of that work (doh). Reed used the idea of putting implication up at validity in order to describe both the necessity modality and the lax-logic circle (the "monad") as fragments of the same logic, which is where I got this notion from. If you ignore the "valid" implication, you get modal logic with \(\Box A = \unicode{x25F8}\!\!\!\!\unicode{x25FF} A\), but if you ignore the "true" implication and instead reason at "validity" (at which point it's definitely not quite right to call the old "validity" validity anymore), you get lax logic with \(\bigcirc A = \unicode{x25FF} \unicode{x25F8} A\). That observation is essentially the same one made by the two translations in Figure 6 of Wander and Benton's paper, though again Wadler and Benton's "truth" judgment was linear, the one I've presented here was persistent, and Reed considered both variants [6].

To reiterate, this isn't a very useful story if one is interested in giving a fundamental account of modal logic, but I believe that "teasing apart" the modal operator, and even opening up the possibility of inserting other propositions in the space of "valid propositions" that such a teasing apart naturally create, raises interesting possibilities. Hopefully in the next post I'll say about about that in the context of contextual modal type theory (CMTT).

[1] Frank Pfenning and Rowan Davies, "A Judgmental Reconstruction of Modal Logic," 1999 (published 2001)
[2] Per Martin-Löf, "On the Meanings of the Logical Constants and the Justifications of the Logical Laws" a.k.a. "The Siena Lectures," 1983 (published 1996)
[3] Robert J. Simmons and Bernardo Toninho, "Principles of Constructive Provability Logic," CMU Tech Report 2010.
[4] Jason Reed, "A Judgmental Deconstruction of Modal Logic," unpublished 2009.
[5] Nick Benton, "A Mixed Linear and Non-Linear Logic: Proofs, Terms and Models," CSL 1995.
[6] Nick Benton and Phil Wadler, "Linear Logic, Monads and the Lambda Calculus," LICS 1996.

Tuesday, September 27, 2011

My New Focalization Technique is Unstoppable

While it took, as they say, a bit of doing, I have a completed draft of a paper on my website that I believe provides a really elegant solution to what has been a very, very annoying problem for some time: writing down a proof called the completeness of focusing. Don't worry, the paper explains what that means: it has to, because one of the things the paper argues is that most existing statements of the completeness of focusing aren't general enough! In the process, this writeup gave me a good (enough) excuse to talk in a very general way about a lot of fundamental phenomena around the idea of polarization in logic that we've been noticing in recent years.

Anyway, the draft is here - Structural Focalization. The accompanying Twelf development is, of course, on the Twelf wiki: Focusing. I've thrown the paper and the Twelf code up on arXiv while I figure out while I figure out what to do with it; comments would be greatly appreciated!

I don't want to have a post that's just "hey here's a paper," so here's an addition, a follow-up to the rather popular "Totally Nameless Representation" post from awhile back. I still prove weakening in Agda by exactly the technique presented there, which commenter thecod called the "presheaf method." But for natural deduction systems, I almost never use the term metric approach that was the core of what I was presenting in "totally nameless representation," since it works just fine to use the approach where term M of type A (the one you're substituting for x) is in context Γ and the term N (the one with x free) is in context Γ' ++ A :: Γ - the free variable x is allowed to be in the middle of the context, in other words; you don't have to use exchange on the second term, so you're always calling induction on something Agda recognizes as structurally smaller.

This worked for natural deduction systems, but I didn't think it would work for sequent calculi, since you needed to toy around with both contexts and induct on both terms in presentation of cut for a sequent calculus. However, for a focused sequent calculus like what I present in the paper, you still can do without the totally nameless metric! If you set things up right, the rightist substitutions (where you work on decomposing the right term, the one with the variable free) allow you to extend the context only of the first term, and the leftist substitutions (where you work on decomposing the left term, the one you're substituting for the variable) allow you to work on the second term, and the two are only connected by the principal substitutions (which reduce the type, which more or less lets you get away with anything as far as the induction principle is concerned).

A code example of this, which could use some more commentary, can be found on GitHub: Polar.agda.

Thursday, September 8, 2011

Nondeterministic thoughts on nondeterminism

So if you've been hanging around lately, I've been writing posts where I think I'm talking about new ideas. (I'm not always correct.) This post, on the other hand, is definitely more a review-and-synthesis sort of post; mostly stuff I gleaned over the summer from reading up on Dijkstra's GCL, Ball et al.'s Automatic predicate abstraction of C programs, and a number of K. Rustan M. Leino's papers as synthesized for me by my advisors Aditya Nori and Sriram Rajamani at Microsoft Research India.

The first section represents my somewhat simplistic thoughts on other's people's work in the semantics of imperative programming languages, mostly thoughts I had this summer at MSR. I hope they are merely naive and simplistic and not actively wrong, and that you (gentle reader) will have patience to correct me where I am actively wrong. Then I have three short, mostly unrelated discussions that I needed to clear out of my head. The first discussion reviews a neat way of understanding loop invariants, due to Leino. The second talks about the algebraic properties of non-deterministic programs. The third discussion tries to relate nondeterminism to the work I'm doing on representing transition systems in linear logic, though it's mostly just speculative and may not make sense to anyone but me in its current half-baked form.

C programs and their abstractions

Our starting point will be, essentially, a generic while language that we'll write with C-like syntax. Here's a program:

  /* Example 1 */
  1. if(x > 0) {
  2.   x += 5;
  3. } else {
  4.   x = -x - 2;
  5. }
  6. return x;

The thing about a (fully-defined) C program is that it's deterministic - for any initial assignments to variables, there's a unique behavior. So, for the above program, if the initial value of x is, say, 4, then the program will execute line 1 (with x=4), then line 2 (with x=4), then line 6 (with x=9), and the program will then return 9. If the initial value of x is -12, then the program will execute line 1 (with x=-12), then line 4 (with x=-12), then line 6 (with x=10), and the program will then return 10. I've now implicitly told you what what counts as a "behavior" - it's a stream of line numbers and states that may or may not end with a returned value. So the meaning of a program is a function from initial states to streams of line numbers and states.

We can even think of the line numbers as a convenient fiction: we could translate the program above into this form:

   linenum = 1;
   checkpoint();
   if(x > 0) {
     linenum = 2;
     checkpoint();
     x += 5;
   } else {
     linenum = 4;
     checkpoint();
     x = -x - 2;
   }
   linenum = 6;
   checkpoint();
   return x;

Then we say that the behavior of a program is a function from initial states to the stream of intermediate states as reported by the special checkpoint() function; the "line number" part of the stream is just handled by the value associated with linenum in the memory state.

So that's the meaning (the denotation) that I choose for deterministic C programs: they're functions from initial states to streams of states (where the state includes the line number). From here on out I'll just think of any given deterministic C program as a way of specifying one such function. There are also many such functions that can't be written down appropriately with the syntax of a C program; that's not my concern here.

Nondeterminism

Instead, I want to talk about non-deterministic C programs. The meaning of a nondeterministic C program is a function from initial states to sets of streams of states,¹ and the syntax we'll use to write down nondeterministic C programs is an extension of the syntax of deterministic C programs. This means, of course, that there's a trivial inclusion of deterministic C programs into nondeterministic C programs.

The easiest way to come up with nondeterministic C programs (which represent sets of functions from initial states to streams) is to turn if branches into nondeterministic branches. The standard way that the C model checking folks represent nondeterministic choice is to write if(*). Here's an example:

  /* Example 2 */
  1. if(*) {
  2.   x += 5;
  3. } else {
  4.   x = -x - 2;
  5. }
  6. return x;

The *, near as I can tell, was chosen to resemble the "wildcard" symbol, since any boolean expression we put in that if statement (such as x > 0 to recover our Example 1) results in a program that refines Example 2. (Terminology: a program refines another if, for every initial state, every stream in the meaning of the more-refined program also belongs to the meaning of the less-refined program.)

Assume/impossible

Nondeterministic choice allows us to enlarge the meaning of a nondeterministic program. Inserting assume() statements will cut down the set of streams in the meaning of a nondeterministic program. In particular, we have to exclude any streams that would, for any initial state, violate an assumption. We therefore have to be careful that we don't cut down the set of streams so far that there aren't any left: there are no deterministic programs that refine assume(false) - every initial state maps to the empty set of streams. For that matter, there also aren't any deterministic programs that refine Example 3:

  /* Example 3 */
  1. assume(x > 0);
  2. return x;

For every initial state where the value of x is positive, the meaning of Example 3 is a set containing only the stream that goes to line one, then line two, then returns the initial value of x. For every initial state where the value of x is not positive, the meaning of the program has to be the empty set: any stream would immediately start by violating an assumption.

Assumptions were used in Ball et al.'s "Automatic predicate abstraction of C programs" paper, which explains part of the theory behind the SLAM software verifier. In that work, they got rid of all of the if statements as a first step, replacing them with nondeterministic choices immediately followed by assumptions.²

  /* Example 4 */
  1. if(*) {
       assume(x > 0);
  2.   x += 5;
  3. } else {
       assume(x <= 0);
  4.   x = -x - 2;
  5. }
  6. return x;

The program in Example 4 is basically a degenerate nondeterministic program: its meaning is exactly equivalent to the deterministic Example 1. On the other hand, if we remove the statement assume(x <= 0) after line 3 in Example 4, we have a nondeterministic program that is refined by many deterministic programs. For instance, the deterministic program in Example 1 refines Example 4 without the assume(x <= 0), but so do Examples 5 and 6:

  /* Example 5 */
  1. x = x;
  4. x = -x - 2;
  6. return x;

  /* Example 6 */
  1. if(x > 100) {
  2.   x += 5;
  3. } else { 
  4.   x = -x - 2;
  5. }
  6. return x;

Example 4 as it was presented shows how assume() together with nondeterministic choice can encode normal if statements. We could also define an statement impossible and say that a stream just cannot ever reach an impossible statement. Impossibility can be defined in terms of assume - impossible is equivalent to assume(false). Alternatively, we can use impossibility to define assumptions - assume(e) is equivalent to if(e) { impossible; }. So there's a bit of a chicken-and-egg issue: I'm not totally sure whether we should build in impossible/if combination and use it to define assume() or whether we should build in assume() and use it to define impossible and if statements. It probably doesn't matter.

Assert/abort

In "Automatic predicate abstraction of C programs," assert() statements are understood to be the things that are supposed to be avoided. However, in Leino's work, they have a meaning of absolute and unbounded nondeterminism, which is the interpretation I want to use. If the expression e in an assert(e) statement evaluates to false, anything can happen - it's as if we could jump to an arbitrary point in memory and start executing code; absolutely any deterministic program that refines a nondeterministic program up to the point where the nondeterministic program fails an assertion will definitely refine that nondeterministic program.

So assert() represents unbounded nondeterminism: and in the sense of "jump to any code," not just in the sense of "replace me with any code" - the program assert(false); while(true) {} is refined by every program, including ones that terminate. This interpretation is easy to connect to the SLAM interpretation where we say "assertion failures are to be avoided," since obviously one of the things you might prove about your C code is that it doesn't jump to arbitrary code and start executing it.

Analogy: assert() is to abort as assume() is to impossible - we can define assert(e); as if(e) { abort; }.

Abstracting loops

The three primitives we have discussed so far are almost enough to let us perform a fun trick that my advisors at MSR attribute to Leino. First, though, we need one more primitive, a "baby" form of assert/abort called havoc(x), which allows the value associated with the variable x to be changed in any way. In other words, a program with the statement havoc(x) is refined by the program where the statement havoc(x) is replaced by the statement x = 4, the statement x = x - 12, the statement x = y - 16, or even the statement if(z) { x = y; } else { x = w; }.

Given the havoc primitive, imagine we have a program with a loop, and no checkpoints inside the loop:

  1. /* Before the loop */
     while(e) {
       ... loop body, 
        which only assigns 
        to variables x1,...,xn ...
     }
  2. /* After the loop */

Say we know the following two things:

The loop will always terminate if the expression e_inv evaluates to true at line 1, and
From any state where e_inv and e both evaluate to true, after the loop body is run, e_inv will evaluate to true.

Then, we know the program above refines the following program:

  1. /* Before the loop */
     assert(e_inv);
     havoc(x1); ... havoc(xn);
     assume(e_inv && !e);
  2. /* After the loop */

This is a somewhat unusual way of looking at loop invariants: we can take a loop and abstract it with nondeterministic straight line code. If we get to the beginning of the loop and our loop invariant is violated, all hell breaks loose, but if the loop invariant holds at the beginning, then when we exit the loop we know the following things: variables not assigned to by the loop haven't changed, the loop invariant holds, and the negation of the loop guard holds.

I like this particular game: it's a non-traditional way of looking at the analysis of loops by asking "is this program abstraction sound."

Algebraic structure of non-determinism

Notice that assume(false) is the unit of non-deterministic choice: writing the nondeterministic program if(*) assume(false); else Stmt is, in all program contexts, the same as writing just Stmt. Furthermore, nondeterministic choice is commutative (if(*) Stmt1; else Stmt2; is equivalent to if(*) Stmt2; else Stmt1;) and associative (it doesn't matter how I nest nondeterministic choices if I want to make a three-way nondeterministic choice). This means that nondeterministic choice and the impossible statement (which is equivalent to assume(false)) form a monoid - it's like the wildcard * could also be interpreted as multiplication, and we could write if(*) Stmt2; else Stmt1; as Stmt1 × Stmt2 and write impossible as 1.

Furthermore, if(*) assert(false); else Stmt is a nondeterministic program that is refined by every program, since assert(false) refines every program and we can just use the "wildcard" reading of nondeterminism to replace * with true. Algebraically, this means that the abort statement (which is equivalent to assert(false)) annihilates nondeterminism - we could write impossible as 0, and then we have 0 × Stmt = Stmt × 0 = 0.

Is abort the unit of a binary operation Stmt1 + Stmt2 in the same way that the number 0 is the unit of addition? It's not clear that it's useful for the abstraction of C programs, but I think if we go looking for a binary operation that abort is the unit of, what we'll find is perhaps best called both.

Both-and statements

The both statement is a weird statement that allows us to complete the picture about the algebraic structure of abort/assert(false), impossible/assume(false) and if(*). A deterministic program only refines the program both Stmt1 and Stmt2 if it refines both Stmt1 and Stmt2. As an example the program below is exactly equivalent to Example 1 - the first statement forces all streams starting from initial states where x is not positive to hit line 4 and not line 2 (lest they violate the assume(x > 0), and the second statement forces the initial states where x is positive to hit line 2 and not line 4 (lest they violate the assume(x <= 0).

  /* Example 7 */
     both {
  1.   if(*) {
         assume(x > 0);
  2.     x += 5;
  3.   } else {
  4.     x = -x - 2;
  5.   }
     } and {
  1.   if(*) {
  2.     x += 5;
  3.   } else {
         assume(x <= 0);
  4.     x = -x - 2;
  5.   }
     }
  6. return x;

The both statement is a binary operation whose unit is abort, forming a monad: doing anything at all - as long as it's some specific thing - is the same as doing that specific thing. The both is also annihilated by impossible, because doing nothing as long as it's one specific thing is the same thing as doing nothing. (This is all very fuzzy, but can be made formal in terms of set union and intersection operations.) That's interesting, because both statements aren't really like addition at all: we have two monoids whose the units both annihilate the binary operator that they aren't the unit of. If distributivity works in both directions (I'm not sure it does...) then we have a Boolean algebra without negation (what's that called?).

Two linear logic interpretations

One of the things that I like to do is to take the state of an evolving machine, encode it as a linear logic context Δ, and then say that the linear logic derivability judgment Δ ⊢ A proves something about the state. I think there are two ways of doing this for the language semantics I wrote out above. One is familiar, both are interesting.

The existential interpretation

The way I'm used to thinking about these things is that Δ ⊢ A shows that A is a possible (that is, existential) property of the system encoded into the linear logic context Δ. In that case, we want to encode nondeterministic as the additive conjunction A & B, as we can execute a nondeterministic choice by taking either choice. We want to encode impossible as ⊤, which prevents us from proving anything interesting about a series of nondeterministic choices that lead us to an impossible point in the program. We'll furthermore want to encode abort as 0, since once we reach an abort we can prove any existential property that we want to about the program!

The universal interpretation

Another way of understanding derivability would be to say that Δ ⊢ A shows that A is a necessary property of the system encoded into the linear logic context Δ. In that case, we would want to encode nondeterministic choice as the additive disjunction A ⊕ B, since to prove that something necessarily holds of a nondeterministically branching program, we have to show that it holds regardless of how the nondeterministic choice is resolved. This ensures that we will have to consider all possible resolutions of nondeterministic choices, but reaching an impossible state means that no programs can get to the current state, so all universal properties hold vacuously. We therefore would want to encode impossible as 0. On the other hand, reaching an abort means we know nothing about the universal properties of the program - the program can now do anything - so we encode abort as ⊤.

¹ Note that functions from initial states to sets of streams is *different* than sets of functions from initial states to streams! Both might be valid ways of looking at the meaning of nondeterministic programs depending on how you look at it, in fact, I originally wrote this post think in terms of the other definition.
² Certainly Ball et al. didn't come up with this idea - the idea is implicit in the Guarded Command Language; I'm just referring to Ball et al. because I'm sticking with their syntax.

Tuesday, September 6, 2011

Clowns to the left of me, jokers to the right, here I am, stuck in a hole abstraction

Happy day-after-labor-day weekend! This also stems from a discussion I had with Aaron Turon, who, a while back with Claudio Russo, also thought about using hole abstractions for maps and folds over data (with some fascinating twists, but I'll leave Aaron and Claudio's twists for Aaron and Claudio).

One of the things Aaron noticed was that it's nice to define a tail-recursive map over trees this way, and indeed that is true. I transliterated some code Aaron wrote out into Levy#: you can see it at turon.levy. The example really starts rubbing against two of the unnecessary deficiencies of Levy# - a lack of polymorphism (the trees have to be integer trees) and a lack of mutual recursion. We really want to define the map in terms of two mutually (tail-)recursive functions moveUp and moveDown, both of which take a holey tree -o tree and a tree that belongs "in the hole," the difference being that in moveUp I've already applied the map function to the subtree in the hole, and in moveDown I still need to apply the map function to the subtree in the hole.

This only works for monomorphic maps - if we had polymorphism, our hole abstraction zippers would allow us to write maps from 'a tree to 'a tree using a map function with type 'a -> F 'a. It would not allow us to write the maps we're really used to from 'a tree to 'b tree using a map function with type 'a -> F 'b.

In order to capture more general maps - or fold/reduce operations - the non-hole-abstraction approach is to use the generalization of the derivative that Conor McBride describes in clowns to the left of me, jokers to the right. We observe that a map function like the one in turon.levy sweeps left-to-right through the data structure, always segmenting the data structure into some already-processed "clowns to the left," some not-yet-processed "jokers to the right," and a path through the middle. Then, we either have a sub-tree of clowns, which means we'll now look up for more jokers, or a sub-tree of jokers, which means we need to look down into the jokers to process them. The overall picture is this one:

A nice thing about the hole abstraction framework is that we can represent and dynamically manipulate precisely this picture, whereas the derivative/zipper approach generally forces one to think about the derivative-like structure from the "inside out." If the jokers are a tree of ints and the clowns are a tree of bools, we can describe them like this:

   data ILeaf: int -o itree
      | IBranch: itree -o itree -o itree

   data BLeaf: bool -o btree
      | BBranch: btree -o btree -o btree

Then, the data structure that we will use as a representation of the intermediate state of computation as illustrated above is either stuck in the middle with clowns to the left or stuck in the middle with jokers to the right.

   data Clowns: btree -o cj -o cj # Clowns to the left of me
      | Jokers: cj -o itree -o cj # Jokers to the right

Just like our difference lists from before, there are no closed members of the type cj; we can only form values of type cj -o cj. The map implementation can be found at clownjoker.levy.

If we did have polymorphism, the clown/joker data structure for a map would look like this:

   data Clowns: 'clown tree 
             -o ('clown, 'joker) cj 
             -o ('clown, 'joker) cj
      | Jokers: ('clown, 'joker) cj 
             -o 'joker tree 
             -o ('clown, 'joker) cj

The more general clown/joker data structure for a fold would look like this:

   data Clowns: 'clowns 
             -o ('clowns, 'joker) cj 
             -o ('clowns, 'joker) cj
      | Jokers: ('clowns, 'joker) cj 
             -o 'joker tree 
             -o ('clowns, 'joker) cj

This is, it must be said, all reasonably unsatisfying. We have seen that hole abstractions can completely replace the derivative data structure - we could well and truly "scrap our zippers." To use the clown/joker generalization of derivatives, we have to define an obvious, boiler-plate-ey datatype cj, which it is merely mechanical to compute for any datatype of jokers. Can we do better?

Wednesday, August 24, 2011

Holey Data, Postscript: Hole Abstraction

I love the PL community on Twitter. Yesterday, David Van Horn presented this observation:

Most papers in computer science describe how their author learned what someone else already knew. -- Peter LandinTue Aug 23 13:04:05 via Buffer Favorite Retweet Replydvanhorn@λ-calcul.us
lambda_calculus

I'm gonna be perfectly honest: I was specifically thinking of the holey data series when I responded.

@lambda_calculus Presumably the implication is that this is bad for the advancement of science. OTOH, it's a pretty viable model for a blog.Tue Aug 23 13:33:53 via web Favorite Retweet ReplyRob Simmons
simrob

'Cause really, I was kind of surprised that I hadn't found any previous presentations of the idea (and was therefore encouraged when Conor indicated that he at least sort of had thought of it too). Chung-chieh Shan's response was great:

@simrob @lambda_calculus I'm not sure that Landin intended to imply that it's bad!Tue Aug 23 22:29:48 via web Favorite Retweet ReplyChung-chieh Shan
ccshan

I guess I hope he's right (or at least that I am), because in the comments to Part 3, Aaron Turon correctly observes that I missed a whopper: Minamide's 1998 POPL Paper "A Functional Representation of Data Structures with a Hole." Horray for blogs, I guess: if this has been a paper review, I would have been quite embarrassed, but as a blog comment I was delighted to find out about this related work.

A Functional Representation of Data Structures with a Hole

Minamide's paper effectively covers the same ground I covered in Part 1 of the Holey Data series: his linear representational lambdas are called hole abstractions. It's a very well written paper from the heady days when you could talk about the proposed Standard ML Basis Library and it was still common for authors to cite Wright and Felleisen when explaining that they were proving language safety by (what amounts to) proving progress and preservation lemmas.

My favorite part of reading the paper was that it simultaneously confirmed two suspicions that I picked up after a recent discussion with Dan Licata:

Levy#/call-by-push-value was an unnecessarily restrictive calculus for programming with linear representational functions - ML would work just fine.
By using call-by-push-value, I'd avoided certain red herrings that would have plagued the development otherwise - somewhere on almost every page I thought "yep, that explanation is less clear than it could be because they don't know about difference between value (positive) types and computation (negative) types yet."

One neat thing this paper describes that I hadn't thought about is automatically transforming non-tail-recursive programs that are naturally structurally inductive to tail-recursive programs based on hole abstractions/difference lists/linear representational functions. It seems like this optimization in particular is where many of the paper's claimed performance gains are found. It also seems like zippers-as-derivatives are pretty clearly lurking around the discussion in Section 4, which is neat.

Overall, this paper made me quite happy - it suggests there is something canonical about this idea, and proves that the idea leads to concrete performance gains. It also made me sad, of course, because it seems like the ideas therein didn't really catch on last time around. But that's part of why I didn't stop with Part 1 in the series: it's not clear, even to me, that hole abstractions/difference lists/linear representational functions are worth adding to a functional programming language if all they can do is be applied and composed. However, with the additional expressiveness that can be found in pattern matching against hole abstractions (hell, that's a way better name than "linear representational functions," I'm just going to call them hole abstractions from now on), I think there's a significantly stronger case to be made.

I've thrown a transliteration of Minamide's examples up on GitHub. The binary tree insertion example, in particular, could come from Huet's paper: it's a beautiful example of how even the "Part 1" language can implement zipper algorithms so long as you never need to move up in the tree. As for the hfun_addone function, I don't really understand it at all, I just transliterated it. In particular, it seems to not be entirely tail recursive (in particular, it seems to regular-recurse along the leftmost spine of the binary tree - if I'm not mistaken about this, I accuse line 131 of being the culprit.)

6.3 Logic Variable

(Note: I've tried to make sure this section doesn't come across as mean-spirited in any way, but I want to emphasize: this is not intended to be any sort of criticism of Yasuhiko Minamide. His paper was terrific! Go read it!)

My other favorite part of reading the paper was Section 6.3, which discusses difference lists. I plan to actually email Minamide and ask him about Section 6.3. Here's my reconstruction of events: Fictional Minamide has a great idea, implements it into a prototype ML compiler, tests it. His paper has strong theoretical and practical results, and gets accepted to POPL 1998. However, one of the reviewers says "oh, this is almost exactly difference lists in logic programming, you need to explain the connection." Fictional Minamide is not a logic programming person, has no access to a logic programming person, doesn't particularly have any reason to care about logic programming, but he does feel the need to address the concerns of a reviewer that he can't actually communicate with. Fictional Minamide manages to find, with great difficulty in the pre-Google-Scholar world, some poorly expressed explanation for what difference lists are in some random papers that are mostly about something else.¹ After a couple of hours, Fictional Minamide gives up, exclaiming "okay, this makes absolutely no sense whatsoever," and writes something kinda mumbly about graph reduction that seems vaguely plausible to satisfy the reviewer's demand.

The result is a section that mostly misses the point about the connection between hole abstraction and difference lists, both of which are declarative abstractions that allow a programmer to think about working modulo an uninitialized pointer in memory (though the way Minamide and I do it, the types help you way more). This is not intended as any criticism of either Real Minamide or Fictional Minamide. Indeed, it's mostly a criticism of the conference review processes: I'm pretty sure you could find similar "Section 6.3"s in my papers as well!² I do hope, however, that my exposition in Part 1, which was in fact motivated by difference lists and reached more-or-less the exact same setup that Minamide came up with, clarifies the record on how these two ideas are connected.

[Update Aug 25] I heard back from Real Minamide - while it was a long time ago, he did recall one of the reviewers mentioning logic variables, leading to the inclusion of that section. I win! Well, kind of. There was probably no "this makes no sense" exclamation; Minimade says that at the time his understanding at the time was in line with my comment about working modulo an uninitialized pointer. The comments about graph reduction, which were why I thought the section misses the point, were more of a side comment.

Minamide also remembered another wonderfully tantalizing tidbit: he recalls that, at POPL 1998, Phil Wadler said he'd seen a similar idea even earlier. Perhaps hole abstraction is just destined to be continuously reinvented until it finally gets included in the C++2040 standard.³

¹ The work he cites is on the adding the logic programming notions of unbound variables to functional programming languages, which (while I haven't looked at them) certainly don't look they would give a good introduction to the simple-but-goofy logic programming intuitions behind difference lists.

That said, I basically have never seen a good, clear, self-contained description of what a difference list is - I consider Frank's logic programming notes to be pretty good, but I recognize that I'm not a fair judge because I was also present at the associated lecture.

² Grep for "∇", or "nominal", especially in anything prior to my thesis proposal, if you want a head start.

³ I wonder what brackets they'll use, since as of the current standard they seem to have run out.