Tuesday, April 3, 2012

Differently-higher-order-focusing

I was toying with an idea last weekend, but it passed the "this is taking too much of your time, go back to your thesis" threshold, so I'm going to jot down what I figured out here, link to some Agda code, and maybe return to it in the future.

I'm going to use intuitionistic multiplicative, additive linear logic (without the exponential $${!}A^-$$) in this note. Doing it this way is the easiest way to explain everything, though it does mean I'm assuming some familiarity with intuitionistic linear logic. I'm also not going to use shifts; the Agda formalizations do use shifts, but the difference isn't important.

In polarized linear logic without shifts, there's one big syntactic class $$A$$ with all the propositions from linear logic, but they are sorted into positive and negative propositions based on their topmost connectives. Atomic propositions can be given arbitrary polarity $$p^+$$ or $$p^-$$ as long as that polarity is consistent.

• $$A^+ ::= p^+ \mid {\bf 0} \mid A \oplus B \mid {\bf 1} \mid A \otimes B$$
• $$A^- ::= p^- \mid \top \mid A ~\&~ B \mid A \multimap B$$

Quick review of higher-order focusing

Higher-order focusing as originally defined by Noam takes as the definition of connectives as given by a pattern judgment. Pattern judgments for positive types track the construction of terms; a reasonable way of reading $$\Delta \Vdash A^+$$ is that it says "given all the stuff in $$\Delta$$, I can construct a proof of $$A^+$$."

$\infer {p^+ \Vdash p^+} {} \qquad \infer {A^- \Vdash A^-} {}$$\infer {\Delta \Vdash A \oplus B} {\Delta \Vdash A} \qquad \infer {\Delta \Vdash A \oplus B} {\Delta \Vdash B}$$\infer {\cdot \Vdash {\bf 1}} {} \qquad \infer {\Delta_A, \Delta_B \Vdash A \otimes B} {\Delta_A \Vdash A & \Delta_B \Vdash B}$

Similarly, the negative pattern judgment tracks how negative types are used. A reasonable reading of the pattern judgment $$\Delta \Vdash A^- > C$$ is that it says "given all the stuff in $$\Delta$$, I can use a proof of $$A^-$$ to prove $$C$$."

$\infer {\cdot \Vdash p^- > p^-} {} \qquad \infer {\cdot \Vdash A^+ > A^+} {}$$\infer {\Delta \Vdash A ~\&~ B > C} {\Delta \Vdash A > C} \qquad \infer {\Delta \Vdash A ~\&~ B > C} {\Delta \Vdash B > C} \qquad \infer {\Delta, \Delta' \Vdash A \multimap B > C } {\Delta \Vdash A & \Delta' \Vdash B > C}$

One of the nicest things about higher-order focusing, in my opinion, is that it removes the question of what order the invertible left rules get applied in altogether: when we come to a point where invertible rules must be applied, the invertible rules happen all at once by logical reflection over the pattern judgment.1

$\infer {\Delta, [A^+] \vdash C} {\forall(\Delta' \Vdash A^+) \longrightarrow \Delta, \Delta' \vdash C} \qquad \infer {\Delta \vdash [A^-]} {\forall(\Delta' \Vdash A^- > C) \longrightarrow \Delta, \Delta' \vdash C}$

In particular, this means that a derivation of $$\Delta, [p^+ \oplus (A^- \otimes q^+) \oplus {\bf 1}] \vdash C$$ has three immediate premises corresponding to the three one of which is a derivation of $$\Delta, p^+ \vdash C$$, one of which is a derivation of $$\Delta, A^-, q^+ \vdash C$$, and one of which is a derivation of $$\Delta \vdash C$$.

This "do all the inversion all at once" idea is a strengthening of the idea that appears in Andreoli's original presentation of focusing, in the definition of CLF, in Kaustuv Chaudhuri's focused linear logic, and in my work on structural focalization. All of these aforementioned presentations have a separate, ordered inversion context that forces you to decompose positive propositions into contexts and negative propositions in one arbitrary-but-fixed order, which kind of lets you pretend they all happen at once, since you can't make any choices at all until you're done. This isn't the only way to look at it, though. In Miller and Liang's LJF, in Frank Pfenning's course notes on Linear Logic, and in probably a bunch of other places, the inversion steps can happen in any order, though for a fully focused system you have to do get through all the possible inversion steps before you focus again. In order to get the strength of a Andreoli/Chaudhuri/me-focused systems in an LJF-like setting, one must prove a Church-Rosser-esque property that all ways of doing inversion are the same.

The reason I think the use of higher-order formulations is particularly nice is that it blows away the distinction between systems that fix the inversion order (like Structural Focalization and its many predecessores does) and systems that allow nondeterminism in the inversion order (like LJF and its kin). There's no question about the presence or absence of inessential nondeterminism when all the nondeterminism seriously happens in one rule that has as many premises as necessary to get the job done.

One of the less-awesome things about higher-order focusing is that it requires this extra substitution judgment, a complication that I won't go into here but that gets increasingly annoying and fiddly as we go to more and more interesting substructural logics, which are of particular interest to me. (This is not to say that it doesn't work, I wrote a note on higher-order focusing for ordered logic way back two years ago when Request For Logic was a series of Unicodey-notes instead of a blog.) To try to get the best of both worlds, Taus and Carsten's Focused Natural Deduction is the only system I know of that tries to use the pattern judgments just for negative introduction/positive elimination while using the familiar negative eliminations and positive introduction structures.

I want to play the same game Taus and Carsten play, but without using the pattern judgments at all. I think I have a better idea, but it ended up being more work than I have time for to actually prove that it's better (or to conclude that I'm wrong and it's not).

Alternate formulation

If $$P(\Delta)$$ is a judgment on linear contexts $$\Delta$$, then $$I^+_{A^+}(\Delta. P(\Delta))$$ - which we can mathematically $$\eta$$-contract and write $$I^+_{A^+}(P)$$ where convenient - is also a judgment, defined inductively on the structure of the positive proposition $$A^+$$ as follows:

• $$I^+_{p^+}(\Delta. ~ P(\Delta)) = P(p^+)$$
• $$I^+_{A^-}(\Delta. ~ P(\Delta)) = P(A^-)$$
• $$I^+_{\bf 0}(\Delta. ~ P(\Delta)) = \top$$
• $$I^+_{A \oplus B}(\Delta. ~ P(\Delta)) = I^+_{A}(P) \times I^+_{B}(P)$$
• $$I^+_{\bf 1}(\Delta. P(\Delta)) = P(\cdot)$$
• $$I^+_{A \otimes B}(\Delta. ~ P(\Delta)) = I^+_{A}(\Delta_A. ~ I^+_{B}(\Delta_B. ~ P(\Delta_A, \Delta_B))$$
This definition has a certain obvious correspondance to the pattern judgment from "Noam-style" higher order focusing, but it's very direct: given a property $$P$$ that involves a linear context, $$I_{A^+}(P)$$ proves that property on every context which can prove $$A^+$$.

Given this judgment, we can use it pretty straightforwardly to define the inversion judgment. $\infer {\Delta, [A^+] \vdash C} {I^+_{A^+}(\Delta_{A}. ~~\Delta, \Delta_{A} \vdash C)}$ The result is exactly same as the formulation that used the higher-order pattern judgment, but this formulation is, in my mind at least, rather more direct while just as appropriately higher-order.

The story for negative types is the same. If $$P(\Delta,C)$$ is a judgment on linear contexts $$\Delta$$ and conclusions (conclusions are positive propositions $$A^+$$ and negative atomic propositions $$p^-$$), then $$I^+_{A^-}(P)$$ is defined inductively on the structure of types as follows:

• $$I^-_{p^-}(\Delta,C. ~ P(\Delta,C)) = P(\cdot,p^-)$$
• $$I^-_{A^+}(\Delta,C. ~ P(\Delta,C)) = P(\cdot,A^+)$$
• $$I^-_{A ~\&~ B}(\Delta,C. ~ P(\Delta,C)) = I^-_{A}(P) \times I^-_{B}(P)$$
• $$I^-_{A \multimap B}(\Delta. ~ P(\Delta,C)) = I^+_{A}(\Delta_A.~I^-_{B}(\Delta,C. ~ P((\Delta_A,\Delta), C)$$
The right-inversion rule follows the same pattern as the left-inversion rule: $\infer {\Delta \vdash [ A^- ]} {I^-_{A^-}(\Delta',C. ~~ \Delta, \Delta' \vdash C)}$

Where's the Beef?

I think this crazy new formulation of focusing is somewhat more elegant than the pattern-judgment-using formulation Taus and Carsten gave, even though it is otherwise rule-for-rule the same. However, that's not an argument that will convince anyone besides myself; there needs to be some advantage to using this formulation for it to have value. In particular, this formulation should make it easier and/or more beautiful to prove cut, identity expansion, and the completeness of focusing - it should at least have similar grace when formalized in Agda to the Agda proof of structural focalization. It turns out that the identity expansion theorem can be done with this new formulation in Agda, and it's beautiful. (When I showed it to Dan Licata he started muttering something something Yoneda Lemma, so it's at least likely the structure of identity expansion has some interesting and common categorial structure.) It's certainly obvious that there's some interesting deep structure at work.

Furthermore, it seems like, given this formulation, it should be possible to get the cut admissibility theorem working as an obviously turned-around version of identity expansion, which is the kind of interesting structure I've had my eye out for ever since I observed it in the process interpretation of linear logic. I haven't made that work yet, though, so I'm going to stick with inversion contexts for now and maybe return to poking at this post-thesis, if nobody else has figured it out in the meantime - or, indeed, if nobody else has already figured it out in the past.

Here are some Agda fragments for dealing with this idea rigid logic (a non-associative, non-unit-law-having version of ordered logic that is useless except as a way of thinking about substructural logics in Agda, as far as I can tell), and some other Agda fragments for persistent logic. This whole blog post should serve as a disclaimer that these are all quite half-baked.

1An aside on "higher-order"

This is the part of higher-order focusing that gives it the name "higher-order," in the sense that the derivation constructors take functions as arguments. I can't quite figure out at this moment how this usage fits into Chris's discussion of what should and should not be called higher order.

Noam called the paper introducing pattern judgments to the PL world "Focusing and higher-order abstract syntax" (POPL 2008). Noam now sees that naming as unfortunate, and I agree; it blurs the definition of "higher-order abstract syntax" (HOAS) in a different direction that it gets blurred elsewhere in the literature. In my idiolect, which of course I can't force on everyone else, I reserve HOAS for the use of substitution functions to encode term structure - in other words, I use it as a rough synonym of what I and others also call "abstract binding trees." (I'm not sure who coined "abstract binding tree," I associate it with Bob Harper.)

The whole point of higher-order focusing is that this function - the one from the pattern judgment $$\Delta' \Vdash A$$ to the provability judgment $$\Delta, \Delta' \vdash C$$ - is a function that case analyzes its arguments (to figure out which pattern judgment was given as an argument to the function). Putting our set-theorist hats on, it's a map, a potentially infinite list of (pattern derivation, provability derivation) tuples such that every pattern derivation is associated with exactly one provability derivation. Then we take off our set-theorist hats 'cause we dislike those hats; it's a real (pure, terminating, effect-free) function, like in the sense of functional programming. Or just programming. I personally try to use the phrase logical reflection to emphasize this pattern-matching higher-order-ness; Noam now prefers abstract higher-order syntax, which is fine too.

1. I found it nice to read about your musing, even as an outsider to the field -- though I must say the idea of "I spent too much time on this, the best thing to do now is to write a blog post about it" looks quite ironic.

A few typos:

- in the rules for Δ ⊩ A⁺, there is a shift lurking, and you wrote the same A⊕B rule twice.

- there is a closing parenthesis missing in the first occurence of I⁺{A⁺}(Δ.P(Δ)), first line of the "Alternate formulation" section

- η-contract misses the '-' which would make it more readable (besides I think using the η-contracted form in the left-hand-side of the definitions of I⁺ and I⁻ would indeed be more readable, by allowing to concentrate visual weight on the parts that matter)

- the rule for I⁻{A⊸B} should have I⁻{B} rather than I⁺{B}

Finally, I have a probably very naive question. As I'm not familiar with the litterature you're talking about, I'm not sure what those examples of "fixed order" vs. "confluent non-determinism" are (maybe an example would help), but I was wondering if you didn't understand a fixed order of some sort in you I⁺{A⊗B} and I⁻{A⊸B} definitions. For example you have:

I⁺{A⊗B}(P) := I⁺{A}(Δᴀ. I⁺{B}(Δʙ. P(Δᴀ,Δʙ))

You could have evaluated B's judgement context first, then A's.

I⁺{A⊗B}(P) := I⁺{B}(Δʙ. I⁺{A}(Δᴀ. P(Δʙ,Δᴀ)))

I think you could observe the difference on, for example, 0⊗(1⊕1). Is this the same kind of "fixed order" that you were talking about, or maybe a lesser issue?

(For the record and as a general comment, I don't dislike the idea of having non-determinism and confluence to prove, because I suspect that in general such confluence result tells you nice things about the system, that you maybe wouldn't have listened about if you made the problem go away with a fixed order. At the same time I tend to find confluence questions boring – in the majority of case where they work out easily — so I understand the designer's choice to lessen the proof burden, especially in a mechanized setting.)

2. I appreciate the irony. But working out structural focalization took weeks and weeks (and the writing was longer than the proving); getting something out of my system so I can return to more pressing writing is a worthwhile use of two hours, I think. Also, trying to sketch a bit of what I'm thinking is the best way to make sure that I'll have some clue what *I* was thinking when I look back at this in the future (as opposed to writing a few sketchy comments in the version-controlled text file that passes as my research notebook), and that's the real reason this blog exists, anyway.

Fixed the bugs you mention, I think. I explicitly intended to make this presentation shiftless, so I fixed the lurking shift by removing the one place a shift appeared in the pattern rule for Δ ⊩ A- > A-. (Focusing was developed without shifts, and the use of shifts remains a minority view in the focusing proof theory literature, it's just a minority view I favor.)

You're absolutely right that I am ordering the inversions in a certain sense - the proof obligation for proving (Δ, [0⊗(1⊕1)] ⊢ C) is ⊤, and the proof obligation for proving (Δ, [(1⊕1)⊗0] ⊢ C) is ⊤ × ⊤. So the confluence proof in this setting would be to show that all these potential proof obligations are type isomorphisms. I think confluence is eventually a property we ought to grapple with, but it's nice to be able to prove cut, identity, and focalization without dealing with that question - especially, as you say, in a mechanized setting. I'll be interested to see if it continues to always be the case that this is doable.

3. I'm coming back to this much later because I'm now interested in term
search for other reasons
( http://gallium.inria.fr/blog/singleton-types-for-code-inference/ ). I think
it is possible to restore the order-independence of your formulation
by seeing I⁺{A}(P) and P(Δ) not as a judgment, but as a set of
judgment: I⁺{A} would have type
(Context → Set(Judgment)) → Set(Judgment); then you can define I{0}(P)
as the empty set, I{A*B} is still I{A}(ΔA.I{B}(ΔB.P(ΔA,ΔB))), and the
assumptions for (Δ,[A⁺] ⊢ C) are the I{A}(ΔA. {Δ,ΔA ⊢ C}). This
presentation corresponds to unfolding the patterns to full depth
(irrespectively of when 0 is encountered), keeping less structure on
the proof obligations to not distinguish anymore these different
trivial obligations.

I think the full type for I,
(Context → Set(Judgment)) → (Positive → Set(Judgment)), corresponds to
doing some form of continuation-passing style. The structure could be
simplified with an operator I{A} returning directly a set of contexts,
that are exactly the set of realizing contexts Δ such that the pattern
judgment Δ ⊩ A holds. Noting contexts (A1,..,An) and sets {Δ₁,...Δn}:

I{p+} := {(p+)}
I{A-} := {(A-)}
I{0} := {} (empty set)
I{1} = {()} (empty context singleton)
I{A+B} := I{A} U I(B) (union)
I{A*B} := { Δ₁,Δ₂ | Δ₁ ∈ I{A}, Δ₂ ∈ I{B} } (cartesian product + context concatenation)

Remark the union in the I{A+B} rule; there is the question of whether
it is a disjoint sum or the usual set union, which makes a difference
in case some of the contexts on both sides happen to be the same
(eg. A+A or even (1*A)+(A+0)). The set union automatically recognizes
commons contexts and merge them, so they result, after the focusing
step, in one proof obligation instead of several identical
ones. I think that your formulation, when read naively, would tend to
use a disjoint sum, while Noam's quantification apparently suggests
a set union.

4. gasche: yes, I agree with all of this, including the fact that the straightforward proof obligation of my union means that you have to include two proofs instead of one for A+A. Of course, this is not a substructural logic, so you can always do that - the metatheory makes the difference between one and many copies of a proof irrelevant, I think. (Certainly this is true in the finitary case.)

The set-like notation is the older one; in my thesis and some other places I use a function-ey notation for the left rule for equality; in a lot of Miller's work he does the exact same thing with a set-based notation. We ought to be able to move pretty seamlessly between these two views if I'm not mistaken.