Showing posts with label Twelf. Show all posts

Thursday, January 12, 2012

Structural focalization updated

I've uploaded to both arXiv and my webpage a significantly revised draft of the paper Structural focalization, which I've spoken about here before. Feedback is welcome!

One of the points I make about the structural focalization technique is that, because it is all so nicely structurally inductive, it can be formalized in Twelf. As part of a separate project, I've now also repeated the whole structural focalization development in Agda! The code is available from GitHub. While a structural focalization proof has some more moving parts than a simple cut-and-identity proof, it also has one significant advantage over every Agda proof of cut admissibility that I'm aware of: it requires no extra structural metrics beyond normal structural induction! (My favorite structural metric is the totally nameless representation, but there are other ways of threading that needle, including, presumably, these "sized types" that everyone seems to talk about.)

In regular, natural-deduction substitution, you can get away without structural metrics by proving the statement that if \(\Gamma \vdash A\) and \(\Gamma, A, \Gamma' \vdash C\) then \(\Gamma, \Gamma' \vdash C\); the extra "slack" given by \(\Gamma'\) means that you operate by structural induction on the second given derivation without ever needing to apply weakening or exchange. Most cut-elimination proofs are structured in such a way that you have to apply left commutative and right commutative cuts on both of the given derivations, making this process tricky at best; I've never gotten it to work at all, but you might be able to do something like "if \(\Gamma, \Gamma' \longrightarrow A\) and \(\Gamma, A, \Gamma'' \longrightarrow C\) then \(\Gamma, \Gamma', \Gamma'' \longrightarrow C\)." If someone can make this work, let me know!
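The shape of these two statements can be written down as Agda types (a sketch only: the judgment forms ⊢ and ⟶ here are hypothetical module parameters, not part of any actual development, and contexts are lists growing on the left):

```agda
open import Data.List using (List; _∷_; _++_)

module CutStatements (Type : Set)
                     (_⊢_ _⟶_ : List Type → Type → Set) where

  -- natural-deduction substitution, with "slack" Γ′ so that the
  -- induction on the second derivation never needs weakening or exchange
  nd-subst : Set
  nd-subst = ∀ {Γ Γ′ A C} →
             Γ ⊢ A → (Γ′ ++ A ∷ Γ) ⊢ C → (Γ′ ++ Γ) ⊢ C

  -- the speculative cut statement, with slack on both given derivations
  seq-cut : Set
  seq-cut = ∀ {Γ Γ′ Γ″ A C} →
            (Γ′ ++ Γ) ⟶ A → (Γ″ ++ A ∷ Γ) ⟶ C → (Γ″ ++ Γ′ ++ Γ) ⟶ C
```

Whether `seq-cut` is actually provable by plain structural induction is exactly the open question above.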

A focused sequent calculus, on the other hand, has three separate phases of substitution. The first phase is principal substitution, where the type gets smaller and you can do whatever you want to the derivations, including weakening them. The second phase is rightist substitution, which acts much like natural-deduction substitution, and where you can similarly get away with adding "slack" to the second derivation. The third phase is leftist substitution, and you can get by in this phase by adding "slack" to the first derivation: the leftist cases read something like "if \(\Gamma, \Gamma' \longrightarrow A\) and \(\Gamma, A \longrightarrow C\) then \(\Gamma, \Gamma' \longrightarrow C\)."

In Structural focalization, I note that the structural focalization technique could be seen as a really freaking complicated way of proving the cut and identity for an unfocused sequent calculus. But in Agda, there's a reason you might actually want to do things the "long way" - not only do you have something better when you finish (a focalization result), but you get cut and identity without needing an annoying structural metric.

Tuesday, September 27, 2011

My New Focalization Technique is Unstoppable

While it took, as they say, a bit of doing, I have a completed draft of a paper on my website that I believe provides a really elegant solution to what has been a very, very annoying problem for some time: writing down a proof called the completeness of focusing. Don't worry, the paper explains what that means: it has to, because one of the things the paper argues is that most existing statements of the completeness of focusing aren't general enough! In the process, this writeup gave me a good (enough) excuse to talk in a very general way about a lot of fundamental phenomena around the idea of polarization in logic that we've been noticing in recent years.

Anyway, the draft is here - Structural Focalization. The accompanying Twelf development is, of course, on the Twelf wiki: Focusing. I've thrown the paper and the Twelf code up on arXiv while I figure out what to do with it; comments would be greatly appreciated!

I don't want to have a post that's just "hey, here's a paper," so here's an addition, a follow-up to the rather popular "Totally Nameless Representation" post from a while back. I still prove weakening in Agda by exactly the technique presented there, which commenter thecod called the "presheaf method." But for natural deduction systems, I almost never use the term metric approach that was the core of what I was presenting in "totally nameless representation," since it works just fine to use the approach where the term M of type A (the one you're substituting for x) is in context Γ and the term N (the one with x free) is in context Γ' ++ A :: Γ - the free variable x is allowed to be in the middle of the context, in other words. You don't have to use exchange on the second term, so you're always calling induction on something Agda recognizes as structurally smaller.

This worked for natural deduction systems, but I didn't think it would work for sequent calculi, since you need to toy around with both contexts and induct on both terms in the presentation of cut for a sequent calculus. However, for a focused sequent calculus like the one I present in the paper, you can still do without the totally nameless metric! If you set things up right, the rightist substitutions (where you work on decomposing the right term, the one with the variable free) allow you to extend the context only of the first term, and the leftist substitutions (where you work on decomposing the left term, the one you're substituting for the variable) allow you to work on the second term, and the two are only connected by the principal substitutions (which reduce the type, which more or less lets you get away with anything as far as the induction principle is concerned).

A code example of this, which could use some more commentary, can be found on GitHub: Polar.agda.

Sunday, November 14, 2010

Totally nameless representation

[Note: I am unaware of any truly original content in this post; when I say "we picked this up from Dan" it doesn't imply Dan worked it out himself; when I say "we worked this out" I don't have any reason to believe we were the first. Dan Licata in a comment lists some of the references to things we generally absorbed from him.]

This post is about two things. The first topic is a style of formalization in Agda that Bernardo Toninho and I picked up on from Dan Licata. The second topic is a way of establishing termination arguments that Bernardo and I worked out and used extensively in the Agda formalization of several variants of Constructive Provability Logic, which I've mentioned before. Because I'm a bit perverse, I'll be using my own Agda Standard Library as the basis for this post; the code for this post can be found here - since it's a demo bound together with my library, technically I guess I'm supposed to keep the file at that link (though maybe not this post) in sync with the library as the library evolves.

Starting point: the simply-typed lambda calculus

This is a pretty standard Agda formalization of "Church-style" simply-typed lambda calculus terms. "Church-style," as opposed to "Curry-style," encodings only allow for the existence of well-typed terms.*
   open import Prelude

infixr 5 _⊃_
data Type : Set where
  con : String → Type
  _⊃_ : Type → Type → Type

Ctx = List Type

infixl 5 _·_
data Term (Γ : Ctx) : Type → Set where
  var : ∀{A} → A ∈ Γ → Term Γ A
  _·_ : ∀{A B} → Term Γ (A ⊃ B) → Term Γ A → Term Γ B
  Λ : ∀{A B} → Term (A :: Γ) B → Term Γ (A ⊃ B)
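For instance, here is the K combinator λx.λy.x in this encoding (a usage sketch, assuming the membership constructors Z and S that the Prelude's list library provides and that the substitution code below pattern-matches on):

```agda
-- λx.λy.x : under the inner binder the context is B :: A :: Γ,
-- so the outer variable x is the dependent de Bruijn index S Z
k : ∀{Γ A B} → Term Γ (A ⊃ B ⊃ A)
k = Λ (Λ (var (S Z)))
```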
The only non-obvious part of this encoding is that variables are represented by "dependent de Bruijn indices" - instead of a de Bruijn index being just a natural number, it's a pointer to its type in the context. The nice part about doing it this way is that you can define a very powerful notion of "generalized weakening" that is based on the "subset" partial order on contexts (Γ ⊆ Δ is a function mapping all the indices A ∈ Γ to indices A ∈ Δ).
_⊆_ : Ctx → Ctx → Set
_⊆_ = LIST.SET.Sub

wk : ∀{Γ Δ A} → Γ ⊆ Δ → Term Γ A → Term Δ A
wk θ (var n) = var (θ n)
wk θ (n1 · n2) = wk θ n1 · wk θ n2
wk θ (Λ n) = Λ (wk (LIST.SET.sub-cons-congr θ) n)
The nice part about this definition, and one of the things I picked up on from Dan, is that we can then define the usual weakening, exchange, and contraction properties directly by using generalized weakening and the corresponding facts about the subset relation on lists (which are defined in my standard library).
   wken : ∀{Γ A C} → Term Γ C → Term (A :: Γ) C
wken = wk LIST.SET.sub-wken

exch : ∀{Γ A B C} → Term (A :: B :: Γ) C → Term (B :: A :: Γ) C
exch = wk LIST.SET.sub-exch

cntr : ∀{Γ A C} → Term (A :: A :: Γ) C → Term (A :: Γ) C
cntr = wk LIST.SET.sub-cntr
With that out of the way, we can talk about substitution, both the "obvious way" and the metric-indexed way Bernardo and I have used extensively.

Straightforward substitution

The obvious (for some value of obvious) and direct definition of substitution involves 1) generalizing the induction hypothesis to allow for even more additions to the context (tm-subst) and 2) writing a variable substitution lemma that handles the case where the substituted-into term is a variable, and we either perform the substitution, lower the de Bruijn index, or leave the de Bruijn index alone (var-subst). The main theorem is inductive on the structure of the second term.
   subst : ∀{Γ A C} → Term Γ A → Term (A :: Γ) C → Term Γ C
subst = tm-subst [] _
where
var-subst : ∀{A C} (Γ Γ' : Ctx)
   → Term Γ' A
   → C ∈ (Γ ++ [ A ] ++ Γ')
   → Term (Γ ++ Γ') C
var-subst [] Γ' n1 Z = n1
var-subst [] Γ' n1 (S n) = var n
var-subst (_ :: Γ) Γ' n1 Z = var Z
var-subst (_ :: Γ) Γ' n1 (S n) = wken (var-subst Γ Γ' n1 n)

tm-subst : ∀{A C} (Γ Γ' : Ctx)
   → Term Γ' A
   → Term (Γ ++ [ A ] ++ Γ') C
   → Term (Γ ++ Γ') C
tm-subst Γ Γ' n1 (var n) = var-subst Γ Γ' n1 n
tm-subst Γ Γ' n1 (n · n') = tm-subst Γ Γ' n1 n · tm-subst Γ Γ' n1 n'
tm-subst Γ Γ' n1 (Λ n) = Λ (tm-subst (_ :: Γ) Γ' n1 n)
To motivate the next step, I'll point out that the only reason we had to generalize the induction hypothesis is because of the lambda case - if we try to prove the theorem directly, we find ourselves in the position of having a Term Γ A and a Term (B :: A :: Γ) C; we can't apply the induction hypothesis directly on these two terms. We can apply wken to the first subterm and apply exch to the second subterm and call subst recursively, but now we can't establish termination on the structure of the second term, since we've passed it to the exch function, which changed its structure by replacing Zs with (S Z)s and vice versa.
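Spelled out, the direct attempt looks something like this (a sketch, not code from the actual development; Agda rejects the last clause because exch n is not a structural subterm of Λ n):

```agda
-- NOT accepted by Agda's termination checker
subst' : ∀{Γ A C} → Term Γ A → Term (A :: Γ) C → Term Γ C
subst' n1 (var Z) = n1
subst' n1 (var (S n)) = var n
subst' n1 (n · n') = subst' n1 n · subst' n1 n'
subst' n1 (Λ n) = Λ (subst' (wken n1) (exch n))  -- recursive call on (exch n)
```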

Setting up a metric

The alternate solution I intend to present involves setting up a metric; this metric expresses the "shape" of a term, disregarding all binding. This is a three-step process: first, we define a generic "tree" datatype that can capture the shape of a term (disregarding binding); second, we define a copy of the Term datatype, called MTerm, that has a Metric as one of its indices.
data Metric : Set where
  ○ : Metric
  _●_ : Metric → Metric → Metric
  > : Metric → Metric

data MTerm (Γ : Ctx) : Type → Metric → Set where
  var : ∀{A} → A ∈ Γ → MTerm Γ A ○
  _·_ : ∀{A B m m'} → MTerm Γ (A ⊃ B) m → MTerm Γ A m' → MTerm Γ B (m ● m')
  Λ : ∀{A B m} → MTerm (A :: Γ) B m → MTerm Γ (A ⊃ B) (> m)
The third step is to show that we can freely get into and out of the metric; potentially we should also show that these two actions form an isomorphism, but I haven't run across the need to actually prove that. It turns out that Agda does most of the work here...
   tm→ : ∀{Γ A m} → MTerm Γ A m → Term Γ A
tm→ (var n) = var n
tm→ (n1 · n2) = tm→ n1 · tm→ n2
tm→ (Λ n) = Λ (tm→ n)

→tm : ∀{Γ A} → Term Γ A → ∃ λ m → MTerm Γ A m
→tm (var n) = , var n
→tm (n1 · n2) = , snd (→tm n1) · snd (→tm n2)
→tm (Λ n) = , Λ (snd (→tm n))
Now that we have our metric, we can prove a stronger and more useful version of the generalized weakening lemma: not only can we apply generalized weakening to terms, but the resulting term has the exact same shape (as shown by the fact that the same metric "m" appears as an index both of the function's argument and of its conclusion).
wkM : ∀{Γ Δ A m} → Γ ⊆ Δ → MTerm Γ A m → MTerm Δ A m
wkM θ (var n) = var (θ n)
wkM θ (n · n') = wkM θ n · wkM θ n'
wkM θ (Λ n) = Λ (wkM (LIST.SET.sub-cons-congr θ) n)
We don't need to prove weakening twice; our previous weakening lemma is just a simple consequence of this new, stronger one:
wk : ∀{Γ Δ A} → Γ ⊆ Δ → Term Γ A → Term Δ A
wk θ n = tm→ (wkM θ (snd (→tm n)))


Substitution with the metric

With the metric in tow, substitution is straightforward. Termination is established by induction on the metric of the second term. Note the use of both wken (weakening outside of the metric) and exchM (exchange inside the metric) in the lambda case.
subst : ∀{Γ A C} → Term Γ A → Term (A :: Γ) C → Term Γ C
subst n1 n2 = substM n1 (snd (→tm n2))
  where
  exchM : ∀{Γ A B C m} → MTerm (A :: B :: Γ) C m → MTerm (B :: A :: Γ) C m
  exchM = wkM LIST.SET.sub-exch

  substM : ∀{Γ A C m} → Term Γ A → MTerm (A :: Γ) C m → Term Γ C
  substM n1 (var Z) = n1
  substM n1 (var (S n)) = var n
  substM n1 (n · n') = substM n1 n · substM n1 n'
  substM n1 (Λ n) = Λ (substM (wken n1) (exchM n))
In this example, it's not clear that we've saved much work: setting up the metric was a lot of boilerplate, though it is straightforward; the benefit that we gained in terms of proof simplicity was primarily an artifact of Agda's pattern matching: either a variable was the first variable in the context (and therefore represented by the de Bruijn index Z), or it was later in the context. So this example isn't the world's best argument for this style, but in the more complex logics that Bernardo and I dealt with, we found that this technique seemed, of all the approaches we could think of, to be the one least likely to fail on us.

Comments

Dan's said on a couple of occasions that he thinks there are less painful ways of doing this sort of thing, existing Agda techniques that avoid the annoying duplication of data types. My understanding is also that there is a not-totally-integrated notion of "sized types" in Agda that could maybe handle this sort of thing, but that the way it works is by defining metrics that are only natural numbers. It seems like a step backwards, somehow, if we go from structural induction to natural number induction. Furthermore, sized types based on natural numbers would almost certainly fail to handle the vaguely insane termination arguments that come up in constructive provability logic.

Totally nameless representations?

The germ of this post comes from a conversation with Chris Casinghino at UPenn back in May. He was working on extending this sort of simple representation in a way that was similarly insane but mostly unrelated to constructive provability logic. Our problems were different; he placed the blame for some of the difficulties he encountered on the "silliness of using all de Bruijn indices instead of a locally nameless encoding," which was a response that I initially found puzzling. I've since decided that my puzzlement was the result of the fact that he was thinking of the specific thing he was doing in terms of programming languages, and I was thinking about the thing I was doing (constructive provability logic) in terms of logic - my guess is that we're both so used to working on both sides of the Curry-Howard correspondence that it wouldn't have occurred to us that this might make a difference!

From a programming languages perspective, we have programs and typing derivations that show programs to be well typed; the locally nameless encoding gives you the ability to apply the exchange or weakening lemma and get a new typing derivation that says that the same term is still well typed. That wasn't the case for our original "Church-style" lambda calculus terms. In my current view, that's because a "Church-style" lambda calculus term is more like the typing derivation than it is like a term. It's a typing derivation without the term it's typing; in other words, it's a proof in natural deduction form! And what Bernardo and I were using was something that, for our purposes, was just as good as a derivation that the same term was well typed in the exchanged context - we got that the same metric was associated with the exchanged proof term.

In other words, it seems that, for the purposes of termination arguments, the metric serves the same purpose a locally nameless proof term does. Certainly it avoids the "silliness of using all de Bruijn indices," as it uses none whatsoever, hence the mostly-joking title of this post. However, it's quite possible I'm off base; I'd be interested in what people more versed in locally nameless representation thought. I should add that I certainly think it is possible to do an intrinsically typed locally nameless representation; you'd just have to have (at minimum) the free variables be aware of their type. If you were interested primarily in representing a programming language that might even be the right thing to do; but the totally nameless metrics seem to be an easier fit for developments where the primary interest is in the logic side of the correspondence.

* Shameless plug for my advisor Frank Pfenning's excellent essay, "Church and Curry, Combining Intrinsic and Extrinsic Typing" which talks about properties of Church-style versus Curry-style encodings and relates them to subtyping and work by William Lovas.

P.S. This blog post has been about Agda; the first time I used this kind of tree structural metric on terms was in Twelf, when (on a challenge from John Boyland) I showed a bijection between HOAS and de Bruijn form representations of the terms of an untyped (or, if you prefer, "Curry-style") lambda calculus.

Monday, September 20, 2010

Natural deduction and sequent calculus - united in a polarized linear framework

In the last post I talked a little bit about what it means to give atomic propositions in a logical framework polarity. The main point I wanted to get around to was that if we have a proposition in the signature of the form P1+ ⊃ P2+ ⊃ P3+, then all the information contained in that rule is captured by this synthetic inference rule:
  Γ, P1+, P2+, P3+ ⊢ Q
-------------------
Γ, P1+, P2+ ⊢ Q
On the other hand, if I have a proposition in the signature of the form P1- ⊃ P2- ⊃ P3-, then all the information contained in that rule is captured by this synthetic inference rule:
  Γ ⊢ P1-
Γ ⊢ P2-
--------
Γ ⊢ P3-
By further mixing-and-matching the polarities of atomic propositions, we can get a whole host of synthetic inference rules. For instance, if only P1 is positive in P1+ ⊃ P2- ⊃ P3-, the resulting synthetic rule demands that P1+ already appear in the context and spawns P2- as a subgoal:
  Γ, P1+ ⊢ P2-
  -------------
  Γ, P1+ ⊢ P3-

First, a clarification

Before I get started with this post, I want to point out that in the last post I said something imprecise: that the strongest possible form of adequacy is to say that the synthetic inference rules induced by your signature have an exact correspondence with the "on-paper" inference rules. I then gave an example, saying that
  ∨L : (hyp A → conc C)
→ (hyp B → conc C)
→ (hyp (A ∨ B) → conc C)
corresponds to this expected synthetic inference rule
  Γ, hyp (A ∨ B), hyp A ⊢ conc C
Γ, hyp (A ∨ B), hyp B ⊢ conc C
-------------------------------- ∨L
Γ, hyp (A ∨ B) ⊢ conc C
only if (hyp A) has positive polarity. But this isn't quite the right story, because from the definition of ∨L we know nothing about the contents of Γ, and our notion of adequacy with respect to the sequent calculus requires that our judgments take the form Γ ⊢ conc C where Γ contains only atomic propositions hyp A for some propositions A. So, really, this strongest possible notion of adequacy needs one additional puzzle piece, a generalization of what the Twelfers would call regular worlds. We can assume the conclusion of the derived rule is in the image of some "on paper" judgment, but we also need to verify that the premises will always end up similarly in the image of some "on paper" judgment.

Now, with that out of the way...

Representing natural deduction in LLF

The post today starts off working with a subset of the Linear Logical Framework (LLF) designed (in large part) by Cervesato. We can reason about specifications in this framework using Jason Reed's HLF/Twelf implementation. Essentially, we will be using this framework the same way we used it last week: defining a signature and looking at the synthetic inference rules that arise from that signature.

It has recently become common to see presentations of the canonical forms of a simply-typed lambda calculus presented in spine form; doing so removes a major source of unpleasantness in the proof of global soundness/hereditary substitution. You do have to pay the piper, though: it is more difficult to handle the proof of global completeness/η-expansion in a spine form calculus. However, a Twelf proof that Frank Pfenning worked out in February '09, which uses something like a third-order premise, can successfully handle the proof of global completeness for a spine-form presentation of canonical forms. One version of this Twelf proof, which can be found here, is presented as a bunch of Twelf code without much commentary, and the current story arose in part due to my attempts to provide some commentary and intuition for this Twelf proof of eta expansion and generalize it to substructural logics.

In the process, I ended up investigating another way of presenting logic that is somewhere between a natural deduction and spine form presentation. A bit of Twelf code describing this presentation can be found here, but the key point is that, when we need to verify an atomic proposition Q, we pick some hypothesis A from the context and show that, from a use of A we can prove that Q can be used. This is basically what happens in spine form, but here the "spiney" things associate the opposite of the way they do in spine form presentations. It looks something like this:
  end : use A A.
atm : hyp A -> use A (a Q) -> verif (a Q).
⊃I : (hyp A -> verif B) -> verif (A ⊃ B).
⊃E : use A (B1 ⊃ B2) -> verif B1 -> use A B2.
One thing that we've known for a long time is that the atomic proposition (use A B) is an instance of a pattern - it can be thought of as a Twelf representation of the linear function use A -o use B, in which case the rule end above is just the identity function λx.x.

But, since we're putting ourselves conceptually in the territory of LLF, not the territory of LF/Twelf, we can represent the thing that is morally a linear function as an actual linear function. The main trick is appropriately using unrestricted implication -> versus linear implication -o in such a way that it controls the branch of the subproof that the linear assumption use B winds up in. The resulting signature looks much like a standard natural deduction presentation with a very non-standard transition between uses and verifications (the usual rule is just use (a Q) -> verif (a Q)) and a sprinkling of lollipops where we'd normally see arrows.
  atm : hyp A -> (use A -o use (a Q)) -> verif (a Q).
⊃I : (hyp A1 -> verif A2) -> verif (A1 ⊃ A2).
⊃E : use (A1 ⊃ A2) -o verif A1 -> use A2.
∧I : verif A1 -> verif A2 -> verif (A1 ∧ A2).
∧E1 : use (A1 ∧ A2) -o use A1.
∧E2 : use (A1 ∧ A2) -o use A2.
We are interested, for the purpose of this presentation, in hypothetical judgments of the form (Γ; · ⊢ hyp A), (Γ; · ⊢ verif A), and (Γ; use A ⊢ use B), where Γ contains unrestricted facts of the form hyp A. In light of this, we can show the synthetic inference rules that result from the above signature.
  ---------------------------- hyp
Γ, hyp A; · ⊢ hyp A

---------------------------- use
Γ; use A ⊢ use A

Γ; · ⊢ hyp A
Γ; use A ⊢ use (a Q)
---------------------------- atm
Γ; · ⊢ verif (a Q)

Γ, hyp A1; · ⊢ verif A2
---------------------------- ⊃I
Γ; · ⊢ verif (A1 ⊃ A2)

Γ; use B ⊢ use (A1 ⊃ A2)
Γ; · ⊢ verif A1
---------------------------- ⊃E
Γ; use B ⊢ use A2

Γ; · ⊢ verif A1
Γ; · ⊢ verif A2
---------------------------- ∧I
Γ; · ⊢ verif (A1 ∧ A2)

Γ; use B ⊢ use (A1 ∧ A2)
---------------------------- ∧E1
Γ; use B ⊢ use A1

Γ; use B ⊢ use (A1 ∧ A2)
---------------------------- ∧E2
Γ; use B ⊢ use A2
There are basically no surprises - the complete HLF/Twelf specification, including global soundness and completeness proofs, is essentially just a repeat of the most closely related Twelf specification, where the elimination rules in the signature look somewhat less unusual.

Switching up the polarity

Now, let us imagine that we add atoms with positive polarity to LLF in a way analogous to their addition in the previous post. Obviously (hyp A) is naturally positive, so we can make that change, but that is not the interesting point. Consider the derived rules if we *also* make (use A) positive and restrict our attention to sequents of the form (Γ; · ⊢ verif A) and (Γ; use A ⊢ use (a Q)).
  ---------------------------- use
Γ; use (a Q) ⊢ use (a Q)

Γ, hyp A; use A ⊢ use (a Q)
---------------------------- atm
Γ, hyp A; · ⊢ verif (a Q)

Γ, hyp A1; · ⊢ verif A2
---------------------------- ⊃I
Γ; · ⊢ verif (A1 ⊃ A2)

Γ; use A2 ⊢ use (a Q)
Γ; · ⊢ verif A1
---------------------------- ⊃E
Γ; use (A1 ⊃ A2) ⊢ use (a Q)

Γ; · ⊢ verif A1
Γ; · ⊢ verif A2
---------------------------- ∧I
Γ; · ⊢ verif (A1 ∧ A2)

Γ; use A1 ⊢ use (a Q)
---------------------------- ∧E1
Γ; use (A1 ∧ A2) ⊢ use (a Q)

Γ; use A2 ⊢ use (a Q)
---------------------------- ∧E2
Γ; use (A1 ∧ A2) ⊢ use (a Q)
While our previous formalization was an interesting sort of modified version of natural deduction, the result when we change the polarity of the atomic proposition (use A) is precisely a focused sequent calculus.

A few related points

There are a lot of other neat explorations to consider in the near vicinity of this topic. Among them are:
  • Generalizing to a polarized, weakly-focused logic. I did most of this once, but there were some issues with my original proof.
  • What's the precise natural deduction system that the system with weak focusing + inversion on negative propositions corresponds to? Does it correspond exactly to the style of natural deduction used by Howe in "Proof Search in Lax Logic"?
  • Generalizing to a fully-focused logic: how close can we get, for instance, to the natural deduction system of Brock-Nannestad and Schürmann ("Focused Natural Deduction," LPAR 2010)?
  • Can this generalize to a focused linear sequent calculus, or do I only get to use linearity once per specification? Could the machinery of HLF be used to work around that?
  • The global soundness and completeness results that I established in HLF have the least amount of slickness possible - I basically just repeat the proof that I gave for the Twelf formalization without linearity. How close can we get to the intuition that the eta expansion theorem says use A -> verif A for all A?