tag:blogger.com,1999:blog-8274195142171302182024-03-09T21:46:03.028-05:00Request for LogicRobhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.comBlogger37125tag:blogger.com,1999:blog-827419514217130218.post-46380101637282719062014-01-09T10:53:00.001-05:002014-01-09T10:53:23.160-05:00Life update, a TA blog<div dir="ltr" style="text-align: left;" trbidi="on">
It's been over a year and a half since I last posted! In that time I've defended my <a href="http://typesafety.net/thesis">dissertation</a> and begun a teaching-track position at Carnegie Mellon. I'm mostly focused on teaching the freshman/sophomore level Principles of Imperative Computation course. This semester I will have a bit of a change of pace teaching Foundations of Programming Languages, a course designed by <a href="http://existentialtype.wordpress.com/">Bob Harper</a> and based on his book <a href="http://www.cambridge.org/us/knowledge/discountpromotion/?site_locale=en_US&code=L2PFPL">Practical Foundations of Programming Languages</a>.<br />
<br />
Some time ago Ron Garcia asked me if I would be blogging about my teaching here. I'd certainly like to, but thus far I haven't made the time. However, I'm pleased that one of my excellent and recurring Principles of Imperative Computation TAs, Nivedita Chopra, plans to <a href="http://c0andmore.wordpress.com/">blog about her experiences this semester</a>. I'm excited to read along myself!</div>
Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-76006057554880053542012-04-19T00:36:00.000-04:002013-01-15T21:49:15.980-05:00Charity and trust<p>A while back, I <a href="http://requestforlogic.blogspot.com/2012/01/why-does-acm-act-against-interests-of.html">complained about the ACM's membership in the AAP</a>, a publishing trade organization that supported some odious legislation like SOPA and the Research Works Act. The ACM, to their credit, responded to the concerns that the community at large raised (there were <a href="http://scienceblogs.com/confessions/2012/03/around_the_web_research_works_1.php">many others</a>). ACM's Director of Group Publishing Scott Delman <a href="http://requestforlogic.blogspot.com/2012/01/response-from-acms-scott-delman.html">did so in this space</a>, and ACM President Alain Chesnais <a href="http://blog.acm.org/president/?p=67">did so on his blog</a>. (Side note: in that post, and especially in the comments to it, Chesnais seemed to come really close to saying something like <i>"The ACM does not and will not support the RWA, SOPA, or the AAP's lobbying, because the ACM doesn't take political stances in either way as a matter of policy"</i> but he also seemed to take <i>really great pains</i> to not say anything that clear and specific, which was baffling to me.)</p>
<p>One of the arguments I used in that previous post was about <a href="http://www.acm.org/publications/acm-author-izer-service">Author-izer</a>, the new ACM service that allows you to provide the official version of your papers for free from your personal webpage. The reason I brought that in to the argument is that Author-izer is basically a good idea insofar as the CS community trusts the ACM to act in their interests. I still support the argument that their membership in the AAP runs directly counter those interests and thereby necessarily weakens that trust. Certainly, though, life's complicated and everyone gets to decide as an individual if they want to be a member (and thereby a supporter) of the ACM despite their affiliation with the AAP, just as the ACM as an institution gets to decide if it (we!) want to be a member (and thereby a supporter) of the definitely-lobbying AAP despite having a no-lobbying policy.</p>
<p>Andrew Appel, at the Freedom to Tinker blog, has a <a href="https://freedom-to-tinker.com/blog/appel/copyright-in-scholarly-publishing-2012-edition/">series</a> of <a href="https://freedom-to-tinker.com/blog/appel/modest-proposals-for-academic-authors/">three</a> blog <a href="https://freedom-to-tinker.com/blog/appel/contract-hacking-and-community-organizing/">posts</a> presenting four ways that academics can think about their relationship to copyright. I had fun reading them, and they provide an interesting alternative answer to why trusting the ACM to act in our interests is important. The model that Appel suggested that he subscribes to sees signing away one's valuable research papers as a <i>significant charitable contribution</i>. He's comfortable giving that charitable contribution both to ACM and IEEE, presumably because he broadly supports them and their roles, and because the cost to the world as a whole of the donation, relative to just making the donation truly free, is maybe not all that large. (I could note that reviewing papers also fits well within this model of charitable contribution.)
<p>The idea that adopting this view is a reasonable compromise, as Appel suggests, seems pretty reasonable to me, while remaining sympathetic (and potentially responsive) to concerns "that ACM and IEEE have fallen behind the curve." I certainly try to donate to organizations I believe in and trust! I would therefore certainly pause before making a in-kind charitable contribution to a highly profitable Dutch multinational that spearheads legislation like the RWA and <a href="http://www.sciencedirect.com/science/journal/14754916">lends credibility to homeopathy</a>, but that reflects my values and doesn't need to reflect yours.</p>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-21921565575146689492012-04-03T00:35:00.001-04:002012-04-17T08:16:41.426-04:00Differently-higher-order-focusing<p>I was toying with an idea last weekend, but it passed the "this is taking too much of your time, go back to your thesis" threshold, so I'm going to jot down what I figured out here, link to some Agda code, and maybe return to it in the future.</p>
<p>I'm going to use intuitionistic multiplicative, additive linear logic (without the exponential \({!}A^-\)) in this note. Doing it this way is the easiest way to explain everything, though it does mean I'm assuming some familiarity with intuitionistic linear logic. I'm also not going to use shifts; the Agda formalizations do use shifts, but the difference isn't important.</p>
<p>In polarized linear logic without shifts, there's one big syntactic class \(A\) with all the propositions from linear logic, but they are sorted into positive and negative propositions based on their topmost connectives. Atomic propositions can be given arbitrary polarity \(p^+\) or \(p^-\) as long as that polarity is consistent.
<ul>
<li>\(A^+ ::= p^+ \mid {\bf 0} \mid A \oplus B \mid {\bf 1} \mid A \otimes B\)</li>
<li>\(A^- ::= p^- \mid \top \mid A ~\&~ B \mid A \multimap B\)</li>
</ul></p>
<h2>Quick review of higher-order focusing</h2>
<p>Higher-order focusing as originally defined by Noam takes as the definition of connectives as given by a pattern judgment. Pattern judgments for positive types track the <i>construction</i> of terms; a reasonable way of reading \(\Delta \Vdash A^+\) is that it says "given all the stuff in \(\Delta\), I can construct a proof of \(A^+\)."</p>
\[
\infer
{p^+ \Vdash p^+}
{}
\qquad
\infer
{A^- \Vdash A^-}
{}
\]\[
\infer
{\Delta \Vdash A \oplus B}
{\Delta \Vdash A}
\qquad
\infer
{\Delta \Vdash A \oplus B}
{\Delta \Vdash B}
\]\[
\infer
{\cdot \Vdash {\bf 1}}
{}
\qquad
\infer
{\Delta_A, \Delta_B \Vdash A \otimes B}
{\Delta_A \Vdash A & \Delta_B \Vdash B}
\]
<p>Similarly, the negative pattern judgment tracks how negative types are used. A reasonable reading of the pattern judgment \(\Delta \Vdash A^- > C\) is that it says "given all the stuff in \(\Delta\), I can use a proof of \(A^-\) to prove \(C\)."</p>
\[
\infer
{\cdot \Vdash p^- > p^-}
{}
\qquad
\infer
{\cdot \Vdash A^+ > A^+}
{}
\]\[
\infer
{\Delta \Vdash A ~\&~ B > C}
{\Delta \Vdash A > C}
\qquad
\infer
{\Delta \Vdash A ~\&~ B > C}
{\Delta \Vdash B > C}
\qquad
\infer
{\Delta, \Delta' \Vdash A \multimap B > C }
{\Delta \Vdash A & \Delta' \Vdash B > C}
\]
<p>One of the nicest things about higher-order focusing, in my opinion, is that it removes the question of what order the invertible left rules get applied in altogether: when we come to a point where invertible rules must be applied, the invertible rules happen <i>all at once</i> by logical reflection over the pattern judgment.<sup><a href="#dhof1">1</a></sup></p>
\[
\infer
{\Delta, [A^+] \vdash C}
{\forall(\Delta' \Vdash A^+) \longrightarrow \Delta, \Delta' \vdash C}
\qquad
\infer
{\Delta \vdash [A^-]}
{\forall(\Delta' \Vdash A^- > C) \longrightarrow \Delta, \Delta' \vdash C}
\]
<p>In particular, this means that a derivation of \(\Delta, [p^+ \oplus (A^- \otimes q^+) \oplus {\bf 1}] \vdash C\) has <i>three</i> immediate premises corresponding to the three one of which is a derivation of \(\Delta, p^+ \vdash C\), one of which is a derivation of \(\Delta, A^-, q^+ \vdash C\), and one of which is a derivation of \(\Delta \vdash C\).</p>
<p>This "do all the inversion all at once" idea is a strengthening of the idea that appears in Andreoli's original presentation of focusing, in the definition of CLF, in Kaustuv Chaudhuri's focused linear logic, and in my work on structural focalization. All of these aforementioned presentations have a separate, ordered <i>inversion context</i> that forces you to decompose positive propositions into contexts and negative propositions in one arbitrary-but-fixed order, which kind of lets you pretend they all happen at once, since you can't make any choices at all until you're done. This isn't the only way to look at it, though. In Miller and Liang's LJF, in Frank Pfenning's course notes on Linear Logic, and in probably a bunch of other places, the inversion steps can happen <i>in any order</i>, though for a fully focused system you have to do get through all the possible inversion steps before you focus again. In order to get the strength of a Andreoli/Chaudhuri/me-focused systems in an LJF-like setting, one must prove a Church-Rosser-esque property that all ways of doing inversion are the same.</p>
<p>The reason I think the use of higher-order formulations is particularly nice is that it blows away the distinction between systems that fix the inversion order (like Structural Focalization and its many predecessores does) and systems that allow nondeterminism in the inversion order (like LJF and its kin). There's no question about the presence or absence of inessential nondeterminism when all the nondeterminism <i>seriously</i> happens in one rule that has as many premises as necessary to get the job done.</p>
<p>One of the less-awesome things about higher-order focusing is that it requires this extra substitution judgment, a complication that I won't go into here but that gets increasingly annoying and fiddly as we go to more and more interesting substructural logics, which are of particular interest to me. (This is not to say that it doesn't work, I <a href="http://www.cs.cmu.edu/~rjsimmon/papers/rfl7-ordered.pdf">wrote a note on higher-order focusing for ordered logic</a> way back two years ago when Request For Logic was a series of Unicodey-notes instead of a blog.) To try to get the best of both worlds, Taus and Carsten's <a href="http://www.springerlink.com/content/h7551281753g5266/">Focused Natural Deduction</a> is the only system I know of that tries to use the pattern judgments just for negative introduction/positive elimination while using the familiar negative eliminations and positive introduction structures.</p>
<p>I want to play the same game Taus and Carsten play, but without using the pattern judgments at all. I think I have a better idea, but it ended up being more work than I have time for to actually prove that it's better (or to conclude that I'm wrong and it's not).</p>
<h2>Alternate formulation</h2>
<p>If \(P(\Delta)\) is a judgment on linear contexts \(\Delta\), then \(I^+_{A^+}(\Delta. P(\Delta))\) - which we can mathematically \(\eta\)-contract and write \(I^+_{A^+}(P)\) where convenient - is also a judgment, defined inductively on the structure of the positive proposition \(A^+\) as follows:
<ul>
<li>\(I^+_{p^+}(\Delta. ~ P(\Delta)) = P(p^+) \) </li>
<li>\(I^+_{A^-}(\Delta. ~ P(\Delta)) = P(A^-) \) </li>
<li>\(I^+_{\bf 0}(\Delta. ~ P(\Delta)) = \top \) </li>
<li>\(I^+_{A \oplus B}(\Delta. ~ P(\Delta)) = I^+_{A}(P) \times I^+_{B}(P) \) </li>
<li>\(I^+_{\bf 1}(\Delta. P(\Delta)) = P(\cdot)\) </li>
<li>\(I^+_{A \otimes B}(\Delta. ~ P(\Delta)) = I^+_{A}(\Delta_A. ~ I^+_{B}(\Delta_B. ~ P(\Delta_A, \Delta_B)) \) </li>
</ul>
This definition has a certain obvious correspondance to the pattern judgment from "Noam-style" higher order focusing, but it's very direct: given a property \(P\) that involves a linear context, \(I_{A^+}(P)\) proves that property on every context which can prove \(A^+\).</p>
<p>Given this judgment, we can use it pretty straightforwardly to define the inversion judgment.
\[
\infer
{\Delta, [A^+] \vdash C}
{I^+_{A^+}(\Delta_{A}. ~~\Delta, \Delta_{A} \vdash C)}
\]
The result is exactly same as the formulation that used the higher-order pattern judgment, but this formulation is, in my mind at least, rather more direct while just as appropriately higher-order.</p>
<p>The story for negative types is the same. If \(P(\Delta,C)\) is a judgment on linear contexts \(\Delta\) and conclusions (conclusions are positive propositions \(A^+\) and negative atomic propositions \(p^-\)), then \(I^+_{A^-}(P)\) is defined inductively on the structure of types as follows:
<ul>
<li>\(I^-_{p^-}(\Delta,C. ~ P(\Delta,C)) = P(\cdot,p^-)\)</li>
<li>\(I^-_{A^+}(\Delta,C. ~ P(\Delta,C)) = P(\cdot,A^+)\)</li>
<li>\(I^-_{A ~\&~ B}(\Delta,C. ~ P(\Delta,C)) = I^-_{A}(P) \times I^-_{B}(P)\)</li>
<li>\(I^-_{A \multimap B}(\Delta. ~ P(\Delta,C)) = I^+_{A}(\Delta_A.~I^-_{B}(\Delta,C. ~ P((\Delta_A,\Delta), C)\)</li>
</ul>
The right-inversion rule follows the same pattern as the left-inversion rule:
\[
\infer
{\Delta \vdash [ A^- ]}
{I^-_{A^-}(\Delta',C. ~~ \Delta, \Delta' \vdash C)}\]</p>
<h2>Where's the Beef?</h2>
<p>I think this crazy new formulation of focusing is somewhat more elegant than the pattern-judgment-using formulation Taus and Carsten gave, even though it is otherwise rule-for-rule the same. However, that's not an argument that will convince anyone besides myself; there needs to be some advantage to using this formulation for it to have value. In particular, this formulation should make it easier and/or more beautiful to prove cut, identity expansion, and the completeness of focusing - it should at least have similar grace when formalized in Agda to the <a href="https://github.com/robsimmons/agda-lib/tree/focalization">Agda proof of structural focalization</a>. It turns out that the identity expansion theorem can be done with this new formulation in Agda, and it's beautiful. (When I showed it to Dan Licata he started muttering something something <a href="http://en.wikipedia.org/wiki/Yoneda_lemma">Yoneda Lemma</a>, so it's at least likely the structure of identity expansion has some interesting and common categorial structure.) It's certainly obvious that there's some interesting deep structure at work.</p>
<p>Furthermore, it seems like, given this formulation, it should be possible to get the cut admissibility theorem working as an obviously turned-around version of identity expansion, which is the kind of interesting structure I've had my eye out for ever since I observed it in the <a href="http://www.cs.cmu.edu/~rjsimmon/papers/deng12relating.pdf">process interpretation of linear logic</a>. I haven't made that work yet, though, so I'm going to stick with inversion contexts for now and maybe return to poking at this post-thesis, if nobody else has figured it out in the meantime - or, indeed, if nobody else has already figured it out in the past.</p>
<p>Here are some Agda fragments for dealing with this idea <a href="https://github.com/robsimmons/agda-lib/blob/lambdatown/AltRigid.agda">rigid logic</a> (a non-associative, non-unit-law-having version of ordered logic that is useless except as a way of thinking about substructural logics in Agda, as far as I can tell), and some other Agda fragments for <a href="https://github.com/robsimmons/agda-lib/blob/lambdatown/AltPolar.agda">persistent logic</a>. This whole blog post should serve as a disclaimer that these are all quite half-baked.</p>
<hr/>
<p><sup><a name="dhof1"></a>1</sup><b>An aside on "higher-order"</b></p>
<p>This is the part of higher-order focusing that gives it the name "higher-order," in the sense that the derivation constructors take functions as arguments. I can't quite figure out at this moment how this usage fits into <a href="http://chrisamaphone.livejournal.com/751991.html">Chris's discussion</a> of what should and should not be called higher order.</p>
<p>Noam called the paper introducing pattern judgments to the PL world "Focusing and higher-order abstract syntax" (POPL 2008). <a href="http://www.cs.cmu.edu/~noam/papers/popl08-notes">Noam now sees that naming as unfortunate</a>, and I agree; it blurs the definition of "higher-order abstract syntax" (HOAS) in a different direction that it gets blurred elsewhere in the literature. In my idiolect, which of course I can't force on everyone else, I reserve HOAS for the use of <i>substitution functions</i> to encode term structure - in other words, I use it as a rough synonym of what I and others also call "abstract binding trees." (I'm not sure who coined "abstract binding tree," I associate it with Bob Harper.)</p>
<p>The whole point of higher-order focusing is that this function - the one from the pattern judgment \(\Delta' \Vdash A\) to the provability judgment \(\Delta, \Delta' \vdash C\) - is a <i>function that case analyzes its arguments</i> (to figure out which pattern judgment was given as an argument to the function). Putting our set-theorist hats on, it's a map, a potentially infinite list of (pattern derivation, provability derivation) tuples such that every pattern derivation is associated with exactly one provability derivation. Then we take off our set-theorist hats 'cause we dislike those hats; it's a real (pure, terminating, effect-free) function, like in the sense of functional programming. Or just programming. I personally try to use the phrase <i>logical reflection</i> to emphasize this pattern-matching higher-order-ness; Noam now prefers <i>abstract higher-order syntax</i>, which is fine too.</p>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com4tag:blogger.com,1999:blog-827419514217130218.post-64582981675042216692012-03-05T00:13:00.000-05:002012-03-31T14:12:43.998-04:00What does focusing tell us about language design?<p>This post is largely based on some conversations I've had about Polarized Logic/Focusing/Call-By-Push-Value recently, especially with <a href="http://www.cs.cmu.edu/~neelk/">Neel Krishnaswami</a> and <a href="http://www.cs.jhu.edu/~nwf/">Wes Filardo</a>, though it was originally prompted by <a href="http://axisofeval.blogspot.com/2011/12/call-by-push-value.html">Manuel Simoni</a> some time ago.</p>
<p>I think that one of the key observations of focusing/CBPV is that programs are dealing with <i>two different things</i> - data and computation - and that we tend to get the most tripped up when we confuse the two.
<ul>
<li>Data is classified by data types (a.k.a. positive types). Data is defined by how it is constructed, and the way you use data is by pattern-matching against it.</li>
<li>Computation is classified by computation types (a.k.a. negative types). Computations are defined their eliminations - that is, by how they respond to signals/messages/pokes/arguments.</li>
</ul>
<!-- I have no better personal example of the clarifying power of call-by-push-value in language design than my <a href="http://requestforlogic.blogspot.com/search/label/holey%20data">holey data</a> series. In that series, I did little more than following my nose using CPBV as a guide, and I effectively <a href="http://requestforlogic.blogspot.com/2011/08/holey-data-postscript-hole-abstraction.html">rediscovered Minamide's 1998 POPL work</a> in the process; however, many of the decisions that seemed ad-hoc for Minamide could be understood as uniform and principled in the CPBV setting. Furthermore, because data is used by pattern matching against it, I took the obvious next step and discovered a really nice generalization of Minamide's work (pattern-matching hole abstractions as zippers).</p> -->
<p>There are two things I want to talk about, and they're both recursive types: call-by-push-value has positive recursive types (which have the feel of inductive types and/or algebras and/or what we're used to as datatypes in functional languages) and negative recursive types (which have the feel of recursive, lazy records and/or "codata" whatever that is and/or coalgebras and/or what William Cook calls objects). Both positive and negative recursive types are treated by Paul Blain Levy in <a href="http://www.cs.bham.ac.uk/~pbl/papers/thesisqmwphd.pdf">his thesis</a> (section 5.3.2) and in the <a href="http://www.springer.com/computer/swe/book/978-1-4020-1730-8?cm_cmm=sgw-_-ps-_-book-_-1-4020-1730-8">Call-By-Push Value book</a> (section 4.2.2).</p>
<p>In particular, I want to claim that Call-By-Push-Value and focusing suggest two fundamental features that should be, and generally aren't (at least simultaneously) in modern programming languages:
<ol>
<li>Support for structured data with rich case analysis facilities (up to and beyond what are called views)</li>
<li>Support for recursive records and negative recursive types.</li>
</ol>
</p>
<h3>Minor addition to "core Levy"</h3>
<!--
<p><b>Left-to-right evaluation:</b> One of the things CPBV does is make the ordering of computation very explicit, which, in practice, means that programs must be written in a sort of <a href="http://en.wikipedia.org/wiki/Administrative_normal_form">administrative normal form</a>. I'm going to choose to make my life easier by imposing a left-to-right evaluation order, so that a program like this:
<pre style="font-weight:bold"> <span style="color:#cc4020"><span style="color:#cc4020">force <span style="color:#0000ff">f</span> <span style="color:#0000ff">x</span> <span style="color:#0000ff">y</span></span> to <span style="color:#0000ff">z</span> in
<span style="color:#cc4020">force <span style="color:#0000ff">g</span> <span style="color:#0000ff">z</span> <span style="color:#0000ff">y</span></span> to <span style="color:#0000ff">w</span> in
<span style="color:#cc4020">force <span style="color:#0000ff">equal</span> <span style="color:#0000ff">z</span> <span style="color:#0000ff">w</span></span></span></pre>
can be written like this instead:
<pre style="font-weight:bold"> <span style="color:#cc4020">force</span> <span style="color:#0000ff">equal</span> <span style="color:#0000ff"><<span style="color:#cc4020">force</span> <span style="color:#0000ff">f</span> <span style="color:#0000ff">x</span> <span style="color:#0000ff">y</span>></span> <span style="color:#0000ff"><<span style="color:#cc4020">force</span> g z y></span></pre>
The blue angle braces are there to make it clear that there's an elimination form at work: if <tt style="font-weight:bold"><span style="color:#cc4020">CMP</span></tt> is computation code that has computation type <tt style="font-weight:bold"><span style="color:#cc4020">F <span style="color:#0000ff">typ</span></span></tt>, then <tt style="font-weight:bold"><span style="color:#0000ff"><<span style="color:#cc4020">CMP</span>></span></tt> is value code that has value type <tt style="font-weight:bold"><span style="color:#0000ff">typ</span></tt>. (In a real language, one would presumably be required to write neither those angle braces nor the <tt style="font-weight:bold"><span style="color:#cc4020">force</span></tt> syntax which acts as the elimination form for suspension types <tt style="font-weight:bold"><span style="color:#0000ff">U <span style="color:#cc4020">typ</span></span></tt>.) Basically, this allows me to write ML instead of monadic Haskell; you can imagine a transformation that elaborates the latter example into the former.</p>
<p>As an aside - do remember that, at the cost of potential confusion, I modified Bauer's original Levy implementation to make <tt style="font-weight:bold"><span style="color:#cc4020">force</span></tt> a prefix operator that binds more tightly than application - <tt style="font-weight:bold"><span style="color:#cc4020">force <span style="color:#0000ff">g</span> <span style="color:#0000ff">z</span> <span style="color:#0000ff">y</span></span></tt> can be written with parentheses as <tt style="font-weight:bold"><span style="color:#cc4020">(((force <span style="color:#0000ff">g</span>) <span style="color:#0000ff">z</span>) <span style="color:#0000ff">y</span>)</span></tt>.</p>
-->
<p>I'll be presenting with an imaginary extension to Bauer's Levy language in this post.<sup><a href="#whatdoes1">1</a></sup> I've <a href="http://requestforlogic.blogspot.com/2011/08/embracing-and-extending-levy-language.html">mucked around Levy before</a>, of course, and it would probably help to review that previous post before reading this one. I want to make one addition to Levy before we begin making big, interesting ones. The derived form that I want to add - <tt style="font-weight:bold"><span style="color:#cc4020">e1 orelse e2</span></tt> - is computation code with type <tt style="font-weight:bold"><span style="color:#cc4020">F <span style="color:#0000ff">bool</span></span></tt> if <tt style="font-weight:bold"><span style="color:#cc4020">e1</span></tt> and <tt style="font-weight:bold"><span style="color:#cc4020">e1</span></tt> have type <tt style="font-weight:bold"><span style="color:#cc4020">F <span style="color:#0000ff">bool</span></span></tt> as well. This is definable as syntactic sugar, where <tt style="font-weight:bold"><span style="color:#0000ff">x</span></tt> is selected to not be free in <tt style="font-weight:bold"><span style="color:#cc4020">e2</span></tt>:
<pre style="font-weight:bold"> <span style="color:#cc4020">e1 to <span style="color:#0000ff">x</span> in
if <span style="color:#0000ff">x</span> then return <span style="color:#009900">true</span>
else e2</span></pre>
One other aside - do remember that, at the cost of potential confusion, I modified Bauer's original Levy implementation to make <tt style="font-weight:bold"><span style="color:#cc4020">force</span></tt> a prefix operator that binds more tightly than application - <tt style="font-weight:bold"><span style="color:#cc4020">force <span style="color:#0000ff">g</span> <span style="color:#0000ff">z</span> <span style="color:#0000ff">y</span></span></tt> can be written with parentheses as <tt style="font-weight:bold"><span style="color:#cc4020">(((force <span style="color:#0000ff">g</span>) <span style="color:#0000ff">z</span>) <span style="color:#0000ff">y</span>)</span></tt>.</p>
<h2>Positive types, positive recursive types</h2>
<p>The functional languages have the construction of positive types just about right: they're made up of sums and products. I used a special LLF-ish construction for datatypes in my exploration of Levy#, but the more traditional way of introducing datatypes is to say that they are recursive types <tt style="font-weight:bold"><span style="color:#0000ff">μt.T(t)</span></tt>. The recursive types we're used to thinking about are naturally interpreted as positive types, because they are data and are inspected by pattern matching.</p>
<p>There's a tendency in programming language design to shove positive recursive types together with <i>labeled sum types</i> to get a familiar datatype mechanism.<sup><a href="#whatdoes2"a>2</a></sup> I will go along with this tendency and merge labeled sums and recursive types, writing them as <tt style="font-weight:bold"><span style="color:#0000ff">μt.[<span style="color:#009900">L1</span>: T1(t), <span style="color:#009900">L2</span>: T2(t), ..., <span style="color:#009900">Ln</span>: Tn(t)]</span></tt>.</p>
<p>Here are datatype definitions for trees of ints and lists of ints:
<pre style="font-weight:bold"> <span style="color:#0000ff">type+ tree =
μtree.
[ <span style="color:#009900">Leaf</span>: int,
<span style="color:#009900">Node</span>: tree * tree ]</span>
<span style="color:#0000ff">type+ list =
μlist.
[ <span style="color:#009900">Nil</span>,
<span style="color:#009900">Cons</span>: int * list ]</span></pre>
Note from the definition of lists that we also allow types to have no arguments: it's possible to treat the definition of <tt style="font-weight:bold"><span style="color:#009900">Nil</span></tt> as syntactic sugar for
<tt style="font-weight:bold"><span style="color:#0000ff"><span style="color:#009900">Nil</span>: unit</span></tt>. The associated value is <tt style="font-weight:bold"><span style="color:#009900">Nil</span></tt>, which is syntactic sugar for <tt style="font-weight:bold"><span style="color:#0000ff"><span style="color:#009900">Nil ()</span></span></tt>.</p>
<p>There are infinitely many trees and lists as we've defined them. In fact, it's just a convention of how we have to write programs that we think of positive types as being <i>finite</i> sums. Even though we can't write it down as a datatype declaration, it's perfectly reasonable to think of a built-in type of infinite precision integers as being defined as follows:
<pre style="font-weight:bold"> <span style="color:#0000ff">type+ int =
μint.
[ <span style="color:#009900">0</span>,
<span style="color:#009900">1</span>,
<span style="color:#009900">~1</span>,
<span style="color:#009900">2</span>,
<span style="color:#009900">~2</span>,
<span style="color:#009900">3</span>,
... ]</span></pre>
The same goes for built-in strings and the like - think of built-in types being positive enumerations that were provided specially by the language since the language didn't give us the ability to write down the full declaration containing all possible constants.<p>
<h3>Powerful views</h3>
<p>A point that has not, in my humble opinion, been made simply enough is that the perspective of focusing says that we should think about integrating much more powerful case analysis mechanisms into our programming languages. I learned about this point from Licata, Zeilberger, and Harper's 2008 <a href="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4557915">Focusing on Binding and Computation</a>, but their setting had enough other interesting stuff going on that it probably obscured this simple point.</p>
<p>Standard ML (and, to the best of my knowledge, all other functional languages circa-1997) only provides a limited form of case analysis - arbitrary finite views into the outermost structure of the datatype:
<pre style="font-weight:bold"> <span style="color:#cc4020">case <span style="color:#0000ff">x</span> of
| <span style="color:#009900">Leaf 9</span> => ...
| <span style="color:#009900">Leaf</span> <span style="color:#0000ff">y</span> => ...
| <span style="color:#009900">Node (Node (Leaf 5, <span style="color:#0000ff">y</span>), <span style="color:#0000ff">z</span></span>) => ...
| <span style="color:#009900">Node (<span style="color:#0000ff">y</span>, <span style="color:#0000ff">z</span>)</span> => ...</span></pre>
This limitation comes with a nice tradeoff, in that we can pretty effectively estimate how much work compiled code needs to do to handle a pattern match. However, the structure of focusing suggests that <i>any</i> mathematical function from a value's structure to computation is fair game. One well-studied modernization of pattern matching is views, which allows us to group computations in other ways. One use case would be allowing us to take an integer variable <tt style="font-weight:bold"><span style="color:#0000ff">x</span></tt> and say that it is even or odd:
<pre style="font-weight:bold"> <span style="color:#cc4020">case <span style="color:#0000ff">x</span> of
| <span style="color:#009900">0</span> => return <span style="color:#009900">true</span>
| <span style="color:#0000ff">x is even</span> => return <span style="color:#009900">false</span>
| <span style="color:#009900">~1</span> => return <span style="color:#009900">false</span>
| <span style="color:#0000ff">x is odd</span> => return <span style="color:#009900">true</span></span></pre>
The focusing-aware view of pattern matching suggests that what a pattern match is actually doing is defining a case <i>individually for each value structure</i> - if we could write a countably-infinite-sized program, then we could expand the view-based program above to the no-longer-view-based, but infinitely long, program below:
<pre style="font-weight:bold"> <span style="color:#cc4020">case <span style="color:#0000ff">x</span> of
| <span style="color:#009900">0</span> => return <span style="color:#009900">true</span>
| <span style="color:#009900">1</span> => return <span style="color:#009900">true</span>
| <span style="color:#009900">~1</span> => return <span style="color:#009900">false</span>
| <span style="color:#009900">2</span> => return <span style="color:#009900">false</span>
| <span style="color:#009900">~2</span> => return <span style="color:#009900">false</span>
| <span style="color:#009900">3</span> => return <span style="color:#009900">true</span>
| ...</span></pre>
So: the focusing perspective says that <i>any</i> effect-free (effects include nontermination!) mathematical function we can write from value structure to computations is fair game for case analysis; views are just one known way of doing this in an interesting way. In principle, however, we can consider much richer case-analysis machinery to be fair game. For instance, there is a mathematical function \(f\) from integer values <tt style="font-weight:bold"><span style="color:#0000ff">int</span></tt> to computation code of type <tt style="font-weight:bold"><span style="color:#cc4020">F <span style="color:#0000ff">bool</span></span></tt> with the variable <tt style="font-weight:bold"><span style="color:#0000ff">coll</span></tt> free:
<ul>
<li>If the integer is 1, the result is <tt style="font-weight:bold"><span style="color:#cc4020">return <span style="color:#009900">true</span></span></tt> </li>
<li>If the integer is less than 1, the result is <tt style="font-weight:bold"><span style="color:#cc4020">return <span style="color:#009900">false</span></span></tt> </li>
<li>Otherwise, if the integer is divisible by 2 and the result of dividing the integer by 2 is <tt style="font-weight:bold"><span style="color:#009900">i</span></tt>, then the result is <tt style="font-weight:bold"><span style="color:#cc4020">force <span style="color:#0000ff">coll</span> <span style="color:#009900">i</span></span></tt></li>
<li>Otherwise, let <tt style="font-weight:bold"><span style="color:#009900">j</span></tt> be 3 times the integer, plus one. Then the result is <tt style="font-weight:bold"><span style="color:#cc4020">force <span style="color:#0000ff">coll</span> <span style="color:#009900">j</span></span></tt></li></li>
</ul>
Given this mathematical function, we have the core case analysis at the heart of the <a href="http://en.wikipedia.org/wiki/Collatz_conjecture">Collatz function</a>. If we expand out this case analysis into an infinitely-long mapping as before, it would look like this:
<pre style="font-weight:bold"> <span style="color:#cc4020">rec <span style="color:#0000ff">coll</span>: <span style="color:#0000ff">int</span> -> F <span style="color:#0000ff">bool</span> is
fn <span style="color:#0000ff">x</span>: int =>
case <span style="color:#0000ff">x</span> of
| <span style="color:#009900">0</span> => return <span style="color:#009900">false</span>
| <span style="color:#009900">1</span> => return <span style="color:#009900">true</span>
| <span style="color:#009900">~1</span> => return <span style="color:#009900">false</span>
| <span style="color:#009900">2</span> => force <span style="color:#0000ff">coll</span> <span style="color:#009900">1</span>
| <span style="color:#009900">~2</span> => return <span style="color:#009900">false</span>
| <span style="color:#009900">3</span> => force <span style="color:#0000ff">coll</span> <span style="color:#009900">10</span>
| <span style="color:#009900">~3</span> => return <span style="color:#009900">false</span>
| <span style="color:#009900">4</span> => force <span style="color:#0000ff">coll</span> <span style="color:#009900">2</span>
| <span style="color:#009900">~4</span> => return <span style="color:#009900">false</span>
| <span style="color:#009900">5</span> => force <span style="color:#0000ff">coll</span> <span style="color:#009900">16</span>
| ...</span></pre>
The important question here is: what is the right way to think about how we write down total functions from value structures? Are views about right, or do we need something different, clearer/more/less powerful? I don't have a sense for the answers, but I'm pretty confident that's the right question.</p>
<h3>Views are only over the structure of values</h3>
<p>One critical caveat: I used the phrase <i>value structure</i> in the previous discussion repeatedly on purpose. Because computations can always contain effects, we <i>cannot</i> poke at the computations at all when we think about views - the value structure is just the part of the value that we can inspect without traversing a <tt style="font-weight:bold"><span style="color:#0000ff">thunk</span></tt>. Therefore, if we have this definition...
<pre style="font-weight:bold"> <span style="color:#0000ff">type+ somefn =
μsomefn.
[ <span style="color:#009900">BoolFn</span>: U <span style="color:#cc4020">(<span style="color:#0000ff">bool</span> -> F <span style="color:#0000ff">bool</span>)</span>,
<span style="color:#009900">IntFn</span>: U <span style="color:#cc4020">(<span style="color:#0000ff">int</span> -> F <span style="color:#0000ff">int</span>)</span> ]</span></pre>
...then a case analysis on a value of type <tt style="font-weight:bold"><span style="color:#0000ff">somefn</span></tt> can have at most two branches - one for the <tt style="font-weight:bold"><span style="color:#009900">BoolFn</span></tt> case and one for the <tt style="font-weight:bold"><span style="color:#009900">IntFn</span></tt> case. We can't pattern-match into computations, so even though it would be reasonable to say that there are many, many values of type <tt style="font-weight:bold"><span style="color:#0000ff">somefn</span></tt>, there are only two value <i>structures</i> that can be inspected by case analysis.</p>
<h2>Negative types, negative recursive types</h2>
<p>In call-by-push-value, the computation types (a.k.a. the negative types of polarized logic/focusing) are defined by the way we use them. Computation code of computation type does stuff in response to being used - a function does stuff in response to getting an argument, and so the most well-understood negative type is implication.</p>
<p>Conjunction (the product or pair type) is a bit slippery, says the CBPV/focusing methodology. We've already used conjunction as a positive type (<tt style="font-weight:bold"><span style="color:#0000ff">t1 * t2</span></tt>), but we can also think of conjunction as being a negative type. This is because it's equally valid to think of a pair being positively defined by its construction (give me two values, that's a pair) and negatively as a suspended computation that awaits you asking it for its first or second component.</p>
<p>CPBV/focusing makes us choose: we can have either type of conjunction, or we can have both of them, but they are different. It will be convenient if we keep the normal tuples as positive types, and associate negative types with <i>records</i> - that is, with named products. Records are defined by how you <i>project</i> from them, and just as we tie labeled sums in with positive recursive types, we'll tie labeled products in with negative recursive types.</p>
<p>Negative recursive types are naturally codata in the same way that positive recursive types are naturally data: the canonical example is the infinite stream.
<pre style="font-weight:bold"> <span style="color:#cc4020">type- stream =
μstream.
{ <span style="color:#773300">head</span>: F <span style="color:#0000ff">int</span>,
<span style="color:#773300">tail</span>: stream }</span></pre>
It's natural to define particular streams with fixedpoints:
<pre style="font-weight:bold"> <span style="color:#000000">val <span style="color:#0000ff">Ones</span> = <span style="color:#0000ff">thunk</span> <span style="color:#cc4020">rec <span style="color:#0000ff">this</span>: stream is
{ <span style="color:#773300">head</span> = return <span style="color:#009900">1</span>,
<span style="color:#773300">tail</span> = force <span style="color:#0000ff">this</span> }</span></span>
<span style="color:#000000">val <span style="color:#0000ff">CountUp</span> = <span style="color:#0000ff">thunk</span> <span style="color:#cc4020">rec <span style="color:#0000ff">this</span>: <span style="color:#0000ff">int</span> -> stream is
fn <span style="color:#0000ff">x</span>: <span style="color:#0000ff">int</span> =>
{ <span style="color:#773300">head</span> = return <span style="color:#0000ff">x</span>,
<span style="color:#773300">tail</span> =
force <span style="color:#0000ff">plus</span> <span style="color:#0000ff">x</span> <span style="color:#009900">1</span> to <span style="color:#0000ff">y</span> in
force <span style="color:#0000ff">this</span> <span style="color:#0000ff">y</span> }</span></span></pre>
Say I wanted to get the fourth element of an infinite stream <tt style="font-weight:bold"><span style="color:#0000ff">str</span></tt> of type <tt style="font-weight:bold"><span style="color:#0000ff">U <span style="color:#cc4020">stream</span></span></tt>. The ML-ish way of projecting from records would write this as <tt style="font-weight:bold"><span style="color:#cc4020"><span style="color:#773300">#tail</span> (<span style="color:#773300">#head</span> (<span style="color:#773300">#head</span> (<span style="color:#773300">#head</span> (force <span style="color:#0000ff">str</span>))))</span></tt>, but I will pointedly not use that notation in favor of a different record notation: <tt style="font-weight:bold"><span style="color:#cc4020">(force <span style="color:#0000ff">str</span>)<span style="color:#773300">.head</span><span style="color:#773300">.head</span><span style="color:#773300">.head</span><span style="color:#773300">.tail</span></span></tt>. It nests better, and works better in the case that record elements are functions. (Because functions are negative types, like records, it is very natural to have functions be the arguments to records.)
</p>
<h3>Negative recursive types as Cook's objects</h3>
<p>Here's a bit of a case study to conclude. The entities that William Cook names "Objects" in his paper <a href="http://dl.acm.org/citation.cfm?id=1640133">On Understanding Data Abstraction, Revisited</a> are recognizably negative recursive types in the sense above.<sup><a href="#whatdoes3">3</a></sup> Cook's examples can be coded in Standard ML (see <a href="https://gist.github.com/06f79e05d422fb832dd3">here</a>), but the language gets in the way of this encoding in a number of places.<sup><a href="#whatdoes4">4</a></sup> To see what this would look like in a language with better support, I'll encode the examples from Cook's paper in my imaginary CBPV implementation. The recursive negative type looks much like a stream, and encodes the interface for these set objects.
<pre style="font-weight:bold"> <span style="color:#cc4020">type- iset =
μiset.
{ <span style="color:#773300">isEmpty</span>: F <span style="color:#0000ff">bool</span>,
<span style="color:#773300">contains</span>: <span style="color:#0000ff">int</span> -> F <span style="color:#0000ff">bool</span>,
<span style="color:#773300">insert</span>: <span style="color:#0000ff">int</span> -> iset,
<span style="color:#773300">union</span>: <span style="color:#0000ff">U <span style="color:#cc4020">iset</span></span> -> iset }</span></pre>
One difficulty: the Insert and Union operations that Cook uses involve mutual recursion. I don't have mutual recursion, so rather than pretending to have it (I've abused my imaginary language extension enough already) I'll code it up using records. The record type <tt style="font-weight:bold"><span style="color:#cc4020">insertunion</span></tt> I define isn't actually recursive, it's just a record - this is analogous to using ML's datatype mechanism (which, as we've discussed, makes positive recursive types) to define a simple enumeration.
<pre style="font-weight:bold"> <span style="color:#000000"><span style="color:#cc4020">type- insertunion =
μinsertunion.
{ <span style="color:#773300">Insert</span>: <span style="color:#0000ff">U</span> iset <span style="color:#0000ff">* int</span> -> iset,
<span style="color:#773300">Union</span>: <span style="color:#0000ff">U</span> iset <span style="color:#0000ff">* U</span> iset -> iset }</span>
val <span style="color:#0000ff">InsertUnion</span> = <span style="color:#0000ff">thunk</span>
<span style="color:#cc4020">rec <span style="color:#0000ff">x</span>: insertunion is
{ <span style="color:#773300">Insert</span> = fn <span style="color:#0000ff">(s, n)</span> =>
(force <span style="color:#0000ff">s</span>)<span style="color:#773300">.contains</span> <span style="color:#0000ff">n</span> to <span style="color:#0000ff">b</span> in
if <span style="color:#0000ff">b</span> then force <span style="color:#0000ff">s</span> else
rec <span style="color:#0000ff">self</span>: iset is
{ <span style="color:#773300">isEmpty</span> = return <span style="color:#009900">false</span>,
<span style="color:#773300">contains</span> = fn <span style="color:#0000ff">i</span> => (force <span style="color:#0000ff">equal i n</span>) orelse (force <span style="color:#0000ff">s</span>)<span style="color:#773300">.contains</span> <span style="color:#0000ff">i</span>,
<span style="color:#773300">insert</span> = fn <span style="color:#0000ff">i</span> => (force <span style="color:#0000ff">x</span>)<span style="color:#773300">.Insert</span> <span style="color:#0000ff">(self, i)</span>,
<span style="color:#773300">union</span> = fn <span style="color:#0000ff">s'</span> => (force <span style="color:#0000ff">x</span>)<span style="color:#773300">.Union</span> <span style="color:#0000ff">(self, s')</span> },
<span style="color:#773300">Union</span> = fn <span style="color:#0000ff">(s1, s2)</span> =>
rec <span style="color:#0000ff">self</span>: iset is
{ <span style="color:#773300">isEmpty</span> = (force <span style="color:#0000ff">s1</span>)<span style="color:#773300">.isEmpty</span> orelse (force <span style="color:#0000ff">s2</span>)<span style="color:#773300">.isEmpty</span>,
<span style="color:#773300">contains</span> = fn <span style="color:#0000ff">i</span> => (force <span style="color:#0000ff">s1</span>)<span style="color:#773300">.contains</span> <span style="color:#0000ff">i</span> orelse (force <span style="color:#0000ff">s2</span>)<span style="color:#773300">.contains</span> <span style="color:#0000ff">i</span>,
<span style="color:#773300">insert</span> = fn <span style="color:#0000ff">i</span> => (force <span style="color:#0000ff">x</span>)<span style="color:#773300">.Insert</span> <span style="color:#0000ff">(self, i)</span>,
<span style="color:#773300">union</span> = fn <span style="color:#0000ff">s'</span> => (force <span style="color:#0000ff">x</span>)<span style="color:#773300">.Union</span> <span style="color:#0000ff">(self, s')</span> } }</span>
val <span style="color:#0000ff">Insert</span>: <span style="color:#0000ff">U</span> <span style="color:#cc4020">(<span style="color:#0000ff">U <span style="color:#cc4020">iset</span> * int</span> -> iset)</span> = <span style="color:#0000ff">thunk</span> <span style="color:#cc4020">((force <span style="color:#0000ff">InsertUnion</span>)<span style="color:#773300">.Insert</span>)</span>
val <span style="color:#0000ff">Union</span>: <span style="color:#0000ff">U</span> <span style="color:#cc4020">(<span style="color:#0000ff">U <span style="color:#cc4020">iset</span> * U <span style="color:#cc4020">iset</span></span> -> iset)</span> = <span style="color:#0000ff">thunk</span> <span style="color:#cc4020">((force <span style="color:#0000ff">InsertUnion</span>)<span style="color:#773300">.Union</span>)</span></span></pre>
We've defined union of sets and insertion into a set, but we haven't actually defined any sets yet! Once we give the implementation of an empty set, however, we can manipulate these sets with a Java-esque method invocation style:
<pre style="font-weight:bold"> <span style="color:#000000">val <span style="color:#0000ff">Empty</span> = <span style="color:#0000ff">thunk</span> <span style="color:#cc4020">rec <span style="color:#0000ff">self</span>: iset is
{ <span style="color:#773300">isEmpty</span> = return <span style="color:#009900">true</span>,
<span style="color:#773300">contains</span> = fn <span style="color:#0000ff">x</span> => return <span style="color:#009900">false</span>,
<span style="color:#773300">insert</span> = fn <span style="color:#0000ff">i</span> => force <span style="color:#0000ff">Insert (self, i)</span>,
<span style="color:#773300">union</span> = fn <span style="color:#0000ff">s'</span> => force <span style="color:#0000ff">s'</span> }</span>
val <span style="color:#0000ff">JustOne</span> = <span style="color:#0000ff">thunk</span> <span style="color:#cc4020">(force <span style="color:#0000ff">Empty</span>)<span style="color:#773300">.insert</span> <span style="color:#009900">1</span></span>
<span style="color:#cc4020">(force <span style="color:#0000ff">Empty</span>)</span><span style="color:#773300">.insert</span><span style="color:#0000ff">(<span style="color:#009900">3</span>)</span><span style="color:#773300">.union</span><span style="color:#0000ff">(JustOne)</span><span style="color:#773300">.insert</span><span style="color:#0000ff">(<span style="color:#009900">5</span>)</span><span style="color:#773300">.contains</span><span style="color:#0000ff">(<span style="color:#009900">4</span>)</span></span></pre>
As Cook noted, it's also very natural to talk about infinite sets in this style, such as the set of all even numbers and the set of all integers:
<pre style="font-weight:bold"> <span style="color:#000000">val <span style="color:#0000ff">Evens</span> = <span style="color:#0000ff">thunk</span> <span style="color:#cc4020">rec <span style="color:#0000ff">self</span>: iset is
{ <span style="color:#773300">isEmpty</span> = return <span style="color:#009900">false</span>,
<span style="color:#773300">contains</span> = fn <span style="color:#0000ff">i</span> =>
force <span style="color:#0000ff">mod i <span style="color:#009900">2</span></span> to <span style="color:#0000ff">x</span> in
(case <span style="color:#0000ff">x</span> of
| <span style="color:#009900">0</span> => return <span style="color:#009900">true</span>
| <span style="color:#0000ff">_</span> => return <span style="color:#009900">false</span>),
<span style="color:#773300">insert</span> = fn <span style="color:#0000ff">i</span> => force <span style="color:#0000ff">Insert (self, i)</span>,
<span style="color:#773300">union</span> = fn <span style="color:#0000ff">s'</span> => force <span style="color:#0000ff">Union (self, s')</span> }</span>
val <span style="color:#0000ff">Full</span> = <span style="color:#0000ff">thunk</span> <span style="color:#cc4020">rec <span style="color:#0000ff">self</span>: iset is
{ <span style="color:#773300">isEmpty</span> = return <span style="color:#009900">false</span>,
<span style="color:#773300">contains</span> = fn <span style="color:#0000ff">i</span> => return <span style="color:#009900">true</span>,
<span style="color:#773300">insert</span> = fn <span style="color:#0000ff">i</span> => force <span style="color:#0000ff">self</span>,
<span style="color:#773300">union</span> = fn <span style="color:#0000ff">s'</span> => force <span style="color:#0000ff">self</span> }</span></span></pre>
</p>
<hr>
<p><sup><a name="whatdoes1"></a>1</sup> If anyone is interested in helping me implement a toy language along these lines, I'm all ears. Also, the disclaimer that since I don't have an implementation there are surely bugs bugs bugs everywhere is strongly in effect.</p>
<p><sup><a name="whatdoes2"></a>2</sup> As a practical matter, this makes it easier for the language to know where to put implicit roll/unroll annotations, so that programmers doesn't have to write these explicitly, which would be a pain.</p>
<p><sup><a name="whatdoes3"></a>3</sup> Cook does give the disclaimer that this definition is "recognized as valid by experts in the field, although there certainly are other valid models." I carefully call these things "Cook objects" rather than "objects" - my claim is that negative recursive types correspond to what William Cook names objects, not that they correspond to what <i>you</i> call objects. Note, furthermore, that I could be wrong about what William Cook calls objects! I only have his paper to go on and I have been known to be a few cookies short of a haystack.</p>
<p><sup><a name="whatdoes4"></a>4</sup> There are two issues with the encoding in SML. The most basic is that we have to encode the recursive types using SML's recursive type mechanism, which is biased towards the positive encoding of recursive types as finite sums. The other issue is that the way SML ties together (mutual) recursion with closures gets in the way - the standard way of writing a "thunk" as a function from unit is all over the place, and it wan't possible to define EmptySet so that it was mutually recursive with the definition of Insert and Union as a result. So, I'm certainly not arguing that SML facilities programming in (what Cook calls) an object-oriented style in a strong sense - it's unnecessarily painful to code in that style - but the minimal semantic building blocks of what Cook presented and named Object are present and accounted for. [Update:] As gasche discusses in the comments, Ocaml's records, and the way it deals with laziness, make it quite a bit better at <a href="https://gitorious.org/gasche-snippets/object-encoding-example/blobs/master/iset_lazy.ml">encoding the example</a>, but it's still not ideal.</p>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com13tag:blogger.com,1999:blog-827419514217130218.post-26506104035127497012012-01-12T08:46:00.001-05:002012-03-05T08:58:05.973-05:00Structural focalization updated<p>I've uploaded to both <a href="http://arxiv.org/abs/1109.6273v2">ArXiV</a> and <a href="http://www.cs.cmu.edu/~rjsimmon/drafts/focus.pdf">my webpage</a> a significantly revised draft of the paper <i>Structural focalization</i>, which I've <a href="http://requestforlogic.blogspot.com/2011/09/my-new-focalization-technique-is.html">spoken about here before</a>. Feedback is welcome!</p>
<p>One of the points I make about the structural focalization technique is that, because it is all so nicely structurally inductive, it can be <a href="http://twelf.org/wiki/Focusing">formalized in Twelf</a>. As part of a separate project, I've now also repeated the whole structural focalization development in Agda! The code is <a href="https://github.com/robsimmons/agda-lib/tree/focalization">available from GitHub</a>. While a structural focalization proof has some more moving parts than a simple cut-and-identity proof, it also has one significant advantage over every Agda proof of cut admissibility that I'm aware of: it requires no extra structural metrics beyond normal structural induction! (My favorite structural metric is the <a href="http://requestforlogic.blogspot.com/2010/11/totally-nameless-representation.html">totally nameless representation</a>, but there are other ways of threading that needle, including, presumably, these "sized types" that everyone seems to talk about.)</p>
<p>In regular, natural-deduction substitution, you can get away without structural metrics by proving the statement that if \(\Gamma \vdash A\) and \(\Gamma, A, \Gamma' \vdash C\) then \(\Gamma, \Gamma' \vdash C\); the extra "slack" given by \(\Gamma'\) means that you operate by structural induction on the second given derivation without ever needing to apply weakening or exchange. Most cut-elimination proofs are structured in such a way that you have to apply left commutative and right commutative cuts on <i>both</i> of the given derivations, making this process tricky at the best; I've never gotten it to work at all, but you might be able to do something like "if \(\Gamma, \Gamma' \longrightarrow A\) and \(\Gamma, A, \Gamma'' \longrightarrow C\) then \(\Gamma, \Gamma', \Gamma'' \longrightarrow C\)." If someone can make this work let me know!</p>
<p>A focused sequent calculus, on the other hand, has three separate phases of substitution. The first phase is principal substitution, where the type gets smaller and you can do whatever you want to the derivations, including weakening them. The second phase is rightist substitution, which acts much like natural-deduction substitution, and where you can similarly get away with adding "slack" to the second derivation. The third phase is leftist substitution, and you can get by in this phase by adding "slack" to the first derivation: the leftist cases read something like "if \(\Gamma, \Gamma' \longrightarrow A\) and \(\Gamma, A \longrightarrow C\) then \(\Gamma, \Gamma' \longrightarrow C\)."</p>
<p>In <i>Structural focalization</i>, I note that the structural focalization technique could be seen as a really freaking complicated way of proving the cut and identity for an unfocused sequent calculus. But in Agda, there's a reason you might actually want to do things the "long way" - not only do you have something better when you finish (a focalization result), but you get cut and identity without needing an annoying structural metric.</p>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-24661492832936092212012-01-06T13:27:00.000-05:002013-01-15T21:48:59.957-05:00Response from ACM's Scott Delman<p>In the comments to my last post (<a href="http://requestforlogic.blogspot.com/2012/01/why-does-acm-act-against-interests-of.html">"Why does the ACM act against the interests of scholars?"</a>) ACM's Director of Group Publishing, Scott Delman, left a multiple-comment response. It's a response both to the views I expressed and to the views of others that I summarized. He agreed to have his comments posted as a post here; I'll leave my own thoughts for a separate post or the comments.</p>
<blockquote>
<p>Ok, here's the other side of this, which I feel compelled to throw out there after reading Rob's post and a few of the related comments.</p>
<p>Like most things in life, things are not always as black and white as some would lead us to believe. In this case, I think there is a basic misunderstanding of the ACM and the AAP (which is incidentally an organization that does a great deal of good work on behalf of both publishers and the scientific community).</p>
<p>Let's start with the ACM....which is a non-profit organization founded in 1947 by members of the computing community with the primary mission of advancing the field of computing. The Association is organized as a 501(c)3 corporation with daily operations run by a small staff of approximately 75 individuals who ultimately take their direction from a volunteer leadership of hundreds of dedicated scientists, scholars, educators, practitioners, and students who graciously donate a significant amount of their time to direct the Association forward in a way that benefits the computing community as a whole. It is important to point this out, because there is an implication in the original post that the ACM is an entity that is in some way acting against the scholarly community, when in fact the ACM is an organization that is literally run by the scholarly community. </p>
<p>Keeping this in mind, we are either left with a situation in which the scholarly community is either acting against itself by the policies it sets and supports (such as ACM's copyright policy and ACM's subscription model) or something else is going on here. Since it doesn't seem logical or even practical that the top-decision makers at ACM (such as the ACM Publications Board of Volunteers or the ACM Executive Committee of Volunteers, who oversee all major strategic decisions of the Association) would support policies that actively work against the interests of their own community, I think it is more reasonable to suggest that what is going on here is that the issues are not as cut and dry or as simplified as some advocates of "immediate and unrestricted" open access to all scholarly literature would lead us to believe.</p>
<p>Whenever I discuss the topic of open access with colleagues and friends, I think it is useful to try to imagine what the world would look like if the US Federal Government or other Foreign Governments decided to pass legislation that required all scholarly material that is in some way supported by public funding be made instantly open and freely available to the world without any paywalls of any sort. Well, as ACM's publisher and someone who is intimately aware of the tangible costs of publishing and disseminating high quality scholarly literature, I can tell you without a shadow of a doubt that the end result of this sort of legislation would be catastrophic for the scientific community and scholarly publishers alike. If in a blink of an eye, organizations like ACM were required to simply open up our archive of articles (the ACM DL) without the ability to recoup the costs of publishing and disseminating those articles (or all of the technically sophisticated services built around that content inside the ACM DL), ACM would cease to be the sustainable organization it is today and would eventually shutter its doors at some point in the future, instead of continuing to be the sustainable force for good that it is today. If this sounds like PR-dribble, I apologize, but I really do believe this!</p>
<p>What's more, the senior volunteers who are most familiar with ACM's activities and who sit on ACM's various committees and boards recognize and understand the tradeoffs that are necessary to maintain a sustainable organization. Over the past few years, I have participated in meetings with our Publications Board, which is the governing body for publications related strategy and decisions at ACM, where the issues of open access and alternative business models have been repeatedly discussed, and when all of the facts have been taken into consideration it has been overwhelmingly clear to these members of the community that ACM's approach is in the best longterm interests of the scholarly community. In fact, the ACM Author-Izer service, which is written about in the above post, was conceptualized at one of these meetings as the result of an in-depth discussion about how to balance the legitimate need of our authors to make the "archival versions" of their articles openly available while at the same time preserving the revenue stream that ACM relies heavily on to do its good work. ACM's pre-existing copyright policy already addressed the issue of posting "accepted versions" of an author's work, but ACM's volunteers decided that it was even more beneficial for the community if the "archival versions" could be made available from the author's site using the "Author-Izer" perpetual link. In general, while Author-Izer is still relatively new, the initial responses have been extremely positive and there is widespread recognition (including Rob's above) that this is a step in the right direction....</p>
<p>Let me briefly address the "opposing views" raised in Rob's post. First, in an instance where an author graduates, moves, or retires, it is always possible for the initial link to be replaced by a more up-to-date link. The ability to manage the URL that hosts the link is in the hands of the author, so I don't see a significant issue here and at the very least the effort on behalf of the author is no greater (and perhaps significantly less) than it would be to move their vitae or "pre-published" articles to a new website. What's more, ACM has simplified this process for authors and eliminated the confusion that is caused by having "multiple versions" of articles available on multiple websites by creating a centralized place (their ACM Author's Page, which includes all of their ACM and non-ACM publications) from which authors can update their Author-Izer links. By hosting the archival version of the article on a single and "sustainable" site, we strongly believe this is a better solution for the community.</p>
<p>In relation to argument from Russell O'Connor, I reject the plausibility or even the possibility that the ACM might "go evil" for the reasons I've outlined above. Since ACM ultimately carries out the well thought out wishes of the community itself since the decision makers are the very members of the community who are impacted by those decisions, it is just not possible for such a scenario to occur. Bankrupt is another story, since it is always impossible to predict how an organization's finances will be managed in the future, even though for the record it is exactly the kind of decision making I've mentioned above that currently keeps the ACM is a very strong position. Nevertheless, contingencies are in place for this unlikely scenario, as it relates to ACM's publications and all articles in the ACM Digital Library. Several years ago, ACM established partnerships with two very well established organizations (CLOCKSS & Portico) to ensure that ACM's publications would be preserved and made available to the scientific community (at no cost) in the unlikely event that ACM ceased to exist. <i>[Rob's note: <a href="http://librarians.acm.org/acm-announces-initiative-long-term-preservation-content-its-digital-library-computing-resources">here's a press release about that</a>.]</i> Both organizations take different approaches to longterm digital preservation, but both are non-profits that exist for the sole purpose of maintaining a longterm perpetual archive for the scholarly community and nearly all major scientific publishers participate in one or both of these initiatives. ACM participates in both to provide an even higher level of redundancy than most other publishers. So, it is not clear what would happen to Author-Izer in the event of this doom-day scenario, but what is for certain is that ACM's archive would be made available to the scholarly community in any event.</p>
<p>Lastly, it is worth noting that the AAP is one of the publishing industries' primary advocates and they do an enormous amount of good work. Rather than deriding this organization that supports and protects the interests of over 300 well established publishers, including ACM, I would suggest that we focus on the spirit of what the Research Works Act represents, which is to limit the ability of the federal government to mandate decisions that would almost certainly have a longterm catastrophic impact on an industry that partners with and supports (and in our case is one and the same) the scientific community.</p>
<p>Respectfully,</p>
<p>Scott Delman<br/>
Director of Group Publishing<br/>
Assoc. Computing Machinery</p>
</blockquote>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com16tag:blogger.com,1999:blog-827419514217130218.post-43696662454101724092012-01-05T13:08:00.000-05:002012-01-07T17:56:08.360-05:00Why does the ACM act against the interests of scholars?<p><b>[Updated Jan 6, Jan 7]</b> Some stuff has been happening! I'm delighted by two developments. First, ACM's Director of Group Publishing, Scott Delman, wrote a series of comments that is now one big post: <a href="http://requestforlogic.blogspot.com/2012/01/response-from-acms-scott-delman.html">Response from ACM's Scott Delman</a>. Second, I've observed that many other people came to the same conclusion I did - that it's time for our professional organizations to leave the Association of American Publishers. The reason I brought up ACM Author-izer was to argue that Author-izer makes sense only insofar as the CS community trusts their professional organization; I remain of the view that membership in the AAP is <i>incompatible with this trust</i>. Here's <a href="http://cameronneylon.net/blog/update-on-publishers-and-sopa-time-for-scholarly-publishers-to-disavow-the-aap/">Cameron Neylon saying that a bit more forcefully</a>, and <a href="http://scienceblogs.com/confessions/2012/01/scholarly_societies_its_time_t.php">here's John Dupuis</a>, who is also compiling a list of <a href="http://scienceblogs.com/confessions/2012/01/around_the_web_some_posts_on_t_1.php">all the things related to RWA</a>. (Did I mention the <a href="http://judiciary.house.gov/issues/Rogue%20Websites/List%20of%20SOPA%20Supporters.pdf">AAP also supports SOPA</a>? Yep, <i>awesome.</i>)</p>
<hr/>
<p><i>I've got two logic posts queued up in the To Do list. But, dear the internet, we need to talk about the ACM. TL;DR is bold and italics at the bottom.</i></p>
<p>My friend Glenn Willen tweeted something about the <a href="http://www.publishers.org/press/56/">Research Works Act</a> last night. [Update: you can read the very short bill <a href="http://thomas.loc.gov/cgi-bin/query/z?c112:H.R.3699:">here</a>] Basically, it would (among other things) require that the government can't demand that publishers of federally-funded research make their research available to the public. You should really read the press release; it just a wonderful example of the "dripping in PR" genre of literature.</p>
<p>This is not shocking. Awful legislation gets introduced all the time with names ("Research Works Act") that do the opposite of what their title suggests (preventing research from working and acting, wildly attempting to maintain an ultimately unsustainable status quo). Frankly, I expect publishers to behave this way, and I expect there to be the usual variety of opinions about it. But then I ran through the members of the Association of American Publishers, the group which is cheering this legislation that the (presumably) they wrote, hoping against hope. I was unsurprised but a bit sickened by what I saw: the Association for Computing Machinery is a <a href="http://publishers.org/members/">member of the AAP</a>.</p>
<p>I like the ACM, I am proud of my membership in the ACM and ACM SIGPLAN, the <b>S</b>pecial <b>I</b>nterest <b>G</b>roup on <b>P</b>rogramming <b>Lan</b>guages. I personally think that the ACM's republication policies have been pretty reasonable during the time I've inhabited this academic world. I'm also proud of my involvement with ACM through their <a href="http://xrds.acm.org/">student magazine, XRDS</a>. I write profiles of fascinating people, all of which are available <a href="http://www.cs.cmu.edu/~rjsimmon/xrds.html">on my personal webpage</a>, for free, through the ACM Author-izer service. </p>
<h3>Let's talk about that Author-izer</h3>
<p>When I publish anything through the ACM, they own the copyright. This is totally fine-by-me for things I write for XRDS (when I worked for the <a href="http://www.dailyprincetonian.com/">Daily Princetonian</a> in college they owned copyright on my work for them as well). In my mind, it's a little more complicated when I publish stuff I wrote in an ACM conference proceedings. I want to make sure people have access to that research-ey material - my reputation and career opportunities depend on people finding, reading, and liking my work, but in ACM's Digital Library it's behind a "paywall," accessible only to ACM members and people on university networks. The ACM (unlike IEEE) provides a couple of different-sized hoops that you can jump through to provide free access to your work from your personal home page; Author-izer is the newest of these.</p>
<p>On a technical level, ACM Author-izer lets you, the author of a work that the ACM now has copyright to, bless a particular URL on the internet (presumably your personal home page). The ACM then gives you a special link to their Digital Library - if you're coming from the blessed URL to the special link, you get access to the research. It sounds a little goofy but it works for me in practice and I'm cautiously pleased with it. (Here's <a href="https://freedom-to-tinker.com/blog/appel/acm-opens-another-hole-paywall">Andrew Appel talking about Author-izer</a> if you'd like a concurring opinion.)</p>
<p>But there's <a href="http://www.researchwithoutwalls.org/About.aspx">another view</a> that Author-izer is a step backwards - because moving a web page (upon graduation or retirement) breaks the functionality of Author-izer links, ACM gets, in the long run, more exclusive content than if people were posting semi-definitive versions of papers on their web page. This is not a crazy concern, but I feel like lots of universities archive alumni's pages in-place, so I also don't feel too worried about it.</p>
<p>There's another argument I've read (UPDATE: <a href="http://r6.ca/blog/20110930T012533Z.html">from Russell O’Connor</a>, I'd forgotten the link but Joshua Dunfeld reminded me in the comments). It's plausible, more insidious, and more long-term. The ACM might "go evil" in some way, sure, but even positing that the ACM is and will remain reasonably virtuous, what if the ACM goes bankrupt? In the bankruptcy proceedings, some copyright trolls get the rights to everything in the digital library, immediately shut down Author-izer, and start wreaking havoc on academia (threatening lawsuits and demanding money who posted ACM-published works to their webpage) because they're copyright trolls and that's how they roll. A lot of people are violating the letter of their agreements when they post work to their web pages - you're not allowed to post the ACM's PDF, and in fact my reading of the agreement is that you have to change the ACM copyright notice in your version of the paper to a specific other thing; most people don't do this. Of course the ACM-provided LaTeX class doesn't support this, so you have to go diddling around with .cls flies to produce a PDF that looks like <a href="http://www.cs.cmu.edu/~rjsimmon/papers/simmons09linlogapx.pdf">this</a> - see the lower-left-hand corner of the first page. Because people are less likely to jump (correctly) through the "author's version" hoop, instead relying on Author-izer, in this hypothetical narrative the ACM's policies have, indeed, worked against the interests of ACM's members.</p>
<h3>What does this have to do with the "Research Works Act" breaking research?</h3>
<p>My view of Author-izer is that it requires a high level of trust: trust that the ACM will continue supporting authors, and that we'll be able to continue supporting the ACM (since if we don't or can't support the ACM, it will go bankrupt and be taken over by copyright trolls). I can overlook little things where the ACM is not acting in the interest of its members (why doesn't the standard .cls make it easy to make an authors version?) because the world isn't perfect.</p>
<p>Furthermore, with experiments like Author-izer, I believe that ACM has demonstrated that it's trying to do the right thing, as opposed to IEEE, which doesn't give authors hoops to jump through to legally post their work to their webpages. (You should read IEEE's hilariously awful responses to <a href="http://www.crypto.com/blog/copywrongs/">Matt Blaze</a> on this issue. Blaze, I should add, sees much less difference between ACM and IEEE than I do.)</p>
<p>However, the "Research Works Act" makes it clear that ACM's membership in the Association of American Publishers is an <i>egregious and unacceptable</i> instance of working against the interest of scholars and ACM members. <b><i>We should be thinking about how to demand that our professional organization, the Association for Computing Machinery, do two things: 1) withdraw from the Association of American Publishers 2) take the clear position that the so-called "Research Works Act" is an unacceptable piece of legislation that is not supported by the computer science community.</i></b></p>
<p>We should do this even though (I suspect) the primary target of this ridiculous act is medical science. (At the present time, the NIH admirably puts a higher priority on open dissemination of research than the NSF.)</p>
<hr>
<p><i>P.S. added January 7: The title of this post is intentionally re-asking the question Blaze asks in his aforementioned post, "Why do IEEE and ACM act against the interests of scholars?" I am focusing on the ACM simply because I care more about the ACM. The IEEE is also a member organization of the AAP.</i></p>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com12tag:blogger.com,1999:blog-827419514217130218.post-32843896505626884812011-12-17T20:14:00.000-05:002012-03-05T23:36:03.935-05:00Notes on classical sequent calculi (1/2)<p>These are some notes I made to try to help me understand Noam's focused presentation of classical logic in <a href="http://www.pps.jussieu.fr/~noam/papers/polpol.pdf">Polarity and the Logic of Delimited Continuations</a>. I hope these notes coud be useful to others.</p>
<h2>Sequent presentations of classical logic</h2>
<p>I write, inevitably, from the perspective of an intuitionistic proof theorist, so let's start there. In intuitionistic logics, sequents have the form \(\Gamma \vdash A~\mathit{true}\), where the hypothetical context \(\Gamma\) has the form \(A_1~\mathit{true}, \ldots, A_n~\mathit{true}\). The whole sequent \(\Gamma \vdash A~\mathit{true}\) is read as "assuming the truth of all the things in \(\Gamma\), we know \(A~\mathit{true}\)." Let's look at a couple of ways of presenting sequent calculi for <i>classical</i> logic.</p>
<h3>Two-sided judgmental classical sequent calculi</h3>
<p>One way of presenting a classical sequent calculus is to give a two-sided sequent, \(\Gamma \vdash \Delta\). As before, \(\Gamma\) has the form \(A_1~\mathit{true}, \ldots, A_n~\mathit{true}\), but \(\Delta\) has the form \(B_1~\mathit{false}, \ldots, A_m~\mathit{false}\), and the whole sequent is read as "taken together, if all the things judged true in \(\Gamma\) are true and all the things judged false in \(\Delta\) are false, then there is a contradiction."</p>
<p>The language of propositions is \(A ::= P \mid \neg A \mid A \wedge B \mid A \vee B\) - uninterpreted atomic propositions \(P\), negation, conjunction ("and"), and disjunction ("or"). I'm leaving out truth \(\top\) and falsehood \(\bot\) because they're boring. The rules for these two-sided classical sequent calculi look like this:</p>
\[
\infer
{\Gamma, ~ P~\mathit{true} \vdash \Delta, ~ P~\mathit{false}}
{}
\]
\[
\infer
{\Gamma \vdash \Delta, ~ \neg A~\mathit{false}}
{\Gamma, ~ A~\mathit{true} \vdash \Delta}
\qquad
\infer
{\Gamma, ~ \neg A~\mathit{true} \vdash \Delta}
{\Gamma \vdash \Delta, ~ A~\mathit{false}}
\]
\[
\infer
{\Gamma \vdash \Delta, ~ A \wedge B~\mathit{false}}
{\Gamma \vdash \Delta, ~ A~\mathit{false}
&\Gamma \vdash \Delta, ~ B~\mathit{false}}
\qquad
\infer
{\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta}
{\Gamma, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \Delta}
\]
\[
\infer
{\Gamma \vdash \Delta, ~ A \vee B~\mathit{false}}
{\Gamma \vdash \Delta, ~ A~\mathit{false}, ~ B~\mathit{false}}
\qquad
\infer
{\Gamma, ~ A \vee B~\mathit{true} \vdash \Delta}
{\Gamma, ~ A~\mathit{true} \vdash \Delta
&\Gamma, ~ B~\mathit{true} \vdash \Delta}
\]
<p>Two asides. First, in presentations that do not emphasize the fact that \(A_i~\mathit{false}\) and \(B_j~\mathit{true}\) are <i>judgments</i> and not propositions, there is another reading of the two-sided sequent \[A_1,\ldots,A_n \vdash B_1,\ldots,B_m\] This interpretation is that the truth of <i>all</i> of the \(A_i\) implies the truth of <i>one</i> of the \(B_j\) - this reading suggests a reading of any intuitionistic sequent proof as a classical sequent proof with one conclusion. You should convince yourself that this interpretation is equivalent to the interpretation above (hint: it's just a mode of use of De Morgan's laws).</p>
<p>Second aside: your rules may differ. I'm using a style of presentation where every connective is broken down by a unique connective and, from the perspective of bottom-up proof search, it's never a mistake to apply any rule, because the conclusion implies all of the premises (a property called <i>invertibility</i>). The "true" (or left) rule for conjunction (that is, "and" or \(\wedge\)) and the "false" (or right) rule for disjunction (that is, "or" or \(\vee\)) both have a different, non-invertible presentation. In the case of conjunction, it's this pair of rules:
\[
\infer
{\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta}
{\Gamma, ~ A~\mathit{true} \vdash \Delta}
\qquad
\infer
{\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta}
{\Gamma, ~ B~\mathit{true} \vdash \Delta}
\]
You could "make a mistake" applying these rules in bottom-up proof search:
just because there is a proof of
\(\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta\) does not mean that there is a proof
of \(\Gamma, ~ A~\mathit{true} \vdash \Delta\).
</p>
<h3>One-sided judgmental sequent sequent calculi</h3>
<p>Of course, hypotheses are just hypotheses, there's no a priori reason why we need to separate the true ones and the false ones into separate contexts. Let's use a unified context and call it \(\Psi\).
\[\Psi ::= \cdot \mid \Psi, A~\mathit{true} \mid \Psi, A~\mathit{false}\]
Then, we can have the sequent form \(\Psi \vdash \#\), which we read as "all the assumptions in \(\Psi\) together imply a contradiction" - we pronounce \(\#\) as "contradiction." We'll need rewrite all of our rules:</p>
\[
\infer
{\Psi, ~ P~\mathit{true}, ~ P~\mathit{false} \vdash \#}
{}
\]
\[
\infer
{\Psi, ~ \neg A~\mathit{false} \vdash \#}
{\Psi, ~ A~\mathit{true} \vdash \#}
\qquad
\infer
{\Psi, ~ \neg A~\mathit{true} \vdash \#}
{\Psi, ~ A~\mathit{false} \vdash \#}
\]
\[
\infer
{\Psi, ~ A \wedge B~\mathit{false} \vdash \#}
{\Psi, ~ A~\mathit{false} \vdash \#
&\Psi, ~ B~\mathit{false} \vdash \#}
\qquad
\infer
{\Psi, ~ A \wedge B~\mathit{true} \vdash \#}
{\Psi, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \#}
\]
\[
\infer
{\Psi, ~ A \vee B~\mathit{false} \vdash \#}
{\Psi, ~ A~\mathit{false}, ~ B~\mathit{false} \vdash \#}
\qquad
\infer
{\Psi, ~ A \vee B~\mathit{true} \vdash \#}
{\Psi, ~ A~\mathit{true} \vdash \#
&\Psi, ~ B~\mathit{true} \vdash \#}
\]
<p>Hopefully you'll agree that this is "obviously the same" as the first presentation.</p>
<h3>One-sided, truth-oriented sequent calculi</h3>
<p>But wait! The "false" rule for conjunction looks just like the "true" rule for disjunction, and the "true" rule for conjunction looks just like the "false" rules for disjunction. Can we simplify these rules?</p>
<p>The usual answer is that you can, indeed, do with fewer rules and without a false judgment at all. However, we need two twists to deal with the rules that involved both the true and false judgments. First, we need to let every atomic proposition come in two flavors, the "regular" flavor \(P\) and the "negated" flavor \(\overline{P}\). Then, the rule dealing with atomic propositions looks like this:
\[
\infer
{\Gamma, ~ P~\mathit{true}, ~ \overline{P}~\mathit{true} \vdash \#}
{}
\]
Second, instead of negation being a <i>proposition</i> \(\neg A\), we define a negation <i>function</i>, which I will write as \((A)^\bot\) to distinguish it from the propositional negation \(\neg A\). W. The negation function is defined as follows:
\[
\begin{align}
{(P)^\bot} = & \overline{P}\\
{(\overline{P})^\bot} = & P\\
{(A \wedge B)^\bot} = & {(A)^\bot} \vee {(B)^\bot}\\
{(A \vee B)^\bot} = & {(A)^\bot} \wedge {(B)^\bot}\\
\end{align}
\]
With this definition, we can eliminate the negation proposition altogether - the negation function just applies De Morgan laws all the way down to atomic propositions. We now get our sequent calculus for "half off" - there's no more official negation, and we don't need the false judgment at all anymore. We only need two more rules (for a total of three)!</p>
\[
\infer
{\Psi, ~ A \wedge B~\mathit{true} \vdash \#}
{\Psi, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \#}
\qquad
\infer
{\Psi, ~ A \vee B~\mathit{true} \vdash \#}
{\Psi, ~ A~\mathit{true} \vdash \#
&\Psi, ~ B~\mathit{true} \vdash \#}
\]
<p>It would also be possible to play this game the other way around: gather everything on the right-hand side, bias the whole thing towards the "false" judgment, and basically
get the "other half" of the two-sided sequent calculi. This ability to play the game equally well either way is part of what people mean when they say that classical logic is "very symmetric."</p>
<p>However, given that it's all the same, why not reason about truth and not falsehood? I've never understand why classical linear logic (in particular) always seems to bias itself towards one-sided sequent calculi on the <i>right</i>. There are important differences in what it means to think like a classical linear logician and what it means to think like an intuitionistic linear logician, but I really think that it unnecessarily exacerbates this divide when we have to turn all of our \(\oplus\)es to \(\&\)s and \(\otimes\)es to pars in order to talk to one another.</p>
<h2>Polarized presentations of classical logic</h2>
<p>Now for the real purpose of this note: writing out the review of classical logic that Noam gives in "Polarity and the Logic of Delimited Continuations." This discussion is a synthesis of that presentation and a little bit of "On the unity of duality."</p>
<h3>Two for the price of two</h3>
<p>Fundamentally, the observation Noam is making is that the one-sided truth-oriented sequent calculus goes <i>too far</i> - really, there are two kinds of disjunction, and two kinds of conjunction, which is why it seemed like the original calculus seemed to have redundancies. The third system above (the one-sided, truth-oriented sequent calculus) made it look like we were getting our logic for "half-off" - but really that's because the first two judgmental presentations were defining <i>twice as many connectives</i> as appeared to the naked eye. (As an aside, if you study classical linear logic, you're forced into the same conclusion for different reasons.)</p>
<p>Jason Reed taught me that, if you have two different judgments in a logic, it's worth seeing what happens if you <i>syntactically differentiate</i> the things you judge to be true and the things you judge to be false. Let's go ahead and "guess the right answer" - I'm going to call the things we judge to be true <i>positive</i>, and that the things we judge to be false <i>negative</i>. <a href="http://en.wikipedia.org/wiki/Fringe_(TV_series)">There's more than one of everything!</a>
\[
\begin{align}
A^- = & \neg^- A^+ \mid P^- \mid A^- \wedge^- B^- \mid A^- \vee^- B^-\\
A^+ = & \neg^+ A^- \mid P^+ \mid A^+ \wedge^+ B^+ \mid A^+ \vee^+ B^+
\end{align}
\]
Here are a bunch of rules: note that the fact that the two negations change the polarity of the propositions; the rules make it evident that this is the right thing to do, because we have (for example) \(\neg^+ A^- ~\mathit{true}\) but \(A^- ~\mathit{false}\):</p>
\[
\infer
{\Psi, ~ \neg^- A^+~\mathit{false} \vdash \#}
{\Psi, ~ A^+~\mathit{true} \vdash \#}
\qquad
\infer
{\Psi, ~ \neg^+ A^-~\mathit{true} \vdash \#}
{\Psi, ~ A^-~\mathit{false} \vdash \#}
\]
\[
\infer
{\Psi, ~ A^- \wedge^- B^-~\mathit{false} \vdash \#}
{\Psi, ~ A^-~\mathit{false} \vdash \#
&\Psi, ~ B^-~\mathit{false} \vdash \#}
\qquad
\infer
{\Psi, ~ A^+ \wedge^+ B^+~\mathit{true} \vdash \#}
{\Psi, ~ A^+~\mathit{true}, ~ B^+~\mathit{true} \vdash \#}
\]
\[
\infer
{\Psi, ~ A^- \vee^- B^-~\mathit{false} \vdash \#}
{\Psi, ~ A^-~\mathit{false}, ~ B^-~\mathit{false} \vdash \#}
\qquad
\infer
{\Psi, ~ A^+ \vee^+ B^+~\mathit{true} \vdash \#}
{\Psi, ~ A^+~\mathit{true} \vdash \#
&\Psi, ~ B^+~\mathit{true} \vdash \#}
\]
<p>So, are we good? Well, no, not really. The problem is that the "+" or "-" stuck to an atomic proposition isn't an annotation or modifier the way the overbar was in the one-sided, truth-oriented sequent calculus above. \(P^+\) and \(P^-\) are <i>different</i> atomic propositions, and it wouldn't be right to given an inference rule that had, as its conclusion, \(\Psi, ~ P^+~\mathit{true}, ~ P^-\mathit{false}\). Why? Well, for now let's go with "because I said so." The argument I have for this point isn't bulletproof, and it has to do with the role of atomic propositions as stand-ins for other propositions.</p>
<p>However, if you accept my argument from authority, we are left no way to prove, or even to state, anything equivalent to the classical \(P \vee \neg P\) into polarized logic, since any way we try to polarize this formula will lead to \(P\) needing to be both positive and negative. We're going to need some way, different from negation, of including positive propositions in negative ones.</p>
<p>These inclusions of positive propositions into negative ones (and vice versa) are called <i>shifts</i> - \({\downarrow}A^-\) is a positive proposition and \({\uparrow}A^+\) is a negative proposition. We could just add these two rules and call it a day...
\[
\infer
{\Psi, ~ P^+~\mathit{true}, ~ {\uparrow}P^+~\mathit{false} \vdash \#}
{}
\qquad
\infer
{\Psi, ~ P^-~\mathit{false}, ~ {\downarrow}P^-~\mathit{true} \vdash \#}
{}
\]
...but this is hardly general: the rules above should be derivable; this rule should be derivable as well:
\[
\infer
{\Psi, ~ P^+~\mathit{true}, ~{\uparrow}(P^+ \vee^+ Q^+)~\mathit{false} \vdash \#}
{}
\]
All three of these derivable rules share a common property: in an on-paper proof, we would say that the contradiction is "trivial." The hypothesis \({\uparrow}P^+~\mathit{false}\) is trivial due to the fact that \(P^+\) is true by a different hypothesis, and because the truth of \(P^+\) allows us to trivially conclude that \(P^+ \vee^+ Q^+\) is true, \(P^+ \vee^+ Q^+\) is trivially contradictory as well.</p>
<p>This idea is encoded in two rules which capture proof-by-contradiction. One way we establish a contradiction is by showing that \(A^+\) is both false (by assumption) and trivial (by direct proof). The other way we establish a contradiction is by showing that \(A^-\) is both true (by assumption) and false (by direct proof of absurdity). These are embodied in the following two rules:</p>
\[
\infer
{\Psi \vdash \#}
{{\uparrow}A^+~\mathit{false} \in \Psi
&\Psi \vdash A^+~\mathit{trivial}}
\qquad
\infer
{\Psi \vdash \#}
{{\downarrow}A^-~\mathit{true} \in \Psi
&\Psi \vdash A^-~\mathit{absurd}}
\]
<p>Now, of course, we need to give a bunch more rules to describe how to prove positive propositions trivial and negative propositions absurd.</p>
\[
\infer
{\Psi, ~ P^+~\mathit{true} \vdash P^+~\mathit{trivial}}
{}
\qquad
\infer
{\Psi, ~ P^-~\mathit{false} \vdash P^-~\mathit{absurd}}
{}
\]
\[
\infer
{\Psi \vdash \neg^+ A^-~\mathit{trivial}}
{\Psi \vdash A^-~\mathit{absurd}}
\qquad
\infer
{\Psi \vdash \neg^- A^+~\mathit{absurd}}
{\Psi \vdash A^+~\mathit{trivial}}
\]
\[
\infer
{\Psi \vdash A^+ \wedge^+ B^+~\mathit{trivial}}
{\Psi \vdash A^+~\mathit{trivial}
&\Psi \vdash B^+~\mathit{trivial}}
\qquad
\infer
{\Psi \vdash A^+ \vee^+ B^+~\mathit{trivial}}
{\Psi \vdash A^+~\mathit{trivial}}
\qquad
\infer
{\Psi \vdash A^+ \vee^+ B^+~\mathit{trivial}}
{\Psi \vdash B^+~\mathit{trivial}}
\]
\[
\infer
{\Psi \vdash A^- \wedge^- B^-~\mathit{absurd}}
{\Psi \vdash A^-~\mathit{absurd}}
\qquad
\infer
{\Psi \vdash A^- \wedge^- B^-~\mathit{absurd}}
{\Psi \vdash B^-~\mathit{absurd}}
\qquad
\infer
{\Psi \vdash A^- \vee^- B^-~\mathit{absurd}}
{\Psi \vdash A^-~\mathit{absurd}
&\Psi \vdash B^-~\mathit{absurd}}
\]
<p>Even yet, we are not done! We need to deal with the shifts, which embody another form of proof-by-contradiction: to prove that \(A^-\) holds trivially, assume it's false and derive a contradiction.</p>
\[
\infer
{\Psi \vdash {\downarrow}A^-~\mathit{trivial}}
{\Psi, A^-~\mathit{false} \vdash \#}
\qquad
\infer
{\Psi \vdash {\uparrow}A^+~\mathit{absurd}}
{\Psi, A^+~\mathit{true} \vdash \#}
\]
<p>The thing that we've come up with by this process is what I've been calling a "weakly focused" version of classical logic. If we wanted to turn it into a "fully focused" presentation of classical logic, we'd only need to make one change: the first "proof by contradiction" rules, which we call "focusing" rules, would need to require that the context \(\Psi\) includes only judgments of the form \(P^+~\mathit{true}\), \({\downarrow}A^-~\mathit{true}\), \(P^-~\mathit{false}\), and \({\uparrow}A^+~\mathit{false}\). A context including only judgments of these four forms is called <i>stable</i>. To get full focusing, we would modify the "trivial focus" rule like this (a similar modification would be made to the "absurd focus" rule):</p>
\[
\infer
{\Psi \vdash \#}
{{\uparrow}A^+~\mathit{false} \in \Psi
&\Psi \vdash A^+~\mathit{trivial}
&\Psi~\mathit{stable}}
\]
<p>Thinking about the sequent calculus as a bottom-up proof search procedure, if we are looking for a proof of a non-stable sequent, we can use our original, invertible rules to break down the connectives in the contexts until we have only stable sequents, at which point we can apply a focusing rule.</p>
<h3>Until next time...</h3>
<p>I haven't quite had time to do the thing I originally set out to do, which was to work through the notation in "Polarity and the logic of delimited continuations" better. But I will save that for another time. The motivation is the same as the one from before: it seems like we're almost certainly duplicating work. Is it possible to give the presentation of polarized classical logic from the previous section using about half as many rules?</p>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com2tag:blogger.com,1999:blog-827419514217130218.post-30260022627220721242011-11-12T10:50:00.001-05:002011-11-13T14:45:54.264-05:00Another take on polarized natural deduction<p>This has been sitting on my office whiteboard for a few days, where it doesn't do anybody (well, except for me and my officemates) any good. It's a canonical-forms presentation of natural deduction for polarized logic that corresponds to the focused sequent calculus I presented and analyzed in the (recently-updated) <a href="www.cs.cmu.edu/~rjsimmon/drafts/focus.pdf">Structural focalization</a> draft (PDF warning). The polarized sequent calculus in that draft isn't new: it's a tweaked version of Liang and Miller's authoritative LJF.<sup><a href"#atake1">1</a></sup> This canonical-forms presentation, however, is not something I've seen, so I'd be interested to know if it's been seen elsewhere: I know this is an area where a lot of other people have been working.</p>
<h2>A bidirectional type system for polarized logic</h2>
<p>There is, in my mind at least, no argument about what the <i>propositions</i> of polarized intuitionstic logic are; the following is straight of the aforementioned draft, but the basic idea dates back to Girard's 1991 post to the LINEAR mailing list, creatively titled "<a href="http://www.seas.upenn.edu/~sweirich/types/archive/1991/msg00123.html">On the sex of angles</a>".</p>
<p>
\( {\bf Positive~propositions:} ~~ A^+, B^+, C^+ ~~ ::= ~~ p^+ \mid {\downarrow}A^- \mid \bot \mid A^+ \vee B^+ \mid \top^+ \mid A^+ \wedge^+ B^+ \)
<br/>
\( {\bf Negative~propositions:} ~~ A^-, B^-, C^- ~~ ::= ~~ p^- \mid {\uparrow}A^+ \mid A^+ \supset B^- \mid \top^- \mid A^- \wedge^- B^- \)
</p>
<p>What makes a proposition positive or negative? Good question! I won't address it here. (I address it a bit in the draft.)</p>
<p>Following tradition and best practices, we will structure the canonical forms presentation as a bidirectional type system. There are three judgments to worry about, as compared to the two judgments in other canonical forms presentations of logic. These judgments include contexts \(\Gamma\), which are sets of <i>negative</i> variables \(x\) associated with negative propositions (\(x{:}A^-\)) and <i>positive</i> variables \(z\) associated with <i>atomic</i> positive propositions (\(z{:}p^+\)).</p>
<ul>
<li><p>\( \Gamma \vdash R \Rightarrow A^- \) - this is the familiar synthesis judgment from canonical forms presentations; it expresses that the <i>atomic term</i> \(R\) <i>synthesizes</i> \(A^-\). The word "synthesis" is used because it is possible to think of the type \(A^-\) as an output to the judgment, whereas \(\Gamma\) and \(R\) are inputs. In the other two judgments, everything will be treated as an input.</p>
<p>\( R ::= x \mid R~V \mid \pi_1 R \mid \pi_2 R \)</p></li>
<li><p>\( \Gamma \vdash V \Leftarrow A^+ \) - this is the new judgment corresponding to <i>right focus</i> in the focused sequent calculus; we say that the <i>value</i> \(V\) <i>checks against</i> \(A^+\).</p>
<p>\( V ::= z \mid N \mid {\sf inl}~V \mid {\sf inr}~V \mid \langle\rangle^+ \)</p> </li>
<li><p>\( \Gamma \vdash N \Leftarrow [\Omega] A^-\) - this is a modification of the familiar checking judgment from canonical forms presentations, which usually lack the bit about \([\Omega]\), which is an ordered list of positive propositions. The reason we need \([\Omega]\) is precisely because we're dealing with positive propositions, which most canonical forms presentations lack or deal with in an unsatisfactory (in my humble opinion) manner. (I'll return to this point in the discussion at the end.)We say that thus judgment expresses that the <i>normal term</i> \(N\) <i>decomposes</i> \(\Omega\) <i>and verifies</i> \(A^-\).</p>
<p>\( N ::= R \mid z.N \mid V \mid {\sf let}~R~{\sf in}~N \mid x.N \mid \lambda N \mid \langle\rangle^- \mid \langle N_1, N_2 \rangle^- \mid {\sf abort} \mid [ N_1, N_2 ]\)</p></li>
</ul>
<p>Except for the first four rules, everything is patterned in the usual style of presentation for a natural deduction system: for each connective, we give first the introduction rules and then the elimination rules.</p>
<h3>Hypothesis and atomic propositions</h3>
\[
\infer
{\Gamma \vdash R \Leftarrow []p^-}
{\Gamma \vdash R \Rightarrow p'^-
&
p^- = p'^-}
\qquad
\infer
{\Gamma \vdash x \Rightarrow A^-}
{x{:}A^- \in \Gamma}
\qquad
\infer
{\Gamma \vdash z \Leftarrow p^+}
{z{:}p^+ \in \Gamma}
\qquad
\infer
{\Gamma \vdash z.N \Leftarrow [p^+, \Omega] C^-}
{\Gamma, z{:}p&+ \vdash N \Leftarrow [\Omega] C^-}
\]
<h3>Shifts</h3>
\[
\infer
{{\uparrow}A^+~{\it stable}}
{}
\qquad
\infer
{p^-~{\it stable}}
{}
\]
\[
\infer
{\Gamma \vdash V \Leftarrow []{\uparrow}A^+}
{\Gamma \vdash V \Leftarrow A^+}
\qquad
\infer
{\Gamma \vdash {\sf let}~R~{\sf in}~N \Leftarrow []C^-}
{\Gamma \vdash R \Rightarrow {\uparrow}A^+
&
\Gamma \vdash N \Leftarrow [A^+]C^-
&
C^-~{\it stable}}
\]
\[
\infer
{\Gamma \vdash N \Leftarrow {\downarrow}A^-}
{\Gamma \vdash N \Leftarrow []A^-}
\qquad
\infer
{\Gamma \vdash x.N \Leftarrow [{\downarrow}A^-, \Omega]C^-}
{\Gamma, x{:}A^- \vdash N \Leftarrow [\Omega]C^-}
\]
<h3>Connectives</h3>
\[
\infer
{\Gamma \vdash \lambda N \Leftarrow []A^+ \supset B^-}
{\Gamma \vdash N \Leftarrow [A^+]B^-}
\qquad
\infer
{\Gamma \vdash R~V \Rightarrow B^-}
{\Gamma \vdash R \Rightarrow A^+ \supset B^-
&
\Gamma \vdash V \Leftarrow A^+}
\]
\[
\infer
{\Gamma \vdash \langle\rangle^- \Leftarrow \top^-}
{}
\qquad
{\it (no~elim~rule~for~\top^-)}
\]
\[
\infer
{\Gamma \vdash \langle N_1, N_2 \rangle^- \Leftarrow A^- \wedge^- B^-}
{\Gamma \vdash N_1 \Leftarrow A^-
&
\Gamma \vdash N_2 \Leftarrow B^-}
\qquad
\infer
{\Gamma \vdash \pi_1 R \Rightarrow A^-}
{\Gamma \vdash R \Rightarrow A^- \wedge^- B^-}
\qquad
\infer
{\Gamma \vdash \pi_2 R \Rightarrow B^-}
{\Gamma \vdash R \Rightarrow A^- \wedge^- B^-}
\]
\[
{\it (no~intro~rule~for~\bot)}
\qquad
\infer
{\Gamma \vdash {\sf abort} \Leftarrow [\bot, \Omega]C^-}
{}
\]
\[
\infer
{\Gamma \vdash {\sf inl}~V \Leftarrow A^+ \vee B^+}
{\Gamma \vdash V \Leftarrow A^+}
\qquad
\infer
{\Gamma \vdash {\sf inr}~V \Leftarrow A^+ \vee B^+}
{\Gamma \vdash V \Leftarrow B^+}
\qquad
\infer
{\Gamma \vdash [N_1, N_2] \Leftarrow [A^+ \vee B^+, \Omega] C^-}
{\Gamma \vdash N_1 \Leftarrow [A^+, \Omega] C^-
&
\Gamma \vdash N_2 \Leftarrow [B^+, \Omega] C^-}
\]
\[
\infer
{\Gamma \vdash \langle\rangle^+ \Leftarrow \top^+}
{}
\qquad
\infer
{\Gamma \vdash \langle\rangle.N \Leftarrow [\top^+, \Omega] C^-}
{\Gamma \vdash N \Leftarrow [\Omega] C^-}
\]
\[
\infer
{\Gamma \vdash \langle V_1^+, V_2^+ \rangle \Leftarrow A^+ \wedge^+ B^+}
{\Gamma \vdash V_1^+ \Leftarrow A^+
&
\Gamma \vdash V_2^+ \Leftarrow B^+}
\qquad
\infer
{\Gamma \vdash N \Leftarrow [A^+ \wedge^+ B^+, \Omega] C^-}
{\Gamma \vdash N \Leftarrow [A^+, B^+, \Omega] C^-}
\]
<h2>Discussion</h2>
<p>There are two possible questions I want to address about this system in the previous section.</p>
<h3>What's with those positive "elimination" rules?</h3>
<p>It would be possible to complain that the system above is not very "natural deduction-ey" after all - for all the positive connectives, I basically give sequent calculus <i>left rules</i> instead of natural deduction <i>elimination rules</i>. What happened to the usual "large elimination"-style elimination rules, for instance the usual disjunction-elimination rule whose proof term is a case analysis?
\[
\infer
{\Gamma \vdash ({\sf case}~R~{\sf of}~{\sf inl}~x \rightarrow N_1 \mid {\sf inr}~y \rightarrow N_2) \Leftarrow C}
{\Gamma \vdash R \Rightarrow A \wedge B
&
\Gamma, x{:}A \vdash N_1 \Leftarrow C
&
\Gamma, y{:}B \vdash N_2 \Leftarrow C}
\]</p>
<p>I think that the answer can be given by looking at the shifts. Essentially, every large elimination as we know and love it follows from the structure of the \({\uparrow}\) elimination rule, which all on its own looks an awful lot like a cut. You should verify for yourself that, if you let \({\sf case}~R~{\sf of}~{\sf inl}~x \Rightarrow N_1 \mid {\sf inr}~y \Rightarrow N_2\) be defined as syntactic sugar for \({\sf let}~R~{\sf in}~[ x.N_1, y.N_2]\), then the following rule is derivable whenever \(C^-~{\it stable}\) holds.<sup><a href="#atake2">2</a></sup>
\[
\infer
{\Gamma \vdash {\sf case}~R~{\sf of}~{\sf inl}~x \Rightarrow N_1 \mid {\sf inr}~y \Rightarrow N_2 \Leftarrow C^-}
{\Gamma \vdash R \Rightarrow {\uparrow}({\downarrow}A^- \vee {\downarrow}B^-)
&
\Gamma, x{:}A^- \vdash N_1 \Leftarrow C^-
&
\Gamma, y{:}B^- \vdash N_2 \Leftarrow C^-}
\]
</p>
<p>Pay attention to those two appearances of the downshift \({\downarrow}\) - they tell you something important about the structure of the usual elimination rules, which is that they "lose focus" while decomposing the disjunction. The usual way of thinking of normal natural deduction doesn't require, when you decompose \(A \vee B\) in an elimination, that you continue decomposing \(A\) and \(B\), which is represented here by the fact that, to match the structure of the usual elimination rule, you have to put downshifts \in explicitly. Jacob Howe, in his thesis and in his excellent paper "Proof search in lax logic," demonstrates this by making a focused sequent calculus that corresponds to the usual (constantly-interrupted) notion of decomposing positive propositions that you get if you follow your intuitions from natural deduction too closely.</p>
<p>By gathering all the large eliminations together in the \({\uparrow}\) elimination rule, we allow for the usual large eliminations to be defined, but also allow for the possibility that we might want to "chain" large eliminations in a well-defined way. (As an exercise, consider the structure of the elimination rule for \({\uparrow}(({\downarrow}A^- \wedge^+ {\downarrow}B^-) \vee (p^+ \wedge^+ \top^+))\).) This is why I claim that this is a natural deduction system that corresponds to the focused sequent calculus, instead of Howe's system where it's the other way around.<sup><a href="#atake3">3</a><sup></p>
<h3>Where are all the patterns?</h3>
<p>Patterns have been associated with focused and/or canonical forms presentations of logic ever since... well, since Neel wrote the paper "<a href="http://dl.acm.org/citation.cfm?id=1480927">Focusing on pattern matching</a>"... or maybe since Noam wrote "<a href="http://dl.acm.org/citation.cfm?id=1328482">Focusing and higher-order abstract syntax</a>"... well, really at least since the <a href="http://reports-archive.adm.cs.cmu.edu/anon/2002/abstracts/02-101.html">CLF tech report</a>. A lot of these, notably Noam's systems, have presented the rules of logic using <i>pattern judgments</i>, devices which abstractly represent the way in which values of a particular (positive) type are constructed or the way atomic terms of a particular (negative) type are eliminated.</p>
<p>There's this picture that isn't fully formed in my head, but that I've been thinking about for some time. On the left side of this picture, I think, you have the (pattern-free) presentation of natural deduction that I have given here at the top, and the (pattern-free) focused sequent calculus from "Structural focalization" at the bottom. Then, in the middle, you have (at the top) a natural deduction system that uses Noam's pattern judgments to <i>introduce</i> negative propositions and <i>eliminate</i> positive propositions - this is precisely (or at least very nearly) Taus Brock-Nannestad and Carsten Schürmann's system from "<a href="http://www.springerlink.com/content/h7551281753g5266/">Focused Natural Deduction</a>." Below it, there is a sequent calculus system that uses Noam's pattern judgments to <i>eliminate</i> negative propositions and <i>introduce</i> positive propositions. Kevin Watkins and Frank Pfenning came up with this idea and named it the "skeleton calculus" (a reference to the "spine calculus" of Cervesato and Pfenning), but it hasn't been written up that I know of. The skeleton calculus was what I was thinking about this morning when I decided to write this post. Then, on the far right, you have Noam's system, which is entirely pattern-based: patterns are used to both introduce and eliminate all connectives, so that the logic itself basically doesn't "know" about any connectives at all. This hazy picture is why, in the structural focalization draft, I mentioned that I thought Noam's system was a "natural synthesis of natural deduction and sequent calculus presentations".</p>
<p>But why should the picture look like the one I sketched above? Why not have a natural deduction system that uses patterns to introduce positives and eliminate negatives, or a sequent calculus that uses patterns to eliminate positives and introduce negatives? There's also the elephant in the room: CLF, which has both natural deduction and sequent calculus presentations, but which, in both instances, uses patterns only in the elimination of positive connectives. What are all these options doing here, and what are we to make of them? I don't know (yet).</p>
<hr>
<small>
<sup>1<a name="atake1"></a></sup> I should add that, while the adaptation of LJF isn't particularly interesting, the proof term assignment I give is different than any others I've seen and I'm pretty happy with it; that's another case where I'd be interested to know if others have done anything similar.<br/>
<sup>2<a name="atake2"></a></sup> This additional requirement of stability just reflects that it's always possible to restrict large eliminations in a canonical forms presentation of natural deduction to situations where the; this isn't always required in canonical forms presentations of natural deduction, but is an important part of making sure the sequent calculus presented in "Structural focalization" corresponds correctly to the natural deduction presentation.<br/>
<sup>3<a name="atake3"></a></sup> I specifically suspect that this is a natural deduction system <i>isomorphic</i> to the focused sequent calculus from Structural focalization, but I don't want to make that claim until I've proved it.
</small>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com7tag:blogger.com,1999:blog-827419514217130218.post-54578359346753509722011-10-10T15:11:00.000-04:002014-01-09T11:20:00.085-05:00Feeble Typing (a thought on Dart)<div dir="ltr" style="text-align: left;" trbidi="on">
<i><b><br /></b></i>
<i><b>Update: A Word About The Title</b></i>. If you came here from Twitter, the line I used - "not unsound, incomplete" - referred to the original title of the article "Incomplete by design," which was based on my misunderstanding of the dominant terminology in static analysis (see updates, and thanks David and Sam in the comments for pointing out my wrongness). When I realized this, I renamed the article "Deliberate decisions," but that wasn't an interesting title. When I read Rafaël Garcia-Suarez's take on Dart this morning, <a href="http://blogs.perl.org/users/rafael_garcia-suarez/2011/10/why-dart-is-not-the-language-of-the-future.html">Why Dart is not the language of the future</a>, I decided to support his proposal of calling types as a "lint-type development aid, not a language feature" <i>feeble typing</i>, and renamed the article again. You should read Rafaël's post and this one; it is possibly the most agreement you will ever find between a Perl blogger and a Carnegie Mellon University programming languages graduate student.<br />
<br />
<h3>
...anyway...</h3>
<br />
There's a lot of feedback, and a non-trivial amount of snark, going around the internet based on the release of <a href="http://www.dartlang.org/">Dart</a>, a proposed Javascript-killer by Google. My primary experience with Javascript is that people tell me it's the worst compiler target language that is widely used as a compiler target language, so I basically have nothing invested in the language, but was interested by the discussions it brought up.<br />
<br />
The snark about Dart has centered around the following line on Page 72 of the language spec, which I believe was pointed out to the Twitterverse by <a href="https://twitter.com/#!/debasishg/status/123332940175376384">Debasish Ghosh</a>.<br />
<blockquote>
<i>
The type system is unsound, due to the covariance of generic types. This is
a deliberate choice (and undoubtedly controversial). Experience has shown that
sound type rules for generics fly in the face of programmer intuition. It is easy
for tools to provide a sound type analysis if they choose, which may be useful
for tasks like refactoring.
</i></blockquote>
But what does it mean for a type system to be unsound? I really think that the most illustrative snippet about Dart types came on the following page of the language spec was not the one that Debasish retweeted, but one that came on the next page:<br />
<blockquote>
<i>
A Dart implementation must provide a static checker that detects and reports exactly those situations this specification identifies as static warnings.
However:
</i><br />
<ul>
<li><i>Running the static checker on a program P is not required for compiling and running P.</i></li>
<i>
<li>Running the static checker on a program P must not prevent successful compilation of P nor may it prevent the execution of P, regardless of whether any static warnings occur</li>
</i></ul>
</blockquote>
This, for me, clarified what was going on substantially. Let me tell you a parable.<br />
<br />
<h2>
How to anger students in an undergraduate PL course</h2>
<div>
<br /></div>
In four easy steps!<br />
<br />
<h3>
Step 1</h3>
<div>
<br /></div>
Tell students to implement the following dynamic semantics of a programming language. Here's an example of a very simple language:<br />
<br />
\[\infer
{{\tt if}~e~{\tt then}~e_1~{\tt else}~e_2 \mapsto {\tt if}~e'~{\tt then}~e_1~{\tt else}~e_2 }
{e \mapsto e'}
\]
\[\infer
{{\tt if}~{\tt true}~{\tt then}~e_1~{\tt else}~e_2 \mapsto e_1}
{}
\qquad
\infer
{{\tt if}~{\tt false}~{\tt then}~e_1~{\tt else}~e_2 \mapsto e_2}
{}
\]
\[
\infer
{e_1 + e_2 \mapsto e_1' + e_2}
{e_1 \mapsto e_1'}
\qquad
\infer
{{\tt num}(n_1) + e_2 \mapsto {\tt num}(n_1) + e_2'}
{e_2 \mapsto e_2'}
\]
\[
\infer
{{\tt num}(n_1) + {\tt num}(n_2) \mapsto {\tt num}(n_1 + n_2)}
{}
\]
<br />
<br />
The implementation was to be an ML function <tt>step</tt> with type <tt>expr -> expr option</tt>, and the specification was that <tt>step e = SOME e'</tt> if there existed an <tt>e'</tt> such that <tt>e</tt> \(\mapsto\) <tt>e'</tt>, and that <tt>step e = NONE</tt> otherwise (for instance, \(\tt true\) obviously can't take a step according to these rules).<br />
<br />
<h3>
Step 2</h3>
<br />
Describe how they can type-checking the language, by defining a type system like this. Have them implement this type checker as an ML function with type <tt>expr -> typ option</tt>, same idea.<br />
<br />
\[
\infer
{{\tt true} : {\tt bool}}
{}
\qquad
\infer
{{\tt false} : {\tt bool}}
{}
\qquad
\infer
{{\tt if}~e~{\tt then}~e_1~{\tt else}~e_2 : \tau}
{e : {\tt bool} & e_1 : \tau & e_2 : \tau}
\]
\[
\infer
{{\tt num}(n) : {\tt number}}
{}
\qquad
\infer
{e_1 + e_2 : {\tt number}}
{e_1 : {\tt number} & e_2 : {\tt number}}
\]
<br />
<br />
<h3>
Step 3</h3>
<br />
Have students <i>prove the theorem that this type system does something</i>. The theorem statement goes as follows, and the proof is by the by-now standard technique of safety-via-progress-and-preservation.<br />
<b>Theorem (Safety):</b> If \(e : \tau\) and \(e \mapsto \ldots \mapsto e'\), then \(e' : \tau\) and also there either exists some \(e''\) such that \(e' \mapsto e''\) or else \(e'\) is of the form \(\tt true\), \(\tt false\), or \({\tt num}(n)\).<br />
<br />
<h3>
Step 4</h3>
<br />
Test their ML code from Step 1 on expressions like <tt>if 4 then true else 9</tt>, breaking many of the students implementations of the dynamic semantics which were prepared only to handle well-typed inputs.<br />
<br />
<h3>
Analysis: is this fair?</h3>
<br />
Think about the perspective of the student who complained about the fact that their interpreter either crashed (or maybe returned <tt>SOME(Num 9)</tt>!) after being handed <tt>if 4 then true else 9</tt>. On one hand, they clearly violated the spirit of the assignment: Step 1 was a perfectly well-defined assignment all on its own, and they didn't fulfill the specification of that particular assignment. But on the other hand, they proved the theorem in Step 3, and perhaps feel as if they should <i>get something</i> out of the fact that they proved that theorem: the ability to only have to reason about the behavior of well-typed programs: why should it be surprising that garbage-in produced garbage-out?<br />
<br />
Compiler writers actually get to think like that; indeed it's almost essential that they be allowed to. On a 32-bit machine, most values are compiled to plain ol' 32-bit words, and so the representation of \(\tt false\) might have the same in-memory representation as, say, \({\tt num}(0)\). Or it might have the same in-memory representation as \({\tt num}(1)\)! It doesn't matter, because, for the compiler writer, the safety theorem has already given a guarantee that the language has <i>canonical forms</i> - that if \(e : {\tt bool}\) and \(e\) eventually steps to an irreducible expression, then that irreducible expression <i>must</i> either be \(\tt true\) or \(\tt false\).<br />
<br />
This means that the compiler writer need not worry about how <tt>if 1 then true else 9</tt> and <tt>if 0 then true else 9</tt> might behave - they may raise an exception, return (the memory representation of) <tt>true</tt>, or return (the memory representation of) <tt>false</tt>. The only programs upon which the compiler promises to behave the same way as the language definition are those that pass the typechecker, and the type safety theorem is a critical component of that promise.<br />
<br />
In this way of looking at the world, the <i>representation independence</i> given by a type system is really quite important, and it means that an <i>unsound</i> type system could cause very very bad things to happen: if you're allowed to mess around with the representation of, say, a pointer, by adding things to it, then you have introduced buffer overflow errors to a language, which would be pretty awful if you allowed code written in this language to execute in a privileged way in your browser. That (hopefully!) is not at all what Dart means when they mean their programming language is unsound.<br />
<br />
<h2>
Theorems versus bugs</h2>
<br />
I think the problem here is that, as a broad overgeneralization, there are two ways to look at what people are <i>doing</i> with types in the first place. On one hand, there is the view that <i>types are a tool to provably preclude certain classes of errors</i> - like the possibility that you might end up with the expresion <tt>if 1 then true else 9</tt> which is "stuck" according to the defined operational semantics. On the other hand, there is the idea that <i>types are a language feature that is aimed at helping document code and reduce the number of bugs in a program without necessarily precluding any particular errors</i>. In the academic circles I travel in, a <i>type system</i> is understood as a technical definition about the former: if you can't prove a theorem about what kind of errors your type system precludes, then it is something else besides a type system. I think it's quite fair to both type systems and static analysis researchers to call the <i>latter</i> notion of types a simple form of static analysis.<br />
<br />
<i>[Updated]</i> There's nothing wrong, per se, with such a static analysis, though I think it's fair to call it an <b>unsound static analysis</b> instead of an <b>unsound type system</b>. To use the language of Ciera Jaspan's recent thesis, start with a particular class of error (gets-stuck, raises-a-certain-exception, divides by zero, whatever). An analysis is <i>sound</i> if it never passes a program with a particular sort of error (permitting a safety theorem about that analysis!) and <i>complete</i> if it fails only programs that will actually manifest the error at runtime.<sup><a href="https://www.blogger.com/blogger.g?blogID=827419514217130218#choice1">1</a></sup> A sound but incomplete analysis is called <i>conservative</i>; the type checkers of ML and Java represent such analyses. An analysis that is neither sound nor complete is called <i>pragmatic</i> by Jaspan, as there aren't any theorems to be proved about such an analysis: they can be justified only by their utility in practice.<br />
<br />
I can certainly think of situations where I'd want a pragmatic analysis. In the past, I have had occasion to write Python, which I will admit I have a certain fondness for. However, I have also complained about how my complicated Python program ran for half an hour and then failed with some cast exception that, upon inspection of the code, was <i>always, statically, evidently going to happen no matter what the complicated bits of the code did</i> and why couldn't it have warned me that it was going to do that <i>before</i> running for half an hour. Even if I implemented an analysis to generate such a warning, Python is and would remain a (hopefully) safe, strongly-typed programming language with exactly one type - the type of tagged data that causes an exception if you try to use an object as an integer (or whatever). The static analysis is trying to prove a different kind of theorem - one that says "you have indicated that raising <tt>ClassCastException</tt> is undesirable, and here's a proof that your current program will raise <tt>ClassCastException</tt>". If the static analysis <i>can't</i> prove that theorem (thus demonstrating a bug), I'm no worse off than I was when I used Python without that static analysis. A type safety theorem, however, would have the form "if the program passes the typechecker, then <tt>ClassCastException</tt> <i>will not be raised</i>."<br />
<br />
And with my current understanding, the "unsound type system" of Dart is just such a "pragmatic analysis" as described by Jaspan. I hope my examples explain what might still be wrong with such a language - if you can't static preclude certain classes of errors, you must either allow "unspecified behavior" (and that way lies buffer overruns and security violations) or else you must be able and willing to check, at execution time, for the occurrence of those errors, which is not efficient (and for some classes of errors may be impossible). You're back in the world of our hypothetical angry student: you've got to be able to handle all the ill-formed programs and obey the language definition on a wider class of programs.<br />
<br />
<i>[Updated]</i> ...you could argue, of course, that you're no worse off than you were when you finished Step 1. On some level, you're certainly right; my hypothetical Python+tool-to-catch-a-couple-of-errors is better, in my humble opinion, than Python without (and this view has the virtue of <a href="http://twitter.com/#!/simrob/status/123532533261549568">honesty</a>). If you want to make that argument, however, I encourage you to read <a href="http://conway.rutgers.edu/~ccshan/wiki/blog/posts/Unsoundness/">Chung-chieh Shan's related blog post about covariant generics</a>, which argues from a less practical-compiler-optimizations and more philosophical point of view that I also find quite convincing. The point he makes is that the traditional view of types is important because <i>types should mean things</i> - unless you want to choke on pencils!<br />
<br />
<h2>
Conclusion</h2>
<br />
In summary: type systems are useful because of type safety theorems: a type safety theorem means that certain things <i>can't</i> happen. One reason this is nice is because the complier writer, the programmer, and the language designer needn't worry about what happens in the case that such an impossible thing happens.<br />
<br />
Types are also a form of documentation, and they're a regular form of documentation that a complier can then take, generating warnings or errors about certain classes of bugs without actually promising to preclude those bugs. A static analysis that uses type information to preclude some (but not all) errors of a particular type is probably better termed an "pragmatic type-based analysis" than an "unsound type system." Garcia-Suarez called it "feeble typing" and I renamed my post accordingly. It's a bit more pejorative than the tone I was originally trying to take in the article, but I like it anyway.<br />
<br />
I think that's a fair way of looking at things, and it puts decidable type systems, on a practical level, as a member of a larger class of type based-static analyses.<sup><a href="https://www.blogger.com/choice2">2</a></sup> In a type system, we must be able to ask a question about what sort of theorem is proved about programs that are well-typed, and if there's no such theorem, then the analysis is still within the class of type-based static analyses, but isn't so much a type system. At the end of the day, of course, English has no central authority, so asking people to distinguish "type-based analyses" from "type systems" may be a fool's errand,<sup><a href="https://www.blogger.com/choice3">3</a></sup> but I think it's worthwhile to delineate the difference, and I don't think my delineation significantly departs from current usage (apart from "feeble typing" which was, I think, coined the day after I originally posted this).<br />
<br />
Something I started off wanting to talk about before this post got too long was <i>why</i> it is the case that "sound type rules for generics fly in the face of programmer intuition" (which is pretty clearly, in my opinion, missing the addendum "in languages with subtyping"), because two of the reasons why I think this is the case are quite interesting on their own. One of them has to do with <i>polynomial data types</i> and <i>persistent data</i>, and the other has to do with <i>base types and refinement types</i> as explored by William Lovas in his thesis. Neither of these ideas are adequately represented in existing programming languages, though they are more-or-less theoretically understood at this point. Another day, perhaps.<br />
<br />
<h3>
Last word</h3>
<br />
I noticed that one of the principal language designers was <a href="http://gototoday.dk/2011/10/10/lars-bak-on-dart/">quoted</a> as follows<br />
<blockquote>
<i>You have to understand that the types are interface types, not implementation types – so the runtime cannot use the type information for anything. The types are simply thrown away during compilation.</i></blockquote>
That is to say, insofar as this article goes, I think I've only said factual things that the language designers would essentially agree with; in particular, they seem to recognize that their use of the word "types" seems bound to confuse (or <a href="http://twitter.com/#!/littlecalculist/status/123639731870380032">troll</a>) others. But calling a "feeble type" an <i>interface types</i> and a "type" an <i>implementation type</i> seems to just be making up words. And, as made-up-words go, I really dislike "interface types" as a neologism (certainly it has nothing to do with Java <tt>interface</tt> or anything that comes up when I search for "interface type"). The theorist's critique of Dart is precisely that things you call "type" should define inviolate interfaces and not mere suggestions that are neither enforced nor checked. Calling them <i>interface types</i> makes them sound like <a href="http://eecs.northwestern.edu/~robby/pubs/">contracts</a>, which are <i>not</i> thrown away by the compiler. "Suggestion type" might be a less pejorative version of "feeble type," perhaps? And "implementation types" is a terrible term to use to describe types in a a type system, types that (by way of a theorem about canonical forms) can be relied upon by <i>both</i> the programmer and the implementation.<br />
<hr />
<sup><a href="https://www.blogger.com/blogger.g?blogID=827419514217130218" name="choice1"></a>1</sup> Godefroid, Nori, Rajamani, and Tetal call the sound analyses <i>may</i> analyses (though it should perhaps be <i>may not</i>, as a sound analysis precludes a certain behavior) and call the complete analyses <i>must</i> analyses (the error must happen) in the paper "Compositional may-must analysis."<br />
<sup><a href="https://www.blogger.com/blogger.g?blogID=827419514217130218" name="choice2"></a>2</sup> This view really <i>isn't</i> fair to the reason that we are interested in type systems and believe they're supposed to be useful, but that has to do with Curry-Howard and the unreasonable effectiveness of mathematics and with other things Shan talked about in his linked post.<br />
<sup><a href="https://www.blogger.com/blogger.g?blogID=827419514217130218" name="choice3"></a>3</sup> Honestly, it probably won't help the fool's errand if I try to call the other group's type based analyses "feeble typing," but here I am, doing it anyway.</div>
Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com26tag:blogger.com,1999:blog-827419514217130218.post-15939130291043896162011-10-03T22:58:00.000-04:002011-10-05T07:22:58.332-04:00Slicing and dicing validity: existing ideas<p>This is a story of modal logic and the <i>judgmental methadology</i>, a style of presenting and justifying logic that Pfenning and Davies [<a href="#ak1">1</a>] adapted from Martin Löf's Siena Lectures [<a href="#ak2">2</a>] and which was, in turn, adapted by Bernado Toninho and I [<a href="#ak3">3</a>]. My goal is to re-present some existing work: I changed the title from "<i>old ideas</i>" to "<i>existing ideas</i>" to remove any implicit negative connotation - my next step is to extend these existing ideas a bit, not to replace these old ideas with better ones.</p>
<p>The topic of this story is, at least at first, an account of modal logic that I have adapted from Pfenning and Davies's presentation [<a href="#ak1">1</a>]. I assume some familiarity with the background, this isn't a comprehensive introduction. Pfenning and Davies's topic was a presentation of modal logic as a logic where there are two <i>categorial judgments</i>, \(A~{\it true}\) and \(A~{\it valid}\); the intent is that validity captures <i>unconditional truth</i>, which is eventually related to modal necessity.</p>
<p>We collect the categorical judgments of the form \(A~{\it true}\) into <i>true contexts</i> which are written as \(\Gamma\) or \(\Psi\), and we collect the categorical judgments of the form \(A~{\it valid}\) into <i>valid contexts</i> which are written as \(\Delta\). It is by use of contexts that categorical judgments give rise to <i>hypothetical judgments</i>, and Pfenning-Davies modal logic has two hypothetical judgments: \(\seq{\Delta}{A~{\it valid}}\) and \(\seq{\Delta; \Gamma}{A~{\it true}}\)</p>
<p>In a system with hypothetical judgments, meaning is given by describing three kinds of principles: the weakening principles (expressing how the context works), the identity principles (expressing how assumptions are used), and the substitution principles (expressing how assumptions are discharged). We rely on these principles as we explain a logic, and we are required to tie the knot eventually, demonstrating that our logic's particular rules satisfy its defining principles. Here are the defining principles for Pfenning-Davies modal logic:</p>
<p><b>Weakening principles</b>
<ul>
<li>If \(\Delta \subseteq \Delta'\) and \(\seq{\Delta}{A~{\it valid}}\), then \(\seq{\Delta'}{A~{\it valid}}\)</li>
<li>If \(\Delta \subseteq \Delta'\), \(\Gamma \subseteq \Gamma'\), and \(\seq{\Delta; \Gamma}{A~{\it true}}\), then \(\seq{\Delta'; \Gamma'}{A~{\it true}}\)</li>
</ul>
</p>
<p><b>Identity/hypothesis principles</b>
<ul>
<li>\(\seq{\Delta, A~{\it valid}}{A~{\it valid}}\)</li>
<li>\(\seq{\Delta; \Gamma, A~{\it true}}{A~{\it true}}\)</li>
</ul>
</p>
<p><b>Substitution principles</b>
<ul>
<li>If \(\seq{\Delta}{A~{\it valid}}\) and \(\seq{\Delta, A~{\it valid}}{C~{\it valid}}\), then \(\seq{\Delta}{C~{\it valid}}\).</li>
<li>If \(\seq{\Delta}{A~{\it valid}}\) and \(\seq{\Delta, A~{\it valid}; \Gamma}{C~{\it true}}\), then \(\seq{\Delta; \Gamma}{C~{\it true}}\).</li>
<li>If \(\seq{\Delta; \Gamma}{A~{\it true}}\) and \(\seq{\Delta; \Gamma, A~{\it true}}{C~{\it true}}\), then \(\seq{\Delta; \Gamma}{C~{\it true}}\).</li>
</ul>
</p>
<p>Two judgmental rules define the fundamental relationship between the vaildity and truth judgments; I'll call the first of these rules <i>valid introduction</i> and the second <i>valid elimination</i>; they capture the notion that validity is defined relative to truth, and that validity is truth in all contexts.</p>
The fundamental relationship between truth and validity is set up by two rules, which express that <i>validity is unconditional truth</i>. This means that \(A~{\it valid}\) should be entailed by \(A~{\it true}\) in the absence of any conditions, and that proving \(A~{\it valid}\) should entail \(A~{\it true}\) in any conditions, where "conditions" are represented by the context \(\Gamma\) of true hypotheses.The first rule is the introduction rule for validity: it shows how we <i>verify</i> the judgment \(A~{\it valid}\). The second rule is the elimination rule for validity: it shows that, given a proof of \(A~{\it valid}\), we can use it to show the truth of \(A\) in any context \(\Gamma\).</p>
\[
\infer
{\seq{\Delta}{A~{\it valid}}}
{\seq{\Delta; \cdot}{A~{\it true}}}
\qquad
\infer
{\seq{\Delta; \Gamma}{A~{\it true}}}
{\seq{\Delta}{A~{\it valid}}}
\]
<p>The validity judgment is made interesting by its use in defining modal necessity \(\Box A\). Notice how similar the <i>introduction</i> rule for modal necessity is to the <i>elimination</i> rule for the validity judgment! The elimination rule, however, acts like a let- or case-expression; it is the kind of elimination rule known as a <i>large elimination</i>, which are associated with the positive connectives (if you're a fan of polarized logic).</p>
\[
\infer
{\seq{\Delta; \Gamma}{\Box A~{\it true}}}
{\seq{\Delta}{A~{\it valid}}}
\qquad
\infer
{\seq{\Delta; \Gamma}{C~{\it true}}}
{\seq{\Delta; \Gamma}{\Box A~{\it true}}
&
\seq{\Delta, A~{\it valid}; \Gamma}{C~{\it true}}}
\]
<h3>A truth-oriented justification of logic</h3>
<p>This story above is (I believe) broadly compatible with the story from Pfenning and Davies' original paper; certainly what I called the introduction and elimination rules for validity are just the result of writing down the "Definition of Validity" at the beginning of Section 4 of their paper. However, Pfenning and Davies banish the second half of the definition of validity, what I called the elimination rule, from their actual formal system. In its place, they give a valid hypothesis rule, variously written as \(uhyp\) or \(hyp^*\).</p>
\[
\infer
{\seq{\Delta, A~{\it valid}; \Gamma}{A~{\it true}}}
{}
\]
<p>With this change, the introduction rule for validity now only appears in the conclusion of the validity introduction rule and the premise of the \(\Box\) introduction rule, and the two can be collapsed together. The introduction rule for validity is what is called <i>invertible</i> - the conclusion implies the premise - so we don't fundamentally change the \(\Box\) introduction rule if we replace the premise \(\seq{\Delta}{A~{\it valid}}\) with the equivalent premise \(\seq{\Delta; \cdot}{A~{\it true}}\). </p>
\[
\infer
{\seq{\Delta; \Gamma}{\Box A~{\it true}}}
{\seq{\Delta; \cdot}{A~{\it true}}}
\]
<p>These changes to the logical system, write Pfenning and Davies, are <i>sound</i> with respect to the initial definitions I gave (the \(uhyp\) rule, in particular, is derivable by a combination of the valid hypothesis principle and the elimination rule for validity). However, the resulting system is incomplete with respect to the categorical judgment \(A~{\it valid}\). This necessitates that we give up on the one hypothesis principle and one substitution principle that deal exclusively with validity, and also edit the substitution principle with a premise \(\seq{\Delta}{A~{\it valid}}\) to have the premise \(\seq{\Delta; \cdot}{A~{\it true}}\) instead.</p>
<p>These changes flow naturally out of the intentional goal of explaining the meaning of the logical connectives <i>entirely</i> through the lens of the categorical judgment \(A~{\it true}\). This is a perfectly fine way of explaining S4 modal logic, but I don't see it as a fundamental part of the character of judgmental presentations of logic. For instance, there are plenty of situations, (Hybrid LF, Simpson's IK, Constructive Provability Logic) where there is no single categorical judgement for truth, but rather a whole family of indexed judgments, and it's possible to prove things at each of the judgments. This suggests a different story, a different presentation of the same system above. This presentation is one that I've adapted from Jason Reed [<a href="#ak4">4</a>].</p>
<h3>Another story</h3>
<p>If one is perhaps less interested in the correspondence to modal logic, it's possible to take the judgmental setup that we started with and tease apart the notion of modal necessity just a bit. We do this by introducing two connectives, \(\unicode{x25F8}A\) and \(\unicode{x25FF}A\), with the intentional visual pun that \(\unicode{x25F8}\!\!\!\!\unicode{x25FF} A = \Box A\). The introduction and elimination rules for validity are now properly understood as introduction and elimination rules for \(\unicode{x25FF}\), whereas the introduction and elimination rules for \(\Box\) are now understood as introduction and elimination rules for \(\unicode{x25F8}\).</p>
\[
\infer
{\seq{\Delta}{\unicode{x25FF}A~{\it valid}}}
{\seq{\Delta; \cdot}{A~{\it true}}}
\qquad
\infer
{\seq{\Delta; \Gamma}{A~{\it true}}}
{\seq{\Delta}{\unicode{x25FF}A~{\it valid}}}
\]
\[
\infer
{\seq{\Delta; \Gamma}{\unicode{x25F8}A~{\it true}}}
{\seq{\Delta}{A~{\it valid}}}
\qquad
\infer
{\seq{\Delta; \Gamma}{C~{\it true}}}
{\seq{\Delta; \Gamma}{\unicode{x25F8}A~{\it true}}
&
\seq{\Delta, A~{\it valid}; \Gamma}{C~{\it true}}}
\]
<p>Now, if these rules are really the only ones that deal with validity — if all the "regular" connectives like implication are defined the traditional way based only on truth...</p>
\[
\infer
{\seq{\Delta; \Gamma}{A \supset B~{\it true}}}
{\seq{\Delta; \Gamma, A~{\it true}}{B~{\it true}}}
\qquad
\infer
{\seq{\Delta; \Gamma}{B~{\it true}}}
{\seq{\Delta; \Gamma}{A \supset B~{\it true}}
&
\seq{\Delta; \Gamma}{A~{\it true}}}
\]
<p>...then it is possible to observe that this way in which we have teased apart box into two parts actually <i>syntactically differentiates</i> the propositions judged by the two categorical judgments: if we're starting from truth, we'll only introduce propositions that live under a \(\unicode{x25F8}\), and we can only meaningfully use valid propositions that have \(\unicode{x25FF}\) as their outermost connective. This observation, which I picked up from by Jason Reed, makes it possible to talk about true propositions - those propositions judged as true \((A ::= \unicode{x25F8}P \mid A \supset B \mid a )\) separately from valid propositions - those propositions judged valid \((P ::= \unicode{x25FF}A)\) [<a href="#ak4">4</a>]. The rules that we have set up essentially enforce this syntactic separation already, however; there's no particular requirement that we enforce it on our own.</p>
<p>In fact, one thing we can do is put another version of implication "up at validity," like this:</p>
\[
\infer
{\seq{\Delta}{A \supset B~{\it valid}}}
{\seq{\Delta, A~{\it valid}}{B~{\it valid}}}
\qquad
\infer
{\seq{\Delta}{B~{\it valid}}}
{\seq{\Delta}{A \supset B~{\it valid}}
&
\seq{\Delta}{A~{\it valid}}}
\]
<p>It's not terribly obvious why you'd want to do this, but as Neel points out to me in the comments, this was first done in the context of <i>linear</i> logic by Wadler and Benton, where it turns out to be quite useful for programming [<a href="#ak5">5</a>,<a href="#ak6">6</a>]. I was previously unaware of that work (doh). Reed used the idea of putting implication up at validity in order to describe both the necessity modality and the lax-logic circle (the "monad") as fragments of the same logic, which is where I got this notion from. If you ignore the "valid" implication, you get modal logic with \(\Box A = \unicode{x25F8}\!\!\!\!\unicode{x25FF} A\), but if you ignore the "true" implication and instead reason at "validity" (at which point it's definitely not quite right to call the old "validity" validity anymore), you get lax logic with \(\bigcirc A = \unicode{x25FF} \unicode{x25F8} A\). That observation is essentially the same one made by the two translations in Figure 6 of Wander and Benton's paper, though again Wadler and Benton's "truth" judgment was linear, the one I've presented here was persistent, and Reed considered both variants [<a href="#ak6">6</a>].</p>
<p>To reiterate, this isn't a very useful story if one is interested in giving a fundamental account of modal logic, but I believe that "teasing apart" the modal operator, and even opening up the possibility of inserting other propositions in the space of "valid propositions" that such a teasing apart naturally create, raises interesting possibilities. Hopefully in the next post I'll say about about that in the context of contextual modal type theory (CMTT).</p>
<hr/>
<small>
<a name="ak1"></a> [1] Frank Pfenning and Rowan Davies, "A Judgmental Reconstruction of Modal Logic," 1999 (published 2001)<br/>
<a name="ak2"></a> [2] Per Martin-Löf, "On the Meanings of the Logical Constants and the Justifications of the Logical Laws" a.k.a. "The Siena Lectures," 1983 (published 1996)<br/>
<a name="ak3"></a> [3] Robert J. Simmons and Bernardo Toninho, "Principles of Constructive Provability Logic," CMU Tech Report 2010.<br/>
<a name="ak4"></a> [4] Jason Reed, "A Judgmental Deconstruction of Modal Logic," unpublished 2009.<br/>
<a name="ak5"></a> [5] Nick Benton, "A Mixed Linear and Non-Linear Logic: Proofs, Terms and Models," CSL 1995.<br/>
<a name="ak6"></a> [6] Nick Benton and Phil Wadler, "Linear Logic, Monads and the Lambda Calculus," LICS 1996.
</small>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com3tag:blogger.com,1999:blog-827419514217130218.post-83888721778517346592011-09-27T16:05:00.000-04:002011-09-28T20:48:26.614-04:00My New Focalization Technique is Unstoppable<p>While it took, as they say, a bit of <i>doing</i>, I have a completed draft of a paper on my website that I believe provides a really elegant solution to what has been a very, very annoying problem for some time: writing down a proof called the <i>completeness of focusing</i>. Don't worry, the paper explains what that means: it has to, because one of the things the paper argues is that most existing statements of the completeness of focusing aren't general enough! In the process, this writeup gave me a good (enough) excuse to talk in a very general way about a lot of fundamental phenomena around the idea of polarization in logic that we've been noticing in recent years.</p>
<p>Anyway, the draft is here - <a href="http://www.cs.cmu.edu/~rjsimmon/drafts/focus.pdf">Structural Focalization</a>. The accompanying Twelf development is, of course, on the Twelf wiki: <a href="http://twelf.org/wiki/Focusing">Focusing</a>. I've thrown the paper and the Twelf code up <a href="http://arxiv.org/abs/1109.6273">on arXiv</a> while I figure out while I figure out what to do with it; comments would be greatly appreciated!</p>
<p>I don't want to have a post that's just "hey here's a paper," so here's an addition, a follow-up to the rather popular "<a href="http://requestforlogic.blogspot.com/2010/11/totally-nameless-representation.html">Totally Nameless Representation</a>" post from awhile back. I still prove weakening in Agda by exactly the technique presented there, which commenter <a href="http://requestforlogic.blogspot.com/2010/11/totally-nameless-representation.html?showComment=1290006178509#c6945219533548833041">thecod</a> called the "presheaf method." But for natural deduction systems, I almost never use the term metric approach that was the core of what I was presenting in "totally nameless representation," since it works just fine to use the approach where term <tt>M</tt> of type <tt>A</tt> (the one you're substituting for <tt>x</tt>) is in context <tt>Γ</tt> and the term <tt>N</tt> (the one with <tt>x</tt> free) is in context <tt>Γ' ++ A :: Γ</tt> - the free variable <tt>x</tt> is allowed to be in the middle of the context, in other words; you don't have to use exchange on the second term, so you're always calling induction on something Agda recognizes as structurally smaller.</p>
<p>This worked for natural deduction systems, but I didn't think it would work for sequent calculi, since you needed to toy around with both contexts and induct on both terms in presentation of cut for a sequent calculus. However, for a focused sequent calculus like what I present in the paper, you still can do without the totally nameless metric! If you set things up right, the <i>rightist</i> substitutions (where you work on decomposing the right term, the one with the variable free) allow you to extend the context only of the first term, and the <i>leftist</i> substitutions (where you work on decomposing the left term, the one you're substituting for the variable) allow you to work on the second term, and the two are only connected by the <i>principal</i> substitutions (which reduce the type, which more or less lets you get away with anything as far as the induction principle is concerned).</p>
<p>A code example of this, which could use some more commentary, can be found on GitHub: <a href="https://github.com/robsimmons/agda-lib/blob/lambdatown/Polar.agda">Polar.agda</a>.</p>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-28741785805490352042011-09-08T14:00:00.000-04:002011-09-08T14:00:01.738-04:00Nondeterministic thoughts on nondeterminism<p>So if you've been hanging around lately, I've been writing posts where I <i>think</i> I'm talking about new ideas. (<a href="http://requestforlogic.blogspot.com/2011/08/holey-data-postscript-hole-abstraction.html">I'm not always correct.</a>) This post, on the other hand, is definitely more a review-and-synthesis sort of post; mostly stuff I gleaned over the summer from reading up on Dijkstra's <a href="http://en.wikipedia.org/wiki/Guarded_Command_Language">GCL</a>, Ball et al.'s <a href="http://dl.acm.org/citation.cfm?id=378846">Automatic predicate abstraction of C programs</a>, and a number of K. Rustan M. Leino's <a href="http://research.microsoft.com/en-us/um/people/leino/papers.html">papers</a> as synthesized for me by my advisors Aditya Nori and Sriram Rajamani at Microsoft Research India.</p>
<p>The first section represents my somewhat simplistic thoughts on other's people's work in the semantics of imperative programming languages, mostly thoughts I had this summer at MSR. I hope they are merely naive and simplistic and not actively wrong, and that you (gentle reader) will have patience to correct me where I am actively wrong. Then I have three short, mostly unrelated discussions that I needed to clear out of my head. The first discussion reviews a neat way of understanding loop invariants, due to Leino. The second talks about the algebraic properties of non-deterministic programs. The third discussion tries to relate nondeterminism to the work I'm doing on representing transition systems in linear logic, though it's mostly just speculative and may not make sense to anyone but me in its current half-baked form.</p>
<h2>C programs and their abstractions</h2>
<p>Our starting point will be, essentially, a generic while language that we'll write with C-like syntax. Here's a program:</p>
<pre><span style="color:#000; font-weight:bold"> /* Example 1 */
1. if(x > 0) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;</span></pre>
<p>The thing about a (fully-defined) C program is that it's <i>deterministic</i> - for any initial assignments to variables, there's a unique behavior. So, for the above program, if the initial value of <tt>x</tt> is, say, 4, then the program will execute line 1 (with <tt>x</tt>=4), then line 2 (with <tt>x</tt>=4), then line 6 (with <tt>x</tt>=9</tt>), and the program will then return 9. If the initial value of <tt>x</tt> is -12, then the program will execute line 1 (with <tt>x</tt>=-12), then line 4 (with <tt>x</tt>=-12), then line 6 (with <tt>x</tt>=10</tt>), and the program will then return 10. I've now implicitly told you what what counts as a "behavior" - it's a stream of line numbers and states that may or may not end with a returned value. So the meaning of a program is a function from initial states to <i>streams</i> of line numbers and states.</p>
<p>We can even think of the line numbers as a convenient fiction: we could translate the program above into this form:</p>
<pre><span style="color:#000; font-weight:bold"> linenum = 1;
<span style="color:#0cc">checkpoint();</span>
if(x > 0) {
linenum = 2;
<span style="color:#0cc">checkpoint();</span>
x += 5;
} else {
linenum = 4;
<span style="color:#0cc">checkpoint();</span>
x = -x - 2;
}
linenum = 6;
<span style="color:#0cc">checkpoint();</span>
return x;</span></pre>
<p>Then we say that the behavior of a program is a function from initial states to the stream of intermediate states as reported by the special <tt><span style="color:#0cc; font-weight:bold">checkpoint()</span></tt> function; the "line number" part of the stream is just handled by the value associated with <tt>linenum</tt> in the memory state.</p>
<p>So that's the meaning (the <i>denotation</i>) that I choose for deterministic C programs: they're functions from initial states to streams of states (where the state includes the line number). From here on out I'll just think of any given deterministic C program as a way of specifying one such function. There are also many such functions that can't be written down appropriately with the syntax of a C program; that's not my concern here.</p>
<h3>Nondeterminism</h3>
<p>Instead, I want to talk about non-deterministic C programs. The meaning of a nondeterministic C program is a function from initial states to <i>sets</i> of streams of states,<sup><a href="#alg-foot1">1</a></sup> and the syntax we'll use to write down nondeterministic C programs is an extension of the syntax of deterministic C programs. This means, of course, that there's a trivial inclusion of deterministic C programs into nondeterministic C programs.</p>
<p>The easiest way to come up with nondeterministic C programs (which represent sets of functions from initial states to streams) is to turn <tt>if</tt> branches into nondeterministic branches. The standard way that the C model checking folks represent nondeterministic choice is to write <tt>if(*)</tt>. Here's an example:</p>
<pre><span style="color:#000; font-weight:bold"> /* Example 2 */
1. if(*) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;</span></pre>
<p>The <tt>*</tt>, near as I can tell, was chosen to resemble the "wildcard" symbol, since <i>any</i> boolean expression we put in that <tt>if</tt> statement (such as <tt>x > 0</tt> to recover our Example 1) results in a program that refines Example 2. (Terminology: a program <i>refines</i> another if, for every initial state, every stream in the meaning of the more-refined program also belongs to the meaning of the less-refined program.)</p>
<h3>Assume/impossible</h3>
<p>Nondeterministic choice allows us to <i>enlarge</i> the meaning of a nondeterministic program. Inserting <tt>assume()</tt> statements will <i>cut down</i> the set of streams in the meaning of a nondeterministic program. In particular, we have to exclude any streams that would, for any initial state, violate an assumption. We therefore have to be careful that we don't cut down the set of streams so far that there aren't any left: there are no deterministic programs that refine <tt>assume(false)</tt> - every initial state maps to the empty set of streams. For that matter, there also aren't any deterministic programs that refine Example 3:</p>
<pre><span style="color:#000; font-weight:bold"> /* Example 3 */
1. assume(x > 0);
2. return x;</span></pre>
<p>For every initial state where the value of <tt>x</tt> is positive, the meaning of Example 3 is a set containing only the stream that goes to line one, then line two, then returns the initial value of <tt>x</tt>. For every initial state where the value of <tt>x</tt> is not positive, the meaning of the program has to be the empty set: any stream would immediately start by violating an assumption.</p>
<p>Assumptions were used in Ball et al.'s "Automatic predicate abstraction of C programs" paper, which explains part of the theory behind the SLAM software verifier. In that work, they got rid of <i>all</i> of the <tt>if</tt> statements as a first step, replacing them with nondeterministic choices immediately followed by assumptions.<sup><a href="#alg-foot2">2</a></sup></p>
<pre><span style="color:#000; font-weight:bold"> /* Example 4 */
1. if(*) {
assume(x > 0);
2. x += 5;
3. } else {
assume(x <= 0);
4. x = -x - 2;
5. }
6. return x;</span></pre>
<p>The program in Example 4 is basically a degenerate nondeterministic program: its meaning is exactly equivalent to the deterministic Example 1.
On the other hand, if we remove the statement <tt>assume(x <= 0)</tt> after line 3 in Example 4, we have a nondeterministic program that is refined by many deterministic programs. For instance, the deterministic program in Example 1 refines Example 4 without the <tt>assume(x <= 0)</tt>, but so do Examples 5 and 6:</p>
<pre><span style="color:#000; font-weight:bold"> /* Example 5 */
1. x = x;
4. x = -x - 2;
6. return x;
/* Example 6 */
1. if(x > 100) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;</span></pre>
<p>Example 4 as it was presented shows how <tt>assume()</tt> together with nondeterministic choice can encode normal <tt>if</tt> statements. We could also define an statement <tt>impossible</tt> and say that a stream just cannot ever reach an <tt>impossible</tt> statement. Impossibility can be defined in terms of <tt>assume</tt> - <tt>impossible</tt> is equivalent to <tt>assume(false)</tt>. Alternatively, we can use impossibility to define assumptions - <tt>assume(e)</tt> is equivalent to <tt>if(e) { impossible; }</tt>. So there's a bit of a chicken-and-egg issue: I'm not totally sure whether we should build in <tt>impossible</tt>/<tt>if</tt> combination and use it to define <tt>assume()</tt> or whether we should build in <tt>assume()</tt> and use it to define <tt>impossible</tt> and <tt>if</tt> statements. It probably doesn't matter.</p>
<h3>Assert/abort</h3>
<p>In "Automatic predicate abstraction of C programs," <tt>assert()</tt> statements are understood to be the things that are supposed to be avoided. However, in Leino's work, they have a meaning of absolute and unbounded nondeterminism, which is the interpretation I want to use. If the expression <tt>e</tt> in an <tt>assert(e)</tt> statement evaluates to false, anything can happen - it's as if we could jump to an arbitrary point in memory and start executing code; absolutely any deterministic program that refines a nondeterministic program up to the point where the nondeterministic program fails an assertion will definitely refine that nondeterministic program.</p>
<p>So <tt>assert()</tt> represents unbounded nondeterminism: and in the sense of "jump to any code," not just in the sense of "replace me with any code" - the program <tt>assert(false); while(true) {}</tt> is refined by every program, including ones that terminate. This interpretation is easy to connect to the SLAM interpretation where we say "assertion failures are to be avoided," since obviously one of the things you might prove about your C code is that it doesn't jump to arbitrary code and start executing it.</p>
<p>Analogy: <tt>assert()</tt> is to <tt>abort</tt> as <tt>assume()</tt> is to <tt>impossible</tt> - we can define <tt>assert(e);</tt> as <tt>if(e) { abort; }</tt>.
<h2>Abstracting loops</h2>
<p>The three primitives we have discussed so far are almost enough to let us perform a fun trick that my advisors at MSR attribute to Leino. First, though, we need one more primitive, a "baby" form of assert/abort called <tt>havoc(x)</tt>, which allows the value associated with the variable <tt>x</tt> to be changed in any way. In other words, a program with the statement <tt>havoc(x)</tt> is refined by the program where the statement <tt>havoc(x)</tt> is replaced by the statement <tt>x = 4</tt>, the statement <tt>x = x - 12</tt>, the statement <tt>x = y - 16</tt>, or even the statement <tt>if(z) { x = y; } else { x = w; }</tt>.</p>
<p>Given the <tt>havoc</tt> primitive, imagine we have a program with a loop, and no checkpoints inside the loop:</p>
<pre><span style="color:#000; font-weight:bold"> 1. /* Before the loop */
while(e) {
... loop body,
which only assigns
to variables x1,...,xn ...
}
2. /* After the loop */</span></pre>
<p>Say we know the following two things:
<ul>
<li>The loop will always terminate if the expression <tt>e_inv</tt> evaluates to true at line 1, and</li>
<li>From any state where <tt>e_inv</tt> and <tt>e</tt> both evaluate to true, after the loop body is run, <tt>e_inv</tt> will evaluate to true.
</ul>
Then, we know the program above refines the following program:</p>
<pre><span style="color:#000; font-weight:bold"> 1. /* Before the loop */
assert(e_inv);
havoc(x1); ... havoc(xn);
assume(e_inv && !e);
2. /* After the loop */</span></pre>
<p>This is a somewhat unusual way of looking at loop invariants: we can take a loop and abstract it with nondeterministic straight line code. If we get to the beginning of the loop and our loop invariant is violated, all hell breaks loose, but if the loop invariant holds at the beginning, then when we exit the loop we know the following things: variables not assigned to by the loop haven't changed, the loop invariant holds, and the negation of the loop guard holds.</p>
<p>I like this particular game: it's a non-traditional way of looking at the analysis of loops by asking "is this program abstraction sound."</p>
<h2>Algebraic structure of non-determinism</h2>
<p>Notice that <tt>assume(false)</tt> is the <b>unit</b> of non-deterministic choice: writing the nondeterministic program <tt>if(*) assume(false); else Stmt</tt> is, in all program contexts, the same as writing just <tt>Stmt</tt>. Furthermore, nondeterministic choice is commutative (<tt>if(*) Stmt1; else Stmt2;</tt> is equivalent to <tt>if(*) Stmt2; else Stmt1;</tt>) and associative (it doesn't matter how I nest nondeterministic choices if I want to make a three-way nondeterministic choice). This means that nondeterministic choice and the <tt>impossible</tt> statement (which is equivalent to <tt>assume(false)</tt>) form a <i>monoid</i> - it's like the wildcard <tt>*</tt> could also be interpreted as multiplication, and we could write <tt>if(*) Stmt2; else Stmt1;</tt> as <tt>Stmt1 × Stmt2</tt> and write <tt>impossible</tt> as <tt>1</tt>.</p>
<p>Furthermore, <tt>if(*) assert(false); else Stmt</tt> is a nondeterministic program that is refined by every program, since <tt>assert(false)</tt> refines every program and we can just use the "wildcard" reading of nondeterminism to replace <tt>*</tt> with <tt>true</tt>. Algebraically, this means that the <tt>abort</tt> statement (which is equivalent to <tt>assert(false)</tt>) <i> annihilates </i> nondeterminism - we could write <tt>impossible</tt> as <tt>0</tt>, and then we have <tt>0 × Stmt = Stmt × 0 = 0</tt>.</p>
<p>Is <tt>abort</tt> the unit of a binary operation <tt>Stmt1 + Stmt2</tt> in the same way that the number 0 is the unit of addition? It's not clear that it's useful for the abstraction of C programs, but I think if we go looking for a binary operation that <tt>abort</tt> is the unit of, what we'll find is perhaps best called <tt>both</tt>.</p>
<h3>Both-and statements</h3>
<p>The <tt>both</tt> statement is a weird statement that allows us to complete the picture about the algebraic structure of <tt>abort/assert(false)</tt>, <tt>impossible/assume(false)</tt> and <tt>if(*)</tt>. A deterministic program only refines the program <tt>both Stmt1 and Stmt2</tt> if it refines both <tt>Stmt1</tt> and <tt>Stmt2</tt>. As an example the program below is exactly equivalent to Example 1 - the first statement forces all streams starting from initial states where <tt>x</tt> is not positive to hit line 4 and not line 2 (lest they violate the <tt>assume(x > 0)</tt>, and the second statement forces the initial states where <tt>x</tt> is positive to hit line 2 and not line 4 (lest they violate the <tt>assume(x <= 0)</tt>.</p>
<pre><span style="color:#000; font-weight:bold"> /* Example 7 */
both {
1. if(*) {
assume(x > 0);
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
} and {
1. if(*) {
2. x += 5;
3. } else {
assume(x <= 0);
4. x = -x - 2;
5. }
}
6. return x;</span></pre>
<p>The <tt>both</tt> statement is a binary operation whose unit is <tt>abort</tt>, forming a monad: doing anything at all - as long as it's some specific thing - is the same as doing that specific thing. The <tt>both</tt> is also annihilated by <tt>impossible</tt>, because doing nothing as long as it's one specific thing is the same thing as doing nothing. (This is all very fuzzy, but can be made formal in terms of set union and intersection operations.) That's interesting, because <tt>both</tt> statements aren't really like addition at all: we have two monoids whose the units both annihilate the binary operator that they aren't the unit of. If distributivity works in both directions (I'm not sure it does...) then we have a Boolean algebra without negation (what's that called?).</p>
<h2>Two linear logic interpretations</h2>
<p>One of the things that I like to do is to take the state of an evolving machine, encode it as a linear logic context <tt>Δ</tt>, and then say that the linear logic derivability judgment <tt>Δ ⊢ A</tt> proves something about the state. I think there are two ways of doing this for the language semantics I wrote out above. One is familiar, both are interesting.</p>
<h3>The existential interpretation</h3>
<p>The way I'm used to thinking about these things is that <tt>Δ ⊢ A</tt> shows that <tt>A</tt> is a possible (that is, existential) property of the system encoded into the linear logic context <tt>Δ</tt>. In that case, we want to encode nondeterministic as the additive conjunction <tt>A & B</tt>, as we can execute a nondeterministic choice by taking either choice. We want to encode <tt>impossible</tt> as <tt>⊤</tt>, which prevents us from proving anything interesting about a series of nondeterministic choices that lead us to an impossible point in the program. We'll furthermore want to encode <tt>abort</tt> as <tt>0</tt>, since once we reach an abort we can prove any existential property that we want to about the program!</p>
<h3>The universal interpretation</h3>
<p>Another way of understanding derivability would be to say that <tt>Δ ⊢ A</tt> shows that <tt>A</tt> is a necessary property of the system encoded into the linear logic context <tt>Δ</tt>. In that case, we would want to encode nondeterministic choice as the additive disjunction <tt>A ⊕ B</tt>, since to prove that something necessarily holds of a nondeterministically branching program, we have to show that it holds regardless of how the nondeterministic choice is resolved. This ensures that we will have to consider all possible resolutions of nondeterministic choices, but reaching an <tt>impossible</tt> state means that no programs can get to the current state, so all universal properties hold vacuously. We therefore would want to encode <tt>impossible</tt> as <tt>0</tt>. On the other hand, reaching an <tt>abort</tt> means we know nothing about the universal properties of the program - the program can now do anything - so we encode <tt>abort</tt> as <tt>⊤</tt>.</p>
<hr>
<small><a name="alg-foot1"></a><sup>1</sup> Note that functions from initial states to sets of streams is *different* than sets of functions from initial states to streams! Both might be valid ways of looking at the meaning of nondeterministic programs depending on how you look at it, in fact, I originally wrote this post think in terms of the other definition.<br/>
<a name="alg-foot2"></a><sup>2</sup> Certainly Ball et al. didn't come up with this idea - the idea is implicit in the Guarded Command Language; I'm just referring to Ball et al. because I'm sticking with their syntax.</small>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-83962482112537921872011-09-06T11:30:00.000-04:002011-09-06T11:35:06.125-04:00Clowns to the left of me, jokers to the right, here I am, stuck in a hole abstractionHappy day-after-labor-day weekend! This also stems from a discussion I had with <a href="http://www.ccs.neu.edu/home/turon/">Aaron Turon</a>, who, a while back with <a href="http://research.microsoft.com/en-us/um/people/crusso/">Claudio Russo</a>, also thought about using hole abstractions for maps and folds over data (with some fascinating twists, but I'll leave Aaron and Claudio's twists for Aaron and Claudio).
<br />
<br />One of the things Aaron noticed was that it's nice to define a tail-recursive map over trees this way, and indeed that is true. I transliterated some code Aaron wrote out into Levy#: you can see it at <a href="https://github.com/robsimmons/levy/blob/holey/turon.levy">turon.levy</a>. The example really starts rubbing against two of the unnecessary deficiencies of Levy# - a lack of polymorphism (the trees have to be integer trees) and a lack of mutual recursion. We really want to define the map in terms of two mutually (tail-)recursive functions <tt><span style="color:#0000ff; font-weight:bold">moveUp</span></tt> and <tt><span style="color:#0000ff; font-weight:bold">moveDown</span></tt>, both of which take a holey <tt><span style="color:#0000ff; font-weight:bold">tree -o tree</span></tt> and a <tt><span style="color:#0000ff; font-weight:bold">tree</span></tt> that belongs "in the hole," the difference being that in <tt><span style="color:#0000ff; font-weight:bold">moveUp</span></tt> I've <i>already</i> applied the map function to the subtree in the hole, and in <tt><span style="color:#0000ff; font-weight:bold">moveDown</span></tt> I still need to apply the map function to the subtree in the hole.
<br />
<br />This only works for </i>monomorphic</i> maps - if we had polymorphism, our hole abstraction zippers would allow us to write maps from <tt><span style="color:#0000ff; font-weight:bold"><span style="color:#00aaaa; font-weight:bold">'a</span> tree</span></tt> to <tt><span style="color:#0000ff; font-weight:bold"><span style="color:#00aaaa; font-weight:bold">'a</span> tree</span></tt> using a map function with type <tt><span style="color:#cc4020; font-weight:bold"><span style="color:#00aaaa; font-weight:bold">'a</span> -> F <span style="color:#00aaaa; font-weight:bold">'a</span></tt>. It would not allow us to write the maps we're really used to from <tt><span style="color:#0000ff; font-weight:bold"><span style="color:#00aaaa; font-weight:bold">'a</span> tree</span></tt> to <tt><span style="color:#0000ff; font-weight:bold"><span style="color:#00aaaa; font-weight:bold">'b</span> tree</span></tt> using a map function with type <tt><span style="color:#cc4020; font-weight:bold"><span style="color:#00aaaa; font-weight:bold">'a</span> -> F <span style="color:#00aaaa; font-weight:bold">'b</span></tt>.
<br />
<br />In order to capture more general maps - or <a href="http://en.wikipedia.org/wiki/Fold_(higher-order_function)">fold/reduce</a> operations - the non-hole-abstraction approach is to use the generalization of the derivative that Conor McBride describes in <a href="http://strictlypositive.org/CJ.pdf">clowns to the left of me, jokers to the right</a>. We observe that a map function like the one in <a href="https://github.com/robsimmons/levy/blob/holey/turon.levy">turon.levy</a> sweeps left-to-right through the data structure, always segmenting the data structure into some already-processed "clowns to the left," some not-yet-processed "jokers to the right," and a path through the middle. Then, we either have a sub-tree of clowns, which means we'll now look up for more jokers, or a sub-tree of jokers, which means we need to look down into the jokers to process them. The overall picture is this one:
<br />
<br /><img src="http://typesafety.net/rfl/rjs-cj1.png"/>
<br />
<br />A nice thing about the hole abstraction framework is that we can represent and dynamically manipulate <i>precisely this picture</i>, whereas the derivative/zipper approach generally forces one to think about the derivative-like structure from the "inside out." If the jokers are a tree of ints and the clowns are a tree of bools, we can describe them like this:<pre><span style="color:#009900; font-weight:bold"> data ILeaf: <span style="color:#0000ff">int -o itree</span><br/> | IBranch: <span style="color:#0000ff">itree -o itree -o itree</span><br/><br/> data BLeaf: <span style="color:#0000ff">bool -o btree</span><br/> | BBranch: <span style="color:#0000ff">btree -o btree -o btree</span></span></pre>Then, the data structure that we will use as a representation of the intermediate state of computation as illustrated above is either stuck in the middle with clowns to the left or stuck in the middle with jokers to the right.<pre><span style="color:#009900; font-weight:bold"> data Clowns: <span style="color:#0000ff">btree -o cj -o cj</span> <span style="color:#000000"># Clowns to the left of me</span><br/> | Jokers: <span style="color:#0000ff">cj -o itree -o cj</span> <span style="color:#000000"># Jokers to the right</span></span></pre>Just like our difference lists from before, there are no closed members of the type <tt><span style="color:#0000ff; font-weight:bold">cj</span></tt>; we can only form values of type <tt><span style="color:#0000ff; font-weight:bold">cj -o cj</span></tt>. The map implementation can be found at <a href="https://github.com/robsimmons/levy/blob/holey/clownjoker.levy">clownjoker.levy</a>.
<br />
<br />If we did have polymorphism, the clown/joker data structure for a map would look like this:<pre><span style="color:#009900; font-weight:bold"> data Clowns: <span style="color:#0000ff"><span style="color:#00aaaa">'clown</span> tree <br/> -o <span style="color:#00aaaa">('clown, 'joker)</span> cj <br/> -o <span style="color:#00aaaa">('clown, 'joker)</span> cj</span><br/> | Jokers: <span style="color:#0000ff"><span style="color:#00aaaa">('clown, 'joker)</span> cj <br/> -o <span style="color:#00aaaa">'joker</span> tree <br/> -o <span style="color:#00aaaa">('clown, 'joker)</span> cj</span></span></pre>The more general clown/joker data structure for a fold would look like this:<pre><span style="color:#009900; font-weight:bold"> data Clowns: <span style="color:#0000ff"><span style="color:#00aaaa">'clowns</span> <br/> -o <span style="color:#00aaaa">('clowns, 'joker)</span> cj <br/> -o <span style="color:#00aaaa">('clowns, 'joker)</span> cj</span><br/> | Jokers: <span style="color:#0000ff"><span style="color:#00aaaa">('clowns, 'joker)</span> cj <br/> -o <span style="color:#00aaaa">'joker</span> tree <br/> -o <span style="color:#00aaaa">('clowns, 'joker)</span> cj</span></span></pre>This is, it must be said, all reasonably unsatisfying. We have seen that hole abstractions can <i>completely replace</i> the derivative data structure - we could well and truly "scrap our zippers." To use the clown/joker generalization of derivatives, we have to define an obvious, boiler-plate-ey datatype <tt><span style="color:#0000ff; font-weight:bold">cj</span></tt>, which it is merely mechanical to compute for any datatype of jokers. Can we do better?Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-792975799935191392011-08-24T12:32:00.000-04:002011-12-02T19:12:24.595-05:00Holey Data, Postscript: Hole AbstractionI love the PL community on Twitter. Yesterday, David Van Horn presented this observation:
<br />
<br /><div style="width:500px"><!-- http://twitter.com/#!/lambda_calculus/status/105988679956299777 --> <style type='text/css'>.bbpBox105988679956299777 {background:url(http://a1.twimg.com/images/themes/theme14/bg.gif) #131516;padding:20px;} p.bbpTweet{background:#fff;padding:10px 12px 10px 12px;margin:0;min-height:48px;color:#000;font-size:18px !important;line-height:22px;-moz-border-radius:5px;-webkit-border-radius:5px} p.bbpTweet span.metadata{display:block;width:100%;clear:both;margin-top:8px;padding-top:12px;height:40px;border-top:1px solid #fff;border-top:1px solid #e6e6e6} p.bbpTweet span.metadata span.author{line-height:19px} p.bbpTweet span.metadata span.author img{float:left;margin:0 7px 0 0px;width:38px;height:38px} p.bbpTweet a:hover{text-decoration:underline}p.bbpTweet span.timestamp{font-size:12px;display:block}</style> <div class='bbpBox105988679956299777'><p class='bbpTweet'>Most papers in computer science describe how their author learned what someone else already knew. -- Peter Landin<span class='timestamp'><a title='Tue Aug 23 13:04:05 +0000 2011' href='http://twitter.com/#!/lambda_calculus/status/105988679956299777'>Tue Aug 23 13:04:05</a> via <a href="http://bufferapp.com" rel="nofollow">Buffer</a> <a href='http://twitter.com/intent/favorite?tweet_id=105988679956299777'><img src='http://si0.twimg.com/images/dev/cms/intents/icons/favorite.png' /> Favorite</a> <a href='http://twitter.com/intent/retweet?tweet_id=105988679956299777'><img src='http://si0.twimg.com/images/dev/cms/intents/icons/retweet.png' /> Retweet</a> <a href='http://twitter.com/intent/tweet?in_reply_to=105988679956299777'><img src='http://si0.twimg.com/images/dev/cms/intents/icons/reply.png' /> Reply</a></span><span class='metadata'><span class='author'><a href='http://twitter.com/lambda_calculus'><img src='http://a3.twimg.com/profile_images/1339068530/badlogo_normal.png' /></a><strong><a href='http://twitter.com/lambda_calculus'>dvanhorn@λ-calcul.us</a></strong><br/>lambda_calculus</span></span></p></div> <!-- end of tweet --></div>
<br />I'm gonna be perfectly honest: I was specifically thinking of the <a href="http://requestforlogic.blogspot.com/search/label/holey%20data">holey data</a> series when I responded.
<br />
<br /><div style="width:500px"><!-- http://twitter.com/#!/simrob/status/105996182416932868 --> <style type='text/css'>.bbpBox105996182416932868 {background:url(http://a0.twimg.com/images/themes/theme1/bg.png) #9ae4e8;padding:20px;} p.bbpTweet{background:#fff;padding:10px 12px 10px 12px;margin:0;min-height:48px;color:#000;font-size:18px !important;line-height:22px;-moz-border-radius:5px;-webkit-border-radius:5px} p.bbpTweet span.metadata{display:block;width:100%;clear:both;margin-top:8px;padding-top:12px;height:40px;border-top:1px solid #fff;border-top:1px solid #e6e6e6} p.bbpTweet span.metadata span.author{line-height:19px} p.bbpTweet span.metadata span.author img{float:left;margin:0 7px 0 0px;width:38px;height:38px} p.bbpTweet a:hover{text-decoration:underline}p.bbpTweet span.timestamp{font-size:12px;display:block}</style> <div class='bbpBox105996182416932868'><p class='bbpTweet'>@<a class="tweet-url username" href="http://twitter.com/lambda_calculus" rel="nofollow">lambda_calculus</a> Presumably the implication is that this is bad for the advancement of science. OTOH, it's a pretty viable model for a blog.<span class='timestamp'><a title='Tue Aug 23 13:33:53 +0000 2011' href='http://twitter.com/#!/simrob/status/105996182416932868'>Tue Aug 23 13:33:53</a> via web <a href='http://twitter.com/intent/favorite?tweet_id=105996182416932868'><img src='http://si0.twimg.com/images/dev/cms/intents/icons/favorite.png' /> Favorite</a> <a href='http://twitter.com/intent/retweet?tweet_id=105996182416932868'><img src='http://si0.twimg.com/images/dev/cms/intents/icons/retweet.png' /> Retweet</a> <a href='http://twitter.com/intent/tweet?in_reply_to=105996182416932868'><img src='http://si0.twimg.com/images/dev/cms/intents/icons/reply.png' /> Reply</a></span><span class='metadata'><span class='author'><a href='http://twitter.com/simrob'><img src='http://a0.twimg.com/profile_images/474270007/rome-pop-icon_normal.jpg' /></a><strong><a href='http://twitter.com/simrob'>Rob Simmons</a></strong><br/>simrob</span></span></p></div> <!-- end of tweet --></div>
<br />'Cause really, I was kind of surprised that I hadn't found any previous presentations of the idea (and was therefore encouraged when Conor indicated that he at least <a href="http://twitter.com/#!/pigworker/status/103445003367497728">sort of had thought of it too</a>). Chung-chieh Shan's response was great:
<br />
<br /><div style="width:500px"><!-- http://twitter.com/#!/ccshan/status/106131049997811712 --> <style type='text/css'>.bbpBox106131049997811712 {background:url(http://a0.twimg.com/images/themes/theme1/bg.png) #C0DEED;padding:20px;} p.bbpTweet{background:#fff;padding:10px 12px 10px 12px;margin:0;min-height:48px;color:#000;font-size:18px !important;line-height:22px;-moz-border-radius:5px;-webkit-border-radius:5px} p.bbpTweet span.metadata{display:block;width:100%;clear:both;margin-top:8px;padding-top:12px;height:40px;border-top:1px solid #fff;border-top:1px solid #e6e6e6} p.bbpTweet span.metadata span.author{line-height:19px} p.bbpTweet span.metadata span.author img{float:left;margin:0 7px 0 0px;width:38px;height:38px} p.bbpTweet a:hover{text-decoration:underline}p.bbpTweet span.timestamp{font-size:12px;display:block}</style> <div class='bbpBox106131049997811712'><p class='bbpTweet'>@<a class="tweet-url username" href="http://twitter.com/simrob" rel="nofollow">simrob</a> @<a class="tweet-url username" href="http://twitter.com/lambda_calculus" rel="nofollow">lambda_calculus</a> I'm not sure that Landin intended to imply that it's bad!<span class='timestamp'><a title='Tue Aug 23 22:29:48 +0000 2011' href='http://twitter.com/#!/ccshan/status/106131049997811712'>Tue Aug 23 22:29:48</a> via web <a href='http://twitter.com/intent/favorite?tweet_id=106131049997811712'><img src='http://si0.twimg.com/images/dev/cms/intents/icons/favorite.png' /> Favorite</a> <a href='http://twitter.com/intent/retweet?tweet_id=106131049997811712'><img src='http://si0.twimg.com/images/dev/cms/intents/icons/retweet.png' /> Retweet</a> <a href='http://twitter.com/intent/tweet?in_reply_to=106131049997811712'><img src='http://si0.twimg.com/images/dev/cms/intents/icons/reply.png' /> Reply</a></span><span class='metadata'><span class='author'><a href='http://twitter.com/ccshan'><img src='http://a0.twimg.com/profile_images/1194502495/spacee_normal.png' /></a><strong><a href='http://twitter.com/ccshan'>Chung-chieh Shan</a></strong><br/>ccshan</span></span></p></div> <!-- end of tweet --></div>
<br />I guess I hope he's right (or at least that I am), because in the comments to Part 3, <a href="http://www.ccs.neu.edu/home/turon/">Aaron Turon</a> <a href="http://requestforlogic.blogspot.com/2011/08/holey-data-part-33-type-of-one-hole.html?showComment=1314192917682#c1092290329570263586">correctly observes</a> that I missed a whopper: Minamide's 1998 POPL Paper "<a href="http://dl.acm.org/citation.cfm?id=268953">A Functional Representation of Data Structures with a Hole</a>." Horray for blogs, I guess: if this has been a paper review, I would have been quite embarrassed, but as a blog comment I was delighted to find out about this related work.
<br />
<br /><h2>A Functional Representation of Data Structures with a Hole</h2>Minamide's paper effectively covers the same ground I covered in <a href="http://requestforlogic.blogspot.com/2011/08/holey-data-part-13-almost-difference.html">Part 1</a> of the Holey Data series: his linear representational lambdas are called <i>hole abstractions</i>. It's a very well written paper from the heady days when you could talk about the <i>proposed</i> Standard ML Basis Library and it was still common for authors to cite Wright and Felleisen when explaining that they were proving language safety by (what amounts to) proving progress and preservation lemmas.
<br />
<br />My favorite part of reading the paper was that it simultaneously confirmed two suspicions that I picked up after a recent discussion with Dan Licata:<ol><li>Levy#/call-by-push-value was an unnecessarily restrictive calculus for programming with linear representational functions - ML would work just fine.</li><li>By using call-by-push-value, I'd avoided certain red herrings that would have plagued the development otherwise - somewhere on almost every page I thought "yep, that explanation is less clear than it could be because they don't know about difference between value (positive) types and computation (negative) types yet."</li></ol>One neat thing this paper describes that I hadn't thought about is automatically transforming non-tail-recursive programs that are naturally structurally inductive to tail-recursive programs based on hole abstractions/difference lists/linear representational functions. It seems like this optimization in particular is where many of the paper's claimed performance gains are found. It also seems like zippers-as-derivatives are pretty clearly lurking around the discussion in Section 4, which is neat.
<br />
<br />Overall, this paper made me quite happy - it suggests there is something canonical about this idea, and proves that the idea leads to concrete performance gains. It also made me sad, of course, because it seems like the ideas therein didn't really catch on last time around. But that's part of why I didn't stop with Part 1 in the series: it's not clear, even to me, that hole abstractions/difference lists/linear representational functions are worth adding to a functional programming language if all they can do is be applied and composed. However, with the additional expressiveness that can be found in pattern matching against hole abstractions (hell, that's a way better name than "linear representational functions," I'm just going to call them hole abstractions from now on), I think there's a significantly stronger case to be made.
<br />
<br />I've thrown a transliteration of Minamide's examples <a href="https://github.com/robsimmons/levy/blob/holey/minamide.levy">up on GitHub</a>. The binary tree insertion example, in particular, could come from Huet's paper: it's a beautiful example of how even the "Part 1" language can implement zipper algorithms so long as you never need to move up in the tree. As for the <tt>hfun_addone</tt> function, I don't really understand it at all, I just transliterated it. In particular, it seems to not be entirely tail recursive (in particular, it seems to regular-recurse along the leftmost spine of the binary tree - if I'm not mistaken about this, I accuse <a href="https://github.com/robsimmons/levy/blob/503525c1c7f1690ec586b2203fbfb8f929e5ea8a/minamide.levy#L131">line 131</a> of being the culprit.)
<br />
<br /><h3>6.3 Logic Variable</h3><i>(Note: I've tried to make sure this section doesn't come across as mean-spirited in any way, but I want to emphasize: this is not intended to be any sort of criticism of Yasuhiko Minamide. His paper was terrific! <a href="http://www.score.cs.tsukuba.ac.jp/~minamide/papers/hole.popl98.pdf">Go read it!</a>)</i>
<br />
<br />My other favorite part of reading the paper was Section 6.3, which discusses difference lists. I plan to actually email Minamide and ask him about Section 6.3. Here's my reconstruction of events: Fictional Minamide has a great idea, implements it into a prototype ML compiler, tests it. His paper has strong theoretical and practical results, and gets accepted to POPL 1998. However, one of the reviewers says "oh, this is almost exactly difference lists in logic programming, you need to explain the connection." Fictional Minamide is not a logic programming person, has no access to a logic programming person, doesn't particularly have any reason to care about logic programming, but he does feel the need to address the concerns of a reviewer that he can't actually communicate with. Fictional Minamide manages to find, with great difficulty in the pre-Google-Scholar world, some poorly expressed explanation for what difference lists are in some random papers that are mostly about something else.<sup><a href="#1xepi">1</a></sup> After a couple of hours, Fictional Minamide gives up, exclaiming "okay, this makes absolutely no sense <i>whatsoever</i>," and writes something kinda mumbly about graph reduction that seems vaguely plausible to satisfy the reviewer's demand.
<br />
<br />The result is a section that mostly misses the point about the connection between hole abstraction and difference lists, both of which are <i>declarative abstractions that allow a programmer to think about working modulo an uninitialized pointer in memory</i> (though the way Minamide and I do it, the types help you <i>way</i> more). This is <b>not</b> intended as any criticism of either Real Minamide or Fictional Minamide. Indeed, it's mostly a criticism of the conference review processes: I'm pretty sure you could find similar "Section 6.3"s in my papers as well!<sup><a href="#2xexpi">2</a></sup> I do hope, however, that my exposition in Part 1, which was in fact <i>motivated</i> by difference lists and reached more-or-less the exact same setup that Minamide came up with, clarifies the record on how these two ideas are connected.
<br />
<br /><b>[Update Aug 25]</b> I heard back from Real Minamide - while it was a long time ago, he did recall one of the reviewers mentioning logic variables, leading to the inclusion of that section. I win! Well, kind of. There was probably no "this makes no sense" exclamation; Minimade says that at the time his understanding at the time was in line with my comment about working modulo an uninitialized pointer. The comments about graph reduction, which were why I thought the section misses the point, were more of a side comment.
<br />
<br />Minamide also remembered another wonderfully tantalizing tidbit: he recalls that, at POPL 1998, Phil Wadler said he'd seen a similar idea even earlier. Perhaps hole abstraction is just destined to be continuously reinvented until it finally gets included in the C++2040 standard.<sup><a href="#3xexpi">3</a></sup>
<br />
<br /><hr><small><sup><a name="1xepi">1</a></sup> The work he cites is on the adding the logic programming notions of unbound variables to functional programming languages, which (while I haven't looked at them) certainly don't look they would give a good introduction to the simple-but-goofy logic programming intuitions behind difference lists.
<br />
<br />That said, I basically have never seen a good, clear, self-contained description of what a difference list is - I consider Frank's <a href="http://www.cs.cmu.edu/~fp/courses/lp/lectures/11-diff.pdf">logic programming notes</a> to be pretty good, but I recognize that I'm not a fair judge because I was also present at the associated lecture.
<br />
<br /><sup><a name="2xexpi">2</a></sup> Grep for "∇", or "nominal", especially in anything prior to my thesis proposal, if you want a head start.
<br />
<br /><sup><a name="3xexpi">3</a></sup> I wonder what brackets they'll use, since as of the current standard they <a href="http://twitter.com/#!/mattmight/status/103592582675832833">seem to have run out</a>.</small>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-91440937892509284962011-08-22T14:34:00.000-04:002011-08-22T22:39:20.464-04:00Holey Data, Part 3/3: The Type Of A One-Hole ContextLet's review:<ul><li>As a prelude, I introduced the <a href="http://requestforlogic.blogspot.com/2011/08/embracing-and-extending-levy-language.html">Levy# language</a>, Bauer and Pretnar's implementation of call-by-push-value, a programming formalism that is very persnickety about the difference between <i>value code</i> and <i>computation code</i>, extended with datatypes. I have come to the position, especially after a conversation with Dan Licata, that we should really think of Levy# as an <a href="http://en.wikipedia.org/wiki/Administrative_normal_form">A-normal form</a> intermediate language that is explicit about control: using it allowed me to avoid certain red herrings that would have arisen in an ML-like syntax, but you'd really want the ML-like language to actually program in.</li><li>In <a href="http://requestforlogic.blogspot.com/2011/08/holey-data-part-13-almost-difference.html">Part 1</a>, I showed how representational linear functions could capture a logic programming idiom, <i>difference lists</i>. Difference lists can be implemented in ML/Haskell as functional data structures, but they don't have O(1) append and turn-into-a-regular-list operations; my Levy# difference lists, represented as values of type <tt><span style="color:#0000ff; font-weight:bold">list -o list</span></tt>, can perform all these operations in constant time with local pointer manipulation. The cost is that operations on linear functions are implemented by destructive pointer mutation - I should really have an <a href="http://requestforlogic.blogspot.com/2011/03/request-for-typestate-part-2-linear.html">affine type system</a> to ensure that values of type <tt><span style="color:#0000ff; font-weight:bold">list -o list</span></tt> are used at most once.</li><li>In <a href="">Part 2</a>, I observed that the perfectly natural, λProlog/Twelf-inspired operation of matching against these linear functions greatly increased their expressive power. By adding an extra bit of information to the runtime representation of data (an answer to the the question "where is the hole, if there is a hole?"), the matching operation could be done in constant time (well, time linear in the number of cases). A simple subordination analysis meant that checking for exhaustiveness and redundancy in case analysis was possible as well.</li></ul>In the case study in Part 2, I motivated another kind of pattern matching, one foreign to the λProlog/Twelf way of looking at the world but one intimately connected to the famous functional data structure known as <a href="http://en.wikipedia.org/wiki/Zipper_(data_structure)">The Zipper</a>. In this last installment, I show how linear representational functions can <i>completely subsume</i> zippers.
<br />
<br /><h2>The essence of zippers</h2>A zipper is a functional data structure that allows you to poke around on the inside of a datatype without having to constantly descend all the way int the datatype. Illustrations of zippers generally look something like this:<sup><a href="foot1x3">1</a></sup>
<br />
<br /><img src="http://typesafety.net/rfl/One-hole-context-plug.png" />
<br />
<br />The zipper data structure consists of two parts: one of them is a "path" or "one-hole-context", which, owing to Conor's influence [<a href="#1x3">1</a>], I will call a <i>derivative</i>, and the other is a subterm. We maintain the invariant that as we move into the derivative ("up") or into the subterm ("down") with constant-time operations, the combination of the derivative and the subterm remains the original term that we are editing. A Levy# implementation of derivative-based zippers as presented in Huet's original paper on zippers [<a href="#2x3">2</a>] can be found in <a href="https://github.com/robsimmons/levy/blob/holey-blog-3/thezipper.levy">thezipper.levy</a>.
<br />
<br />To understand linear function-based zippers, we see that a linear term <tt><span style="color:#0000ff; font-weight:bold">a -o b</span></tt> is kind of obviously a <tt><span style="color:#0000ff; font-weight:bold">b</span></tt> value with one <tt><span style="color:#0000ff; font-weight:bold">a</span></tt> value missing, so if we pair a value <tt><span style="color:#0000ff; font-weight:bold">a -o b</span></tt> with a value of type <tt><span style="color:#0000ff; font-weight:bold">a</span></tt>, we would hope to be able to use the pair as a zipper over <tt><span style="color:#0000ff; font-weight:bold">b</span></tt> values. The one thing that we can't do with the linear functions we've discussed so far, however, is look "up" in the tree.
<br />
<br />A list is a good simple example here, since the two parts of a linear function-based zipper over a list are a difference list (the beginning of the list) and a list (the end of the list). Looking "up" involves asking what the last item in the <tt><span style="color:#0000ff; font-weight:bold">list -o list</span></tt> is (if there is at least one item). Since a value of type <tt><span style="color:#0000ff; font-weight:bold">list -o list</span></tt> has the structure <tt><span style="color:#0000ff; font-weight:bold">[hole: list] Cons n<sub>1</sub> (Cons n<sub>2</sub> ... (Cons n<sub>k-1</sub> (Cons n<sub>k</sub> hole))...)</span></tt>, it's reasonable to match this term against the pattern <tt><span style="color:#009900; font-weight:bold">[hole: list] <span style="color:#0000ff">outside</span> (Cons <span style="color:#0000ff">i</span> hole)</span></tt> to learn that <tt><span style="color:#0000ff; font-weight:bold">outside</span></tt> is <tt><span style="color:#0000ff; font-weight:bold">[hole: list] Cons n<sub>1</sub> (Cons n<sub>2</sub> ... (Cons n<sub>k-1</sub> hole)...)</span></tt> and that <tt><span style="color:#0000ff; font-weight:bold">i</span></tt> is <tt><span style="color:#0000ff; font-weight:bold">n<sub>k</sub></span></tt>. The key is that, in the pattern, the linear variable <tt><span style="color:#009900; font-weight:bold">hole</span></tt> appears as the <i>immediate</i> subterm to the constructor <tt><span style="color:#009900; font-weight:bold">Cons</span></tt>, so that we're really only asking a question about what the <i>closest</i> constructor to the hole is.
<br />
<br />This kind of interior pattern matching is quite different from existing efforts to understand "pattern fragments" of linear lambda cacluli (that I know of), but it's perfectly sensible. As I'll show in the rest of this post, it's also easy enough to modify the implementation to deal with interior pattern matching efficiently, easy enough to do exhaustiveness checking, and easy enough to actually use for programming. For the last part, I'll use an example from Michael Adams; you can also see the linear pattern matching version of Huet's original example on Github: (<a href="https://github.com/robsimmons/levy/blob/holey-blog-3/linzipper.levy">linzipper.levy</a>).
<br />
<br /><h2>Implementation</h2>The only trick to implementing zippers that allow interior pattern matching is to turn our linked list structure into a doubly linked list: the final data structure basically is a double-ended queue implemented by a doubly-linked list.
<br />
<br />First: we make our "tail pointer," which previously pointed into the last structure where the hole was, a pointer to the <i>beginning</i> of the last allocated cell, the one that immediately surrounds the hole. This lets us perform the pattern match, because we can see what the immediate context of the hole is just by following the tail pointer. (It also requires that we create a different representation of linear functions that are the identity, but that's the kind of boring observation that you can get by implementing a doubly-linked list in C.)
<br />
<br />After we have identified the immediate context of the hole, we also need a parent pointer to identify the smaller "front" of the linear function. The in-memory representation of <tt><span style="color:#0000ff; font-weight:bold">[hole: list] Cons 9 (Cons 3 (Cons 7 hole))</span></tt>, for example, will look something like this:
<br /><img src="http://typesafety.net/rfl/rjs-fig5.png"/>
<br />Just like the "where is the hole" information, this parent pointer gets to be completely ignored by the runtime if it hangs around after the linear function is turned into a non-holey value.
<br />
<br /><h2>Exhaustiveness checking</h2>Providing non-exhaustive match warnings for case analysis on the inside of a linear function is a bit tricker than providing non-exhaustive match warnings for case analysis on the outside of a linear function, but it's only just a bit trickier. First, observe that the subordination relation I talked about in the previous installment is the transitive closure of an <i>immediate subordination</i> relation that declares what types of values can be immediate subterms to constructors of other types. For instance, when we declare the type of the constructor <tt><span style="color:#009900; font-weight:bold">Cons</span></tt> to be <tt><span style="color:#0000ff; font-weight:bold">int -o list -o list</span></tt>, Levy# learns that <tt><span style="color:#0000ff; font-weight:bold">int</span></tt> and <tt><span style="color:#0000ff; font-weight:bold">list</span></tt> are both immediately subordinate to <tt><span style="color:#0000ff; font-weight:bold">list</span></tt>.
<br />
<br />Levy# was already tracking both the immediate subordination relation and the full subordination relation; we can change the output of the "print subordination" command to reflect this reality:<pre> <span style="color:#000000; font-weight:bold">Levy. Press Ctrl-D to exit.<br/> Levy> <span style="color:#009900">data Z: <span style="color:#0000ff">even</span> | EO: <span style="color:#0000ff">even -o odd</span> | OE: <span style="color:#0000ff">odd -o even</span></span> ;;<br/> <span style="color:#009900">data Z: <span style="color:#0000ff">even</span><br/> | EO: <span style="color:#0000ff">even -o odd</span><br/> | OE: <span style="color:#0000ff">odd -o even</span></span><br/> Levy> $subord ;;<br/> Subordination for current datatypes:<br/> <span style="color:#0000ff">even</span> <| <span style="color:#0000ff">even</span><br/> <span style="color:#0000ff">even</span> <| <span style="color:#0000ff">odd</span> (immediate)<br/> <span style="color:#0000ff">odd</span> <| <span style="color:#0000ff">even</span> (immediate)<br/> <span style="color:#0000ff">odd</span> <| <span style="color:#0000ff">odd</span></span></pre>Now, the way we enumerate the possible cases for an interior pattern match against a linear value of type <tt><span style="color:#0000ff; font-weight:bold">holeTy -o valueTy</span></tt> is the following:<ul><li>Enumerate all of the types <tt><span style="color:#0000ff; font-weight:bold">ctxTy</span></tt> that <tt><span style="color:#0000ff; font-weight:bold">holeTy</span></tt> is immediately subordinate to. These are all the types with constructors that might immediately surround the hole (which, of course, has type <tt><span style="color:#0000ff; font-weight:bold">holeTy</span></tt>).</li><li>Filter out all the types <tt><span style="color:#0000ff; font-weight:bold">ctxTy</span></tt> above that aren't subordinate to, or the same as, the value type <tt><span style="color:#0000ff; font-weight:bold">valueTy</span></tt>.</li><li>For every constructor of the remaining types <tt><span style="color:#0000ff; font-weight:bold">ctxTy</span></tt>, list the positions where a value of type <tt><span style="color:#0000ff; font-weight:bold">holeTy</span></tt> may appear as an immediate subterm.</li></ul>Using the immediate subordination relation is actually unnecessary: alternatively, we could just start the filtering step with <i>all</i> types subordinate (or equal) to <tt><span style="color:#0000ff; font-weight:bold">valueTy</span></tt>. I think the use of the immediate subordination relation is kind of neat, though.
<br />
<br /><h2>Case Study: No, Really, Scrap Your Zippers</h2>When I said that this idea had been kicking around in my head since shortly after <a href="http://osl.iu.edu/wgp2010/">WGP 2010</a>, I meant specifically since the part of WGP 2010 when <a href="http://www.cs.indiana.edu/~adamsmd/">Michael Adams</a> presented the paper "<a href="http://www.cs.indiana.edu/~adamsmd/papers/scrap_your_zippers/">Scrap Your Zippers</a>". One of the arguments behind the Scrap Your Zippers approach is that Huet's zippers involve a lot of boilerplate and don't work very well for heterogenerous datatypes.
<br />
<br />Small examples tend to minimize the annoyingness of boilerplate by virtue of their smallness, but the example Adams gave about heterogeneous zippers works (almost) beautifully in our setting. First, we describe the datatype of departments, which are pairs of an employee record (the boss) and a list of employees (subordinates):<pre> <span style="color:#000000; font-weight:bold"><span style="color:#009900">data E: <span style="color:#0000ff">name -o int -o employee</span></span> ;;<br/><br/> <span style="color:#009900">data Nil: <span style="color:#0000ff">list</span><br/> | Cons: <span style="color:#0000ff">employee -o list -o list</span></span> ;;<br/><br/> <span style="color:#009900">data D: <span style="color:#0000ff">employee -o list -o dept</span></span> ;;<br/><br/> val <span style="color:#0000ff">agamemnon</span> = <span style="color:#0000ff">E Agamemnon 5000</span> ;;<br/> val <span style="color:#0000ff">menelaus</span> = <span style="color:#0000ff">E Menelaus 3000</span> ;;<br/> val <span style="color:#0000ff">achilles</span> = <span style="color:#0000ff">E Achilles 2000</span> ;;<br/> val <span style="color:#0000ff">odysseus</span> = <span style="color:#0000ff">E Odysseus 2000</span> ;;<br/> val <span style="color:#0000ff">dept</span> = <span style="color:#0000ff">D agamemnon <br/> (Cons menelaus <br/> (Cons achilles <br/> (Cons odysseus Nil)))</span> ;;<span></pre>Our goal is to zipper our way in to the department value <tt><span style="color:#0000ff; font-weight:bold">dept</span></tt> and rename "Agamemnon" "King Agamemnon."
<br />
<br />The reason this is only <i>almost</i> beautiful is that Levy# doesn't have a generic pair type, and the type of zippers over a heterogeneous datatype like <tt><span style="color:#0000ff; font-weight:bold">dept</span></tt> is a pair of a linear representational function <tt><span style="color:#0000ff; font-weight:bold">T -o dept</span></tt> and a value <tt><span style="color:#0000ff; font-weight:bold">T</span></tt> for some type <tt><span style="color:#0000ff; font-weight:bold">T</span></tt>. So, we need to come up with specific instances of this as datatypes, the way we might in Twelf:<pre> <span style="color:#000000; font-weight:bold"><span style="color:#009900">data PD: <span style="color:#0000ff">(dept -o dept) -o dept -o paird</span></span> ;;<br/> <span style="color:#009900">data PE: <span style="color:#0000ff">(employee -o dept) -o employee -o paire</span></span> ;;<br/> <span style="color:#009900">data PN: <span style="color:#0000ff">(name -o dept) -o name -o pairn</span></span> ;;
<br /></span></pre>Given these pair types, we are able to descend into the structure without any difficulty:<pre><span style="color:#000000; font-weight:bold"> # Create the zipper pair<br/> val <span style="color:#0000ff">loc</span> = <span style="color:#0000ff">PD ([hole: dept] hole) dept</span> ;;<br/><br/> # Move down and to the left<br/> comp <span style="color:#0000ff">loc</span> = <br/> <span style="color:#cc4020">let <span style="color:#009900">(PD <span style="color:#0000ff">path</span> <span style="color:#0000ff">dept</span>)</span> be <span style="color:#0000ff">loc</span> in<br/> let <span style="color:#009900">(D <span style="color:#0000ff">boss</span> <span style="color:#0000ff">subord</span>)</span> be <span style="color:#0000ff">dept</span> in<br/> return <span style="color:#0000ff">PE<br/> ([hole: employee] path (D hole subord))<br/> boss</span></span> ;;<br/><br/> # Move down and to the left<br/> comp <span style="color:#0000ff">loc</span> = <br/> <span style="color:#cc4020">let <span style="color:#009900">(PE <span style="color:#0000ff">path</span> <span style="color:#0000ff">employee</span>)</span> be <span style="color:#0000ff">loc</span> in<br/> let <span style="color:#009900">(E <span style="color:#0000ff">name</span> <span style="color:#0000ff">salary</span>)</span> be <span style="color:#0000ff">employee</span> in<br/> return <span style="color:#0000ff">PN<br/> ([hole: name] path (E hole salary))<br/> name</span></span> ;;</span></pre>At this point, we <i>could</i> just do the replacement in constant time by linear application:<pre><span style="color:#000000; font-weight:bold"> comp <span style="color:#0000ff">revision</span> = <br/> <span style="color:#cc4020">let <span style="color:#009900">(PN <span style="color:#0000ff">path</span> <span style="color:#0000ff">name</span>)</span> be <span style="color:#0000ff">loc</span> in<br/> return <span style="color:#0000ff">path KingAgamemnon</span></span> ;;</span></pre>For comparison, the <i>Scrap Your Zippers</i> framework implements this with the function <tt>fromZipper</tt>.
<br />
<br />However, everything above could have been done in the previous installment. Our new kind of pattern matching reveals its power when we try to walk "up" in the data structure, so we'll do that instead. The first step takes us back to an <tt><span style="color:#0000ff; font-weight:bold">employee -o dept</span></tt> zipper, and is where we give Agamemnon his kingly designation:<pre><span style="color:#000000; font-weight:bold"> comp <span style="color:#0000ff">loc</span> = <br/> <span style="color:#cc4020">let <span style="color:#009900">(PN <span style="color:#0000ff">path</span> <span style="color:#0000ff">name</span>)</span> be <span style="color:#0000ff">loc</span> in<br/> let <span style="color:#009900">([hole: <span style="color:#0000ff">name</span>] <span style="color:#0000ff">path</span> (E hole <span style="color:#0000ff">salary</span>))</span> be <span style="color:#0000ff">path</span> in<br/> return <span style="color:#0000ff">PE path (E KingAgamemnon salary)</span></span> ;;</span></pre>The first step was easy: the only place a <tt><span style="color:#0000ff; font-weight:bold">name</span></tt> hole can appear in an <tt><span style="color:#0000ff; font-weight:bold">dept</span></tt> datatype is inside of an <tt><span style="color:#0000ff; font-weight:bold">employee</span></tt>. An <tt><span style="color:#0000ff; font-weight:bold">employee</span></tt>, however, can appear in a value of type <tt><span style="color:#0000ff; font-weight:bold">dept</span></tt> in one of two paces: either as the boss or as one of the members of the list of subordinates. Therefore, if we want to avoid nonexhaustive match warnings, we have to give an extra case:<sup><a href="#foot2x3">2</a></sup><pre><span style="color:#000000; font-weight:bold"> comp <span style="color:#0000ff">revision</span> = <br/> <span style="color:#cc4020">let <span style="color:#009900">(PE <span style="color:#0000ff">path</span> <span style="color:#0000ff">employee</span>)</span> be <span style="color:#0000ff">loc</span> in<br/> match <span style="color:#0000ff">path</span> with<br/> | <span style="color:#009900">[hole: <span style="color:#0000ff">employee</span>] D hole <span style="color:#0000ff">subord</span></span> -> <br/> return <span style="color:#0000ff">D employee subord</span><br/> | <span style="color:#009900">[hole: <span style="color:#0000ff">employee</span>] <span style="color:#0000ff">path</span> (Cons hole <span style="color:#0000ff">other_subord</span>)</span> -><br/> return <span style="color:#0000ff">dept</span></span> ;; # Error?</span></pre>Note that, in the first match above, the case was <tt><span style="color:#009900; font-weight:bold">[hole: <span style="color:#0000ff">employee</span>] D hole <span style="color:#0000ff">subord</span></span></tt>, not <tt><span style="color:#009900; font-weight:bold">[hole: <span style="color:#0000ff">employee</span>] <span style="color:#0000ff">path</span> (D hole <span style="color:#0000ff">subord</span>)</span></tt>. As in the previous installment, I used the subordination relation to conclude that there were no values of type <tt><span style="color:#0000ff; font-weight:bold">dept -o dept</span></tt> other than the identity; therefore, it's okay to allow the simpler pattern that assumes <tt><span style="color:#0000ff; font-weight:bold">path</span></tt> is the identity.
<br />
<br />The code for this example is in <a href="https://github.com/robsimmons/levy/blob/holey-blog-3/scrap.levy">scrap.levy</a>.
<br />
<br /><h2>Conclusion</h2>Proposals for radically new language features should be treated with some care; do they really add enough expressiveness to the language? I hope that this series has at least suggested the possibility that linear function types might be desirable as a core language feature in a functional language. Linear representational functions expand the class of safe, pure algorithms to capture algorithms that could previously only be done in an impure way, and they give a completely principled (and cast-free) way of scrapping your zipper boilerplate.
<br />
<br />And perhaps the most important feature of this proposal is one I haven't touched so far, which is the added simplicity of reasoning about programs, both informally and formally. As for the "informally," if you've ever been flummoxed by the process of making sure that your heavily tail-recursive program has an even number of <tt>List.rev</tt> errors, or if you have avoided making functions tail-recursive for precisely this reason, I think linear representational functions could be a huge help. One thing I'd like to do if I have time is to work through <a href="http://requestforlogic.blogspot.com/2010/10/type-inference-in-and-out-of-context.html">type inference in context</a> with linear representational functions; I imagine many things would be much clearer. In particular, I suspect there wouldn't be any need for both Cons and Snoc lists or the "fish" operator.
<br />
<br />The formal reasoning aspect can be glimpsed by looking at the previous case study: proving that the tail-recursive functions <tt><span style="color:#0000ff; font-weight:bold">lin_of_zip</span></tt> and <tt><span style="color:#0000ff; font-weight:bold">zip_of_lin</span></tt> are inverses has the structure of a double-reverse theorem (proving that <tt>List.rev (List.rev xs) = xs</tt>). Maybe it's just me being dense, but I have trouble with double-reverse theorems. On the other hand, if we needed to prove that the structurally-inductive and not tail-recursive functions <tt><span style="color:#0000ff; font-weight:bold">lin_of_zip'</span></tt> and <tt><span style="color:#0000ff; font-weight:bold">zip_of_lin'</span></tt> (which we can <a href="https://github.com/robsimmons/levy/blob/holey-blog-3/linear-vs-zipper.levy#L23-32">now implement</a>, of course) are inverses, that's just a straightforward induction and call to the induction hypothesis. And making dumb theorems easier to prove is, in my experience at least half the battle of using theorem provers.
<br />
<br /><h3>Expressiveness and efficiency, reloaded</h3>The claim that linear representational functions can completely subsume zippers is undermined somewhat by the emphasis I have put on making matching and other operations O(1). Just to be clear: I can entirely implement linear representational functions using functional data structures (in fact, by using derivatives!) if I'm not concerned with performance. There's even a potential happy medium: rather than invalidating a zipper, I <i>think</i> it would be possible to merely mark a zipper as "in use," so that the second time you use a zipper it gets silently copied. This means that if you think about affine usage, you get constant-time guarantees, but if you just want to use more-awesome zippers, you can program as if it was a persistent data structure. This "persistent always, efficient if you use it right" notion of persistant data structures has precedent, it's how <a href="http://dl.acm.org/citation.cfm?id=1292541">persistent union-find</a> works.
<br />
<br /><hr><small><a name="foot1x3"></a><sup>1</sup> That particular image is a CC-SA licensed contribution by Heinrich Apfelmus on the <a href="http://en.wikibooks.org/wiki/Haskell/Zippers">Haskell Wikibook on zippers</a>. Thanks, Heinrich!
<br /><a name="foot2x3"></a><sup>2</sup> If we had a polymorphic option type we could just return <tt>Some</tt> or <tt>None</tt>, of course. Meh.
<br />[<a name="1x3"></a>1] Conor McBride, "<a href="http://strictlypositive.org/diff.pdf">The Derivative of a Regular Type is its Type of One-Hole Contexts</a>," comically unpublished.
<br />[<a name="2x3"></a>2] Gúrard Huet, "<a href="http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=44121">Function Pearl: The Zipper</a>," JFP 1997.
<br />[<a name="3x3"></a>3] Michael D. Adams "<a href="https://www.cs.indiana.edu/~adamsmd/papers/scrap_your_zippers/">Scrap Your Zippers</a>," WGP 2010.
<br /></small>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com2tag:blogger.com,1999:blog-827419514217130218.post-9858364988776847172011-08-15T23:47:00.000-04:002011-08-22T10:39:13.292-04:00Holey Data, Part 2/3: Case Analysis on Linear FunctionsThe <a href="http://requestforlogic.blogspot.com/2011/08/embracing-and-extending-levy-language.html">previous installment</a> was titled "(Almost) Difference Lists" because there was one operation that we could (technically) do on a Prolog difference list that we can't do on the difference lists described in the previous section. If a Prolog difference list is <i>known to be nonempty</i>, it's possible to match against the front of the list. Noone ever does this that I can tell, because if the Prolog difference list <i>is</i> empty, this will mess up all the invariants of the difference list. Still, however, it's there.
<br />
<br />We will modify our language to allow <i>pattern matching</i> on linear functions, which is like pattern matching on a difference list but better, because we can safely handle emptiness. The immediate application of this is that we have a list-like structure that allows constant time affix-to-the-end and remove-from-the-beginning operations: a queue! Due to the similarity with difference lists, I'll call this curious new beast a <i>difference queue</i>. This is all rather straightforward, except for the dicussion of coverage checking, which involves a well-understood analysis called <i>subordination</i>. But we'll cross that bridge when we come to it.
<br />
<br /><b>A pattern matching aside.</b> Pattern matching against functions? It's certainly the case that in ML we can't pattern match against functions, but as we've already discussed towards the end of the <a href="http://requestforlogic.blogspot.com/2011/08/embracing-and-extending-levy-language.html">intro to Levy#</a>, the linear function space is not a <i>computational</i> function space like the one in ML, it's a <i>representational</i> function space like the one in LF/Twelf, a distinction that comes from Dan Licata, Bob Harper, and Noam Zeilberger [<a href="#1x2">1</a>, <a href="#2x2">2</a>]. And we pattern match against representational functions all the time in LF/Twelf. Taking this kind of pattern matching from the logic programming world of Twelf into the functional programming world is famously tricky (leading to proposals like <a href="http://www.cs.mcgill.ca/~complogic/beluga/">Beluga</a>, <a href="http://cs-www.cs.yale.edu/homes/delphin/">Delphin</a>, (sort of) <a href="http://abella.cs.umn.edu/">Abella</a>, and the aforementioned efforts of Licata et al.), but the trickiness is always from attempts to open up a (representational) function, do something computational on the inside, and then put it back together. We're not going to need to do anything like that, luckily for us.
<br />
<br /><h2>Difference queues</h2>We're going to implement difference queues as values of type <tt><span style="color:#0000ff; font-weight:bold">q -o q</span></tt>, where <tt><span style="color:#0000ff; font-weight:bold">q</span></tt> is an interesting type: because definitions are inductive, the type <tt><span style="color:#0000ff; font-weight:bold">q</span></tt> actually has no inhabitants. <pre> <span style="color:#000000; font-weight:bold"><span style="color:#009900">data QCons: <span style="color:#0000ff">int -o q -o q</span></span> ;;</span></pre>An alternative would be to implement difference queues using the difference lists <tt><span style="color:#0000ff; font-weight:bold">list -o list</span></tt> from the previous installment, which would work fine too.
<br />
<br />We'll also have an option type, since de-queueing might return nothing if the queue is empty, as well as a "valof" equivalent operation to force the queue from something that may or may not be a queue. This <tt><span style="color:#0000ff; font-weight:bold">getq</span></tt> option will raise a non-exhaustive match warning during coverage checking, since it can obviously raise a runtime error.<pre><span style="color:#000000; font-weight:bold"> <span style="color:#009900">data None: <span style="color:#0000ff">option</span><br/> | Some: <span style="color:#0000ff">int -o (q -o q) -o option</span></span> ;;<br/><br/> val <span style="color:#0000ff">getq</span> = <span style="color:#0000ff">thunk <span style="color:#cc4020">fun <span style="color:#0000ff">opt</span>: <span style="color:#0000ff">option</span> -><br/> let (<span style="color:#009900">Some <span style="color:#0000ff">_</span> <span style="color:#0000ff">q</span></span>) be <span style="color:#0000ff">opt</span> in return <span style="color:#0000ff">q</span></span></span> ;;</span></pre>The operations to make a new queue and to push an item onto the end of the queue use the functionality that we've already presented:<pre><span style="color:#000000; font-weight:bold"> val <span style="color:#0000ff">new</span> = <span style="color:#0000ff">thunk <span style="color:#cc4020">return <span style="color:#0000ff">[x:q] x</span></span></span> ;;<br/><br/> val <span style="color:#0000ff">push</span> = <span style="color:#0000ff">thunk<br/> <span style="color:#cc4020">fun <span style="color:#0000ff">i</span>: <span style="color:#0000ff">int</span> -><br/> fun <span style="color:#0000ff">queue</span>: <span style="color:#0000ff">(q -o q)</span> -><br/> return <span style="color:#0000ff">[x:q] queue (QCons i x)</span></span></span> ;;</span></pre>The new functionality comes from the <tt><span style="color:#0000ff; font-weight:bold">pop</span></tt> function, which matches against the front of the list. An empty queue is represented by the identity function.<pre><span style="color:#000000; font-weight:bold"> val <span style="color:#0000ff">pop</span> = <span style="color:#0000ff">thunk <span style="color:#cc4020">fun <span style="color:#0000ff">queue</span>: <span style="color:#0000ff">(q -o q)</span> -><br/> match <span style="color:#0000ff">queue</span> with<br/> | <span style="color:#009900">[x:<span style="color:#0000ff">q</span>] x</span> -> return <span style="color:#0000ff">None</span><br/> | <span style="color:#009900">[x:<span style="color:#0000ff">q</span>] QCons <span style="color:#0000ff">i</span> (<span style="color:#0000ff">queue'</span> x)</span> -> return <span style="color:#0000ff">Some i queue'</span></span></span> ;;</span></pre>Lets take a closer look at the second pattern, <tt><span style="color:#009900; font-weight:bold">[x:q] QCons <span style="color:#0000ff">i</span> (<span style="color:#0000ff">queue'</span> x)</span></tt>. The <tt><span style="color:#009900; font-weight:bold">QCons</span></tt> constructor has two arguments, and because the linear variable <tt><span style="color:#009900; font-weight:bold">x</span></tt> occurs exactly once, it has to appear in one of the two arguments. This pattern is intended to match the case where the linear variable <tt><span style="color:#009900; font-weight:bold">x</span></tt> appears in the second argument of the constructor (read: inside of the the <tt><span style="color:#0000ff; font-weight:bold">q</span></tt> part, not inside of the <tt><span style="color:#0000ff; font-weight:bold">int</span></tt> part), and the pattern binds a linear function <tt><span style="color:#0000ff; font-weight:bold">queue'</span></tt> that has type <tt><span style="color:#0000ff; font-weight:bold">q -o q</span></tt>.
<br />
<br />You can see how these difference queues are used in <a href="https://github.com/robsimmons/levy/blob/holey-blog-2/linearqueue.levy#L26-43">linearqueue.levy</a>.
<br />
<br /><b>Another pattern matching aside.</b> The above-mentioned patterns (and, in fact, all accepted patterns in this current extension of Levy#) actually come from the set of Linear LF terms that are in what is known as the <i>pattern fragment</i> (appropriately enough). The pattern fragment was first identified by Dale Miller as a way of carving out a set of unification problems on higher-order terms that could 1) always be given unitary and decidable solutions and 2) capture many of the actual unification problems that might arise in λProlog [<a href="#3x2">3</a>]. Anders Schack-Nielsen and Carsten Schürmann later generalized this to Linear LF [<a href="#4x2">4</a>], which as I've described is the language that we're essentially using to describe our data.
<br />
<br /><h2>Coverage checking with subordination</h2>In the previous section we saw that the two patterns <tt><span style="color:#009900; font-weight:bold">[x:q] x</span></tt> and <tt><span style="color:#009900; font-weight:bold">[x:q] QCons <span style="color:#0000ff">i</span> (<span style="color:#0000ff">queue'</span> x)</span></tt> were used to match against a value of type <tt><span style="color:#0000ff; font-weight:bold">q -o q</span></tt>, and the coverage checker in Levy# accepted those two patterns as providing a complete case analysis of values of this type. But the constructor <tt><span style="color:#009900; font-weight:bold">QCons</span></tt> has two arguments; why was the coverage checker satisfied with a case analysis that only considered the linear variable occurring in the second argument?
<br />
<br />To understand this, consider the pattern <tt><span style="color:#009900; font-weight:bold">[x:q] x</span></tt> and <tt><span style="color:#009900; font-weight:bold">[x:q] QCons (<span style="color:#0000ff">di</span> x) <span style="color:#0000ff">queue'</span></span></tt>, where the linear variable does occur in the first argument. This pattern binds the variable <tt><span style="color:#0000ff; font-weight:bold">di</span></tt>, a linear function value of type <tt><span style="color:#0000ff; font-weight:bold">q -o int</span></tt>. But the only inhabitants of type <tt><span style="color:#0000ff; font-weight:bold">int</span></tt> are constants, and the <tt><span style="color:#0000ff; font-weight:bold">q</span></tt> must go somewhere; where can it go? It can't go anywhere! This effectively means that there are <i>no closed values of type</i> <tt><span style="color:#0000ff; font-weight:bold">q -o int</span></tt>, so there's no need to consider what happens if the <tt><span style="color:#0000ff; font-weight:bold">q</span></tt> hole appears inside of the first argument to <tt><span style="color:#009900; font-weight:bold">QCons</span></tt>.
<br />
<br />Because of these considerations, Levy# has to calculate what is called the <a href="http://twelf.plparty.org/wiki/Subordination">subordination relation</a> for the declared datatypes. Subordination is an analysis developed for Twelf that figures out what types of terms can appear as subterms of other types of terms. I added a new keyword to Levy# for reporting this subordination information:<pre> <span style="color:#000000; font-weight:bold">Levy> $subord ;;<br/> Subordination for current datatypes:<br/> <span style="color:#0000ff">int</span> <| <span style="color:#0000ff">q</span><br/> <span style="color:#0000ff">q</span> <| <span style="color:#0000ff">q</span><br/> <span style="color:#0000ff">int</span> <| <span style="color:#0000ff">option</span></span></pre>So <tt><span style="color:#0000ff; font-weight:bold">int</span></tt> is subordinate to both <tt><span style="color:#0000ff; font-weight:bold">q</span></tt> and <tt><span style="color:#0000ff; font-weight:bold">option</span></tt>, and <tt><span style="color:#0000ff; font-weight:bold">q</span></tt> is subordinate only to itself. Subordination is intended to be a conservative analysis, so this means that there <i>might</i> be values of type <tt><span style="color:#0000ff; font-weight:bold">int -o q</span></tt> and <tt><span style="color:#0000ff; font-weight:bold">int -o option</span></tt> and that there <i>might</i> be non-identity values of type <tt><span style="color:#0000ff; font-weight:bold">q -o q</span></tt>, but that there are <i>no</i> values of type <tt><span style="color:#0000ff; font-weight:bold">q -o option</span></tt> and the only value of type <tt><span style="color:#0000ff; font-weight:bold">option -o option</span></tt> is <tt><span style="color:#0000ff; font-weight:bold">[x: option] x</span></tt>. Levy# uses the no-overlapping-holes restriction to make subordination analysis more precise; without this restriction, a reasonable subordination analysis would likely declare <tt><span style="color:#0000ff; font-weight:bold">q</span></tt> subordinate to <tt><span style="color:#0000ff; font-weight:bold">option</span></tt>.<sup><a href="#foot1x2">1</a></sup>
<br />
<br />Some more examples of subordination interacting with coverage checking can be seen in <a href="https://github.com/robsimmons/levy/blob/holey-blog-2/linearmatch.levy#L22-36">linearmatch.levy</a>.
<br />
<br /><h3>Subordination and identity in case analysis</h3>We use subordination data for one other optimization. The following function is also from <a href="https://github.com/robsimmons/levy/blob/holey-blog-2/linearmatch.levy#L7-12">linearmatch.levy</a>; it takes a value of type <tt><span style="color:#0000ff; font-weight:bold">int -o list</span></tt> and discards everything until the place where the hole in the list was. <pre> <span style="color:#000000; font-weight:bold">val <span style="color:#0000ff">run</span> = <span style="color:#0000ff">thunk <span style="color:#cc4020">rec <span style="color:#0000ff">run</span>: <span style="color:#0000ff">(int -o list)</span> -> F <span style="color:#0000ff">list</span> is<br/> fun <span style="color:#0000ff">x</span>: <span style="color:#0000ff">(int -o list)</span> -><br/> match <span style="color:#0000ff">x</span> with<br/> | <span style="color:#009900">[hole: <span style="color:#0000ff">int</span>] Cons hole <span style="color:#0000ff">l</span></span> -> return <span style="color:#0000ff">l</span><br/> | <span style="color:#009900">[hole: <span style="color:#0000ff">int</span>] Cons <span style="color:#0000ff">i</span> (<span style="color:#0000ff">dx</span> hole)</span> -> force <span style="color:#0000ff">run</span> <span style="color:#0000ff">dx</span></span></span> ;;</span></pre>Because Levy# is limited to depth-1 pattern matching, a pattern match should really only say that the hole is <i>somewhere in</i> a subterm, not that the hole is <i>exactly at</i> a subterm. This would indicate that the first pattern should really be <tt><span style="color:#009900; font-weight:bold">[hole: <span style="color:#0000ff">int</span>] Cons (<span style="color:#0000ff">di</span> hole) <span style="color:#0000ff">l</span></span></tt>, but by subordination analysis, we know that <tt><span style="color:#0000ff; font-weight:bold">int</span></tt> is not subordinate to <tt><span style="color:#0000ff; font-weight:bold">int</span></tt> and so therefore the only value of type <tt><span style="color:#0000ff; font-weight:bold">int -o int</span></tt> is the identity function, so <tt><span style="color:#0000ff; font-weight:bold">di = [hole:int] hole</span></tt> and we can beta-reduce <tt><span style="color:#0000ff; font-weight:bold">([hole:int] hole) <span style="color:#009900">hole</span></span></tt> to get just <tt><span style="color:#009900; font-weight:bold">hole</span></tt>.
<br />
<br />So subordination is a very helpful analysis for us; it allows us to avoid writing some patterns altogether (patterns that bind variables with types that aren't inhabited) and it lets us simplify other patterns by noticing that for certain types "the hole appears somewhere in this subterm" is exactly the same statement as "this subterm is exactly the hole."
<br />
<br /><h2>Implementation</h2>In order to efficiently pattern match against the beginning of a list, we need to be able to rapidly tell which sub-part of a data structure the hole can be found in. This wasn't a problem for difference lists and difference queues, since subordination analysis is enough to tell us where the hole will be if it exists, but consider trees defined as follows:<pre> <span style="color:#000000; font-weight:bold"><span style="color:#009900">data Leaf: <span style="color:#0000ff">tree</span><br/> | Node: <span style="color:#0000ff">tree -o tree -o tree</span></span> ;;</span></pre>If we match against a value of type <tt><span style="color:#0000ff; font-weight:bold">tree -o tree</span></tt>, we need to deal with the possibility that it is the identity function, the possibility that the hole is in the left subtree, and the possibility that the hole is in the right subtree. This means that, if we wish for matching to be a constant-time operation, we also need to be able to <i>detect</i> whether the hole is in the left or right subtree without doing a search of the whole tree.
<br />
<br />This is achieved by adding an extra optional field to the in-memory representation of structures, a number that indicates where the hole is. Jason Reed correctly pointed out in a <a href="http://requestforlogic.blogspot.com/2011/08/holey-data-part-13-almost-difference.html?showComment=1313337306965#c701274819557663771">comment</a> that for the language extension described in the previous installment, there was actually no real obstacle to having the runtime handle multiple overlapping holes and types like <tt><span style="color:#0000ff; font-weight:bold">tree -o (tree -o tree)</span></tt>. But due to the way we're modifying the data representation to do matching, the restriction to having at most one hole at a time is now critical: the runtime stores directions to <i>the</i> hole at every point in the structure.
<br />
<br />The memory representation produced by the value code <tt><span style="color:#0000ff; font-weight:bold">[hole: tree] Node (Node Leaf Leaf) (Node (Node hole Leaf) Leaf)</span></tt> might look something like this, where I represent the number indicating where the hole is by circling the indicated hole in red:
<br /><img src="http://typesafety.net/rfl/rjs-fig4.png" />
<br />If the hole is filled, the extra data (the red circles) will still exist, but the part of the runtime that does operations on normal inductively defined datatypes can just ignore the presence of this extra data. (In a full implementation of these ideas, this would likely complicate garbage collection somewhat.)
<br />
<br /><h2>Case Study: Holey Trees and Zippers</h2>The binary trees discussed before, augmented with integers at the leaves, are the topic of this case study. A famous data structure for functional generic programming is Huet's zipper [<a href="#5x2">5</a>, <a href="#6x2">6</a>], which describes inside-out paths through inductive types such as trees. The idea of a zipper is that it allows a programmer to place themselves <i>inside</i> a tree and move up, left-down, and right-down the tree using only constant-time operations based on pointer manipulation.
<br />
<br />The zipper for trees looks like this:<pre> <span style="color:#000000; font-weight:bold"><span style="color:#009900">data Top: <span style="color:#0000ff">path</span><br/> | Left: <span style="color:#0000ff">path -o tree -o path</span><br/> | Right: <span style="color:#0000ff">tree -o path -o path</span></span> ;;</span></pre>In this case study, we will show how to coerce linear functions <tt><span style="color:#0000ff; font-weight:bold">tree -o tree</span></tt> into the zipper data structure <tt><span style="color:#0000ff; font-weight:bold">path</span></tt> and vice versa.
<br />
<br />In order to go from a linear function <tt><span style="color:#0000ff; font-weight:bold">tree -o tree</span></tt> to a zipper <tt><span style="color:#0000ff; font-weight:bold">path</span></tt>, we use a function that takes two arguments, an "outside" <tt><span style="color:#0000ff; font-weight:bold">path</span></tt> and an "inside" <tt><span style="color:#0000ff; font-weight:bold">tree -o tree</span></tt>. As we descend into the tree-with-a-hole by pattern matching against the linear function, we tack the subtrees that aren't on the path to the hole onto the outside <tt><span style="color:#0000ff; font-weight:bold">path</span></tt>, so that in every recursive call the zipper gets bigger and the linear function gets smaller.<pre> <span style="color:#000000; font-weight:bold">val <span style="color:#0000ff">zip_of_lin</span> = <span style="color:#0000ff">thunk <span style="color:#cc4020">rec <span style="color:#0000ff">enzip</span>: <span style="color:#0000ff">path</span> -> <span style="color:#0000ff">(tree -o tree)</span> -> F <span style="color:#0000ff">path</span> is <br/> fun <span style="color:#0000ff">outside</span>: <span style="color:#0000ff">path</span> -> <br/> fun <span style="color:#0000ff">inside</span>: <span style="color:#0000ff">(tree -o tree)</span> -> <br/> match <span style="color:#0000ff">inside</span> with<br/> | <span style="color:#009900">[hole: <span style="color:#0000ff">tree</span>] hole</span> -> <br/> return <span style="color:#0000ff">outside</span><br/> | <span style="color:#009900">[hole: <span style="color:#0000ff">tree</span>] Node (<span style="color:#0000ff">left</span> hole) <span style="color:#0000ff">right</span></span> -> <br/> force <span style="color:#0000ff">enzip</span> <span style="color:#0000ff">(Left outside right)</span> <span style="color:#0000ff">left</span><br/> | <span style="color:#009900">[hole: <span style="color:#0000ff">tree</span>] Node <span style="color:#0000ff">left</span> (<span style="color:#0000ff">right</span> hole)</span> -> <br/> force <span style="color:#0000ff">enzip</span> <span style="color:#0000ff">(Right left outside)</span> <span style="color:#0000ff">right</span></span></span> ;;</span></pre>Given this function, the obvious implementation of its inverse just does the opposite, shrinking the zipper with every recursive call and tacking the removed data onto the linear function:<pre> <span style="color:#000000; font-weight:bold">val <span style="color:#0000ff">lin_of_zip</span> = <span style="color:#0000ff">thunk <span style="color:#cc4020">rec <span style="color:#0000ff">enlin</span>: <span style="color:#0000ff">path</span> -> <span style="color:#0000ff">(tree -o tree)</span> -> F <span style="color:#0000ff">(tree -o tree)</span> is <br/> fun <span style="color:#0000ff">outside</span>: <span style="color:#0000ff">path</span> -><br/> fun <span style="color:#0000ff">inside</span>: <span style="color:#0000ff">(tree -o tree)</span> -><br/> match <span style="color:#0000ff">outside</span> with<br/> | <span style="color:#009900">Top</span> -> <br/> return <span style="color:#0000ff">inside</span><br/> | <span style="color:#009900">Left <span style="color:#0000ff">path</span> <span style="color:#0000ff">right</span></span> -> <br/> force <span style="color:#0000ff">enlin</span> <span style="color:#0000ff">path</span> <span style="color:#0000ff">([hole: tree] Node (inside hole) right)</span><br/> | <span style="color:#009900">Right <span style="color:#0000ff">left</span> <span style="color:#0000ff">path</span></span> -> <br/> force <span style="color:#0000ff">enlin</span> <span style="color:#0000ff">path</span> <span style="color:#0000ff">([hole: tree] Node left (inside hole))</span></span></span> ;;</span></pre>That's the obvious implementation, where we tack things on to the outside of the linear function. Linear functions have the property, of course, that you can tack things on to the inside or the outside, which gives us the opportunity to consider another way of writing the inverse that looks more traditionally like an induction on the structure of the zipper:<pre> <span style="color:#000000; font-weight:bold">val <span style="color:#0000ff">lin_of_zip'</span> = <span style="color:#0000ff">thunk <span style="color:#cc4020">rec <span style="color:#0000ff">enlin</span>: <span style="color:#0000ff">path</span> -> F <span style="color:#0000ff">(tree -o tree)</span> is<br/> fun <span style="color:#0000ff">path</span>: <span style="color:#0000ff">path</span> -> <br/> match <span style="color:#0000ff">path</span> with<br/> | <span style="color:#009900">Top</span> -> return <span style="color:#0000ff">[hole: tree] hole</span><br/> | <span style="color:#009900">Left <span style="color:#0000ff">path</span> <span style="color:#0000ff">right</span></span> -> <br/> force <span style="color:#0000ff">enlin</span> <span style="color:#0000ff">path</span> to <span style="color:#0000ff">lin</span> in<br/> return <span style="color:#0000ff">[hole: tree] lin (Node hole right)</span><br/> | <span style="color:#009900">Right <span style="color:#0000ff">left</span> <span style="color:#0000ff">path</span></span> -> <br/> force <span style="color:#0000ff">enlin</span> <span style="color:#0000ff">path</span> to <span style="color:#0000ff">lin</span> in<br/> return <span style="color:#0000ff">[hole: tree] lin (Node left hole)</span></span></span> ;;</span></pre>These functions, and examples of their usage, can be found in <a href="https://github.com/robsimmons/levy/blob/holey-blog-2/linear-vs-zipper.levy">linear-vs-zipper.levy</a>.
<br />
<br /><h3>Foreshadowing</h3>This installment was written quickly after the first one; I imagine there will be a bigger gap before the third installment, so I'm going to go ahead and say a bit about where I'm going with this, using the case study as motivation.
<br />
<br />I wrote three functions: <ul><li><tt><span style="color:#0000ff; font-weight:bold">lin_of_zip</span></tt> turns zippers into linear functions by case analyzing the zipper and tacking stuff onto the "beginning" our outside of the linear function,</li><li><tt><span style="color:#0000ff; font-weight:bold">lin_of_zip'</span></tt> turns zippers into linear functions by inducting over the path and tacking stuff onto the "end" or inside of the linear function, and</li><li><tt><span style="color:#0000ff; font-weight:bold">zip_of_lin</span></tt> turns linear functions into zippers by case analyzing the "beginning" or outside of the linear function and tacking stuff on to the zipper.</li></ul>What about <tt><span style="color:#0000ff; font-weight:bold">zip_of_lin'</span></tt>, which turns linear functions into zippers by case analyzing the "end" or inside of the linear function? It's easy enough to describe what this function would look like:<pre><span style="color:#000000; font-weight:bold"># val zip_of_lin' = thunk rec enzip: (tree -o tree) -> F path is <br/># fun lin: (tree -o tree) -> <br/># match lin with<br/># | [hole: tree] hole -> return Top<br/># | [hole: tree] lin' (Node hole right) -> <br/># force enzip lin' to path in<br/># return Left path right<br/># | [hole: tree] lin' (Node left hole) -><br/># force enzip lin' to path in<br/># return Right left path ;;</span></pre>Toto, we're not in the pattern fragment anymore, but if we turn the representation of linear functions into <i>doubly linked lists</i> (or a double-ended-queues implemented as linked lists, perhaps), I believe we can implement these functions without trouble. At that point, we basically don't need the zippers anymore: instead of declaring that <a href="http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=the+derivative+of+a+type+is+the+type+of+its+one+hole+contexts">the derivative of a type is the type of its one-hole contexts</a>, we can make the obvious statement that the linear function from a type to itself is the type of one-hole contexts of that type, and we can program accordingly: no new boilerplate datatype declarations needed!
<br />
<br /><h2>Conclusion</h2>A relatively simple modification of the runtime from the previous installment, a runtime data tag telling us where the hole is, allows us to efficiently pattern match against linear representational functions. This modification makes the use of linear representational functions far more general than just a tool for efficiently implementing a logic programming idiom of difference lists. In fact, I hope the case study gives a convincing case that these holey data structures can come close to capturing many of the idioms of <i>generic programming</i>, though that argument won't be fully developed until the third installment, where we move beyond patterns that come from the Miller/Schack-Nielsen/Schürmann pattern fragment.
<br />
<br />More broadly, we have given a purely logical and declarative type system that can implement algorithms that would generally be characterized as imperative algorithms, not functional algorithms. Is it fair to call the queue representation a "functional data structure"? It is quite literally a data structure that is a function, after all! If it is (and I'm not sure it is), this would seem to challenge at least my personal understanding of what "<a href="http://cstheory.stackexchange.com/questions/1539/whats-new-in-purely-functional-data-structures-since-okasaki">functional data structures</a>" and functional algorithms are in the first place.
<br />
<br /><hr /><small>
<br /><sup><a name="foot1x2"></a>1</sup> This is true even though there are, in fact, no closed values of type <tt><span style="color:#0000ff; font-weight:bold">q -o option</span></tt> even if we don't have the no-overlapping-holes restriction (proof left as an exercise for the reader).
<br />[<a name="1x2"></a>1] Daniel R. Licata, Noam Zeilberger, and Robert Harper, "<a href="http://www.cs.cmu.edu/~drl/pubs/lzh08focbind/lzh08focbind.pdf">Focusing on Binding and Computation</a>," LICS 2008.
<br />[<a name="2x2"></a>2] Daniel R. Licata and Robert Harper, "<a href="http://www.cs.cmu.edu/~drl/pubs/lh09unibind/lh09unibind.pdf">A Universe of Binding and Computation</a>," ICFP 2009.
<br />[<a name="3x2"></a>3] Dale Miller, "<a href="http://www.lix.polytechnique.fr/Labo/Dale.Miller/papers/jlc91.pdf">A Logic Programming Language with Lambda-Abstraction, Function Variables, and Simple Unification</a>," JLC 1(4), 1991.
<br />[<a name="4x2"></a>4] Anders Schack-Nielsen and Carsten Schürmann, "<a href="http://www.itu.dk/people/anderssn/pat-uni-art.pdf">Pattern Unification for the Lambda Calculus with Linear and Affine Types</a>," LFMTP 2010.
<br />[<a name="5x2"></a>5] Gúrard Huet, "<a href="http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=44121">Function Pearl: The Zipper</a>," JFP 1997.
<br />[<a name="6x2"></a>6] Wikipedia: <a href="http://en.wikipedia.org/wiki/Zipper_(data_structure)">Zipper (data structure)</a>.
<br /></small>
<br />Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com1tag:blogger.com,1999:blog-827419514217130218.post-47092895045822595672011-08-12T15:09:00.000-04:002011-08-16T12:15:17.927-04:00Holey Data, Part 1/3: (Almost) Difference ListsThis series has been brewing in my head since a year ago, starting at the <a href="http://osl.iu.edu/wgp2010/">2010 Workshop on Generic Programming</a> in Baltimore. It was at least a quarter-baked idea by the time I asked, on the CSTheory Stackexchange, a question that amounted to "are there functional programming difference lists that act like logic programming difference lists"? [<a href="#1">1</a>]
<br />
<br /><h2>Difference lists</h2>Difference lists, which are deep lore of the logic programming community, are essentially data structures that allow you to append to the beginning and the end of the list as a constant time operation; they can also be turned back into "regular" lists as a constant time operation. You can see [<a href="#2">2</a>,<a href="#3">3</a>,<a href="#4">4</a>] for details, but I'd almost encourage you not to: logic programming difference lists are a bit crazy.
<br />
<br />Let me give you two ways of thinking about difference lists. If you're a lazy person (in the Haskell sense), you may say that difference lists really have something to do with laziness in functional programming languages. William Lovas showed me some crazy Haskell code yesterday that does something of this flavor, and there's a paper that espouses this viewpoint [<a href="#5">5</a>]. I'll stick with a different viewpoint: difference lists are <i>functions from lists to lists</i>. This is how difference lists are implemented in the Haskell Data.DList library [<a href="#6">6</a>], and you can find people that back-ported the idea that difference lists are "really" functions to higher-order logic programming languages [<a href="#7">7</a>].
<br />
<br />The problem with the idea of difference lists being functions is that you lose one of the fundamental properties that you started with: while it depends on how you implement things, difference lists implemented as functions will usually have a O(1) append operation but a O(n) "turn me into a regular list" operation. For a discussions of why, see the <a href="http://cstheory.stackexchange.com/questions/2432/difference-lists-in-functional-programming/2442#2442">answers</a> to my question on CSTheory Stackexchange.
<br />
<br />I've believed for some time that this unfortunate O(n)ness could be overcome, but I wanted a working proof-of concept implementation before I talked about it. It was during my recent visit to France that I realized that I really needed to base the proof-of-concept off of a language that was appropriately persnickety about the difference between what I called "value code" and what I called "computation code" in the previous post. And then I found Levy, and embraced and extended it into Levy#.<sup><a href="#foot1">1</a></sup> So at this point you should <a href="http://requestforlogic.blogspot.com/2011/08/embracing-and-extending-levy-language.html">read the previous post</a>.
<br />
<br /><h2>Representing datatypes</h2>In the last post I talked about declaring and using datatypes in Levy#, using the example of lists of integers:<pre> <span style="color:#009900; font-weight:bold">data Nil: <span style="color:#0000ff">list</span><br/> | Cons: <span style="color:#0000ff">int -o list -o list</span></pre>It's necessary to say a few words about how these things are implemented. Every value in Levy# is represented in the OCaml interpreter as a <tt>heapdata ref</tt> - that is, a mutable pointer into a "box" in memory. Some boxes hold integers, other boxes hold the code and environments that represent suspended (<tt><span style="color:#0000ff; font-weight:bold">thunk</span></tt>ed) computations. Declared constants from datatype declarations are another kind of box which holds both a tag (like <tt><span style="color:#0000ff; font-weight:bold">Nil</tt></tt> or <tt><span style="color:#0000ff; font-weight:bold">Cons</tt></tt>) and an appropriate number of values (that is, pointers) for the given datatype.
<br />
<br />As an example, here's the way the list <tt><span style="color:#0000ff; font-weight:bold">Cons 3 (Cons 7 Nil)</span></tt>, stored in the variable <tt><span style="color:#0000ff; font-weight:bold">xs</span></tt>, could be laid out in memory:
<br /><img src="http://typesafety.net/rfl/rjs-fig1.png" />
<br />
<br /><h2>Representing difference lists</h2><h3>Types</h3>On the level of <i>types</i>, a difference list will be represented as a member of the value type (the Linear LF type) <tt><span style="color:#0000ff; font-weight:bold">list -o list</span></tt>. This adequately represents what a difference list is better than the <tt>list -> list</tt> type of Haskell or Lambda Prolog/Twelf. In Haskell, <i>lots</i> of functions have type <tt>list -> list</tt>, including the list reversal function. Lambda Prolog/Twelf does better; the LF type <tt>list -> list</tt> really has to be <i>substitution function</i> - it can only take a list and plug it into designated places in another list. But <tt>λx. Nil</tt> is a perfectly good LF value of type <tt>list -> list</tt>. In other words, it's not necessarily the case that applying something to a function is equivalent to tacking that thing on to the end of the difference list - the applied argument can simply be ignored.
<br />
<br />Linear functions must use their argument exactly once, so linear functions <tt><span style="color:#0000ff; font-weight:bold">list -o list</span></tt> accurately represent the idea of difference lists, a list with <i>exactly one missing list somewhere in it</i>. For lists this is kind of a dumb way to put it: the missing list has to go at the end of a series of <tt><span style="color:#0000ff; font-weight:bold">Cons</span></tt>es or nowhere at all, so the linearity forces there to be one (as opposed to zero) occurrences of the hole. If we were talking about terms of type <tt><span style="color:#0000ff; font-weight:bold">tree -o tree</span></tt>, then linearity would enforce that there was one (as opposed to zero or two or three...) occurrences of the hole. We'll come back to that later...
<br />
<br /><h3>Terms</h3>Recall from the previous post that Levy# starts off syntactically precluded from forming values of this type <tt><span style="color:#0000ff; font-weight:bold">list -o list</span></tt>; we'll have to fix that by adding new syntax. On the level of <i>syntax</i>, a difference list will be represented using a Twelf-like notation where "λ x:τ.v" is written as <tt><span style="color:#0000ff; font-weight:bold">[x:tau] v</span></tt>.
<br />
<br /><b>Example</b> The difference list containing first 9 and then 4 will be represented as <tt><span style="color:#0000ff; font-weight:bold">[hole: list] Cons 9 (Cons 4 hole)</span></tt>; let's call that difference list <tt><span style="color:#0000ff; font-weight:bold">dxs</span></tt>. Let <tt><span style="color:#0000ff; font-weight:bold">dys</span></tt> be the some other piece of value code with type <tt><span style="color:#0000ff; font-weight:bold">list -o list</span></tt>, and let <tt><span style="color:#0000ff; font-weight:bold">xs</span></tt> be the example difference list <tt><span style="color:#0000ff; font-weight:bold">Cons 3 (Cons 7 Nil)</span></tt> from the previous section.
<br />
<br />Then <tt><span style="color:#0000ff; font-weight:bold">dxs xs</span></tt> appends the difference list and regular list by filling in the hole with <tt><span style="color:#0000ff; font-weight:bold">xs</span></tt>; the result is <tt><span style="color:#0000ff; font-weight:bold">Cons 9 (Cons 4 (Cons 3 (Cons 7 Nil)))</span></tt>. Similarly, <tt><span style="color:#0000ff; font-weight:bold">[hole: list] dxs (dys hole)</span></tt> is a piece of value code that forms a new difference list by appending the two smaller difference lists.
<br />
<br />In terms of the expressive power of these difference lists, that's pretty much the story: the file <a href="https://github.com/robsimmons/levy/blob/holey-blog-1/linear.levy">linear.levy</a> has some more examples, which can be run in the "holey-blog-1" tagged point in the repository.
<br />
<br /><b>A Linear Logic Aside</b>. This, perhaps, clears up the reason I used linear implication to define constructors like <tt><span style="color:#0000ff; font-weight:bold">Cons</span></tt>. Say instead of <tt><span style="color:#0000ff; font-weight:bold">Cons</span></tt> we had <tt><span style="color:#0000ff; font-weight:bold">ConsBad</span></tt>, a Linear LF constant of type <tt><span style="color:#0000ff; font-weight:bold">int -> list -> list</span></tt>. That type is equivalent in Linear LF to a term of type <tt><span style="color:#0000ff; font-weight:bold">!int -o !list -o list</span></tt>. The exclamation points ("bangs") prevent the occurrence of a linear variable in either argument to <tt><span style="color:#0000ff; font-weight:bold">ConsBad</span></tt>, so <tt><span style="color:#0000ff; font-weight:bold">[hole: list] ConsBad 9 (ConsBad 4 hole)</span></tt> would <i>NOT</i> be a (piece of value code)/(Linear LF term) of type <tt><span style="color:#0000ff; font-weight:bold">list -o list</span></tt>.
<br />
<br /><h3>Memory representation</h3>So far, I've explained how difference lists can be represented as linear functions, but what I originally promised was difference lists with O(1) append and turn-into-a-regular-list operations. To do this, I need to explain how linear functions can be represented in memory. A linear function is represented as another kind of box in memory, one with two pointers. The first pointer acts like a regular value - it points to the beginning of a data structure. The second pointer acts like a pointer into a data structure: it locates the "hole" - the location of the linear variable. This is the in-memory representation of our example <tt><span style="color:#0000ff; font-weight:bold">[hole: list] Cons 9 (Cons 4 hole)</span></tt> from above, stored as the variable <tt><span style="color:#0000ff; font-weight:bold">dxs</span></tt>.
<br /><img src="http://typesafety.net/rfl/rjs-fig2.png" />
<br />If we want to apply this linear function to a value and thereby insert a list into the hole, we merely have to follow the second hole-pointer and write to it, and then return the first value-pointer as the result. Voilà, application with only local pointer manipulation! If the result of applying <tt><span style="color:#0000ff; font-weight:bold">xs</span></tt> to <tt><span style="color:#0000ff; font-weight:bold">dxs</span></tt> was stored as the variable <tt><span style="color:#0000ff; font-weight:bold">ys</span></tt>, the state of memory would look something like this:
<br /><img src="http://typesafety.net/rfl/rjs-fig3.png" />
<br />The catch is that the linear function box that <tt><span style="color:#0000ff; font-weight:bold">dxs</span></tt> refers to no longer can be used in the same way: if we tried to write something different to the hole pointer, disaster could result: it would change the value of <tt><span style="color:#0000ff; font-weight:bold">ys</span></tt>!
<br />
<br />Therefore, the difference list, and all linear function values, are <i>use-only-once data structures</i>, and the current implementation invalidates used linear function values (shown by the red X in the figure above) and will raise a runtime error if they are used twice. This could be precluded with a type system that ensured that values whose type was a linear function are used, computationally, in an <i>affine</i> way - that is, used at most once. Work on affine type systems is interesting, it's important, I've <a href="http://requestforlogic.blogspot.com/2011/03/request-for-typestate-part-2-linear.html">written about it before</a>, but I see it as an orthogonal issue. That's all I really have to say about that.
<br />
<br />And that's how we implement linear<sup><a href="#foot2">2</a></sup> representational functions with constant time <i>composition</i> and <i>application</i>, which corresponds to implementing different lists with O(1) <i>append</i> and </i>turn-into-a-regular-list operations</i>.
<br />
<br /><small><i>(Disclaimer: this section is a bit of an idealization of what's actually going on in the OCaml implementation, but it's honest enough in the sense that everything's really implemented by local pointer manipulation, and represents what I think you'd expect an complied implementation of this code to do.)</i></small>
<br />
<br /><h3>Expressiveness and efficiency</h3>One cost of our representation strategy is that we can only use values whose type is a linear function in an affine way. Another cost is, because the implementation really thinks in terms of "<i>the</i> hole," we can't have two-hole structures like <tt><span style="color:#0000ff; font-weight:bold">tree -o tree -o tree</span></tt>. <i>(Edit: Jason, in the comments, points out that I may be wrong about more-than-one-hole-structures being problematic.)</i> But, if we were interested primarily in <i>expressiveness</i> and were willing to sacrifice constant-time composition and application guarantees, then it would absolutely be possible to have reusable difference lists and linear types representing <i>n</i>-holed contexts for any <i>n</i>.
<br />
<br /><h2>Case Study: The Dutch National Flag Problem</h2>William Lovas proposed this example, and I think it's great: the Dutch National Flag problem was proposed by Edsger Dijkstra as a generalization of what happens in a Quicksort partition step. The logic/functional programming variant of this problem is as follows: you have a list of red, white, and blue tagged numbers. You want to stably sort these to put red first, white second, and blue third: in other words, if the ML function <tt>red</tt> filters out the sublist of red-tagged numbers and so on, you want to return <tt>red list @ white list @ blue list</tt>.
<br />
<br />That's easy: the catch is that you can't use append, because you're only allowed to iterate over the initial unpartitioned list, and you're only allowed to iterate over that list one time.
<br />
<br />The solution in Levy# extended with difference lists is to pass along three difference lists representing the partitioned <i>beginning</i> of the full list and a regular list representing the un-partitioned <i>end</i> of the full list. At each step, you pick an item off the un-partitioned list and (constant-time) append it on to the end of the appropriate difference list. When there's nothing left in the un-partitioned list, you (constant-time) append the three difference lists while (constant-time) turning the concatenated difference lists into a normal list.<pre><span style="color:#000000; font-weight:bold">val <span style="color:#0000ff">dutch'</span> = <span style="color:#0000ff">thunk <span style="color:#cc4020">rec <span style="color:#0000ff">loop</span> : <br/> <span style="color:#0000ff">(list -o list)</span><br/> -> <span style="color:#0000ff">(list -o list)</span><br/> -> <span style="color:#0000ff">(list -o list)</span><br/> -> <span style="color:#0000ff">list</span><br/> -> F <span style="color:#0000ff">list</span> is <br/> fun <span style="color:#0000ff">reds</span> : <span style="color:#0000ff">(list -o list)</span> -> <br/> fun <span style="color:#0000ff">whites</span> : <span style="color:#0000ff">(list -o list)</span> -> <br/> fun <span style="color:#0000ff">blues</span> : <span style="color:#0000ff">(list -o list)</span> -> <br/> fun <span style="color:#0000ff">xs</span> : <span style="color:#0000ff">list</span> -> <br/> match <span style="color:#0000ff">xs</span> with<br/> | <span style="color:#009900">Nil</span> -> return <span style="color:#0000ff">(reds (whites (blues Nil)))</span><br/> | <span style="color:#009900">Cons</span> <span style="color:#0000ff">x</span> <span style="color:#0000ff">xs</span> -> <br/> (match <span style="color:#0000ff">x</span> with <br/> | <span style="color:#009900">Red</span> <span style="color:#0000ff">i</span> -> <br/> force <span style="color:#0000ff">loop</span> <br/> <span style="color:#0000ff">([hole: list] reds (Cons x hole))</span><br/> <span style="color:#0000ff">whites</span><br/> <span style="color:#0000ff">blues</span><br/> <span style="color:#0000ff">xs</span><br/> | <span style="color:#009900">White</span> <span style="color:#0000ff">i</span> -> <br/> force <span style="color:#0000ff">loop</span> <br/> <span style="color:#0000ff">reds</span> <br/> <span style="color:#0000ff">([hole: list] whites (Cons x hole))</span><br/> <span style="color:#0000ff">blues</span><br/> <span style="color:#0000ff">xs</span><br/> | <span style="color:#009900">Blue</span> <span style="color:#0000ff">i</span> -> <br/> force <span style="color:#0000ff">loop</span><br/> <span style="color:#0000ff">reds</span><br/> <span style="color:#0000ff">whites</span><br/> <span style="color:#0000ff">([hole: list] blues (Cons x hole))</span><br/> <span style="color:#0000ff">xs</span>)</span></span> ;;</span></pre>This solution is in the repository in the file <a href="https://github.com/robsimmons/levy/blob/holey-blog-1/dutch.levy#L31-62">dutch.levy</a>.
<br />
<br /><h2>Conclusion</h2>I think that the incorporation of linear representational functions as difference lists is beautiful. I've written plenty of Standard ML where I've done tail-recursive manipulations on lists and so either 1) had to keep track of whether I'd called <tt>List.rev</tt> the right number of times or 2) worried whether all these append operations would end up hurting efficiency. This solution gives you the append functions that you need in a way that makes it easy to think about both the order of items and the runtime cost. And it's all enabled by some very pretty types!
<br />
<br />The real straw that broke my back and led to this line of thinking, by the way, was an issue of expressiveness, not efficiency: I was trying to reason about such code that did tail-recursive operations on lists in Agda. That's another story, but it's one that's connected to my thesis in an important way, so I will try to get around to writing it.
<br />
<br />In the next installment of the Holey Data series, I'll show how we can pattern match against difference lists in Levy#, which turns linear representational functions from (almost) difference lists into something significantly awesomer than difference lists.
<br />
<br />(P.S. Thanks Paul a.k.a. Goob for sneaking me into the English Department's cluster to make the illustrations for this blog post.)
<br />
<br /><hr><small>
<br /><sup><a name="foot1"></a>1</sup> Originally I wanted to base it off of a Noam/Dan/Bob style language based on focusing in polarized logic, but it turns out that focused logic isn't a very natural language to actually program in. I owe a debt to Andrej and Matija for going through the legwork of making CBPV into an implemented toy language that it was possible to play around with.
<br /><sup><a name="foot2"></a>2</sup> Actually, this representation strategy <i>might</i> work fine not just for linear functions but for regular-old LF substitution functions with zero, one, two... arguments. When I tried to work this out with William Lovas, we tentatively concluded that you might need some sort of union-find structure to make sure that all the holes were effectively pointers to "the same hole." In this note, I only motivated linear functions in terms of adequately representing difference lists; the real beauty of using difference lists starts coming into play with the next two posts in this series.
<br /><a name="1"></a>[1] Robert J. Simmons, "<a href="http://cstheory.stackexchange.com/questions/2432/difference-lists-in-functional-programming">Difference Lists in Functional Programming</a>", CSTheory StackExchange.
<br /><a name="2"></a>[2] Wikibook on <a href="http://en.wikibooks.org/wiki/Prolog/Difference_Lists">Prolog Difference Lists</a>.
<br /><a name="3"></a>[3] Dale Miller's implementation of (classic, logic-programmey) difference lists: "<a href="http://www.lix.polytechnique.fr/Labo/Dale.Miller/lProlog/cse360/lectures/lect5.html#dlist">Difference Lists provide access to the tail of a list</a>".
<br /><a name="4"></a>[4] Frank Pfenning's course notes: "<a href="http://www.cs.cmu.edu/~fp/courses/lp/lectures/11-diff.pdf">Difference Lists</a>".
<br /><a name="5"></a>[5] Sandro Etalle and Jon Mountjoy. "<a href="http://doc.utwente.nl/56170/1/etalle00lazy.pdf">The <i>Lazy</i> Functional Side of Logic Programming</a>".
<br /><a name="6"></a>[6] Don Stewart's <a href="http://hackage.haskell.org/packages/archive/dlist/0.5/doc/html/Data-DList.html">Data.DList</a> library.
<br /><a name="7"></a>[7] Yves Bekkers and Paul Tarau. "<a href="http://logic.csci.unt.edu/tarau/research/PapersHTML/monadic.html#DIFFLISTS">The Monads of Difference Lists</a>".</small>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com8tag:blogger.com,1999:blog-827419514217130218.post-14565872532239403092011-08-11T15:38:00.001-04:002011-08-16T08:57:42.831-04:00Embracing and extending the Levy languageI started writing the post about the thing I really wanted to write about, but then I realized that the background was already a long post on its own. So, that's this post! It's about two things:<ol><li>Levy, an implementation of <a href="http://www.cs.bham.ac.uk/~pbl/cbpv.html">call-by-push-value</a> that Andrej Bauer and Matija Pretnar wrote in OCaml for the <a href="http://andrej.com/plzoo/">PL Zoo</a>. I made a <a href="https://github.com/robsimmons/levy/tree/modprec">fork of Levy</a> that, among other small changes, reduces the need for parentheses by making operator precedence more like it is in OCaml.</li><li>A more divergent fork of Levy that adds datatypes in a familiar-but-as-yet-unmotivated way. This diverges from the goals of Levy - as a PL Zoo project, Levy aims to be a clean and uncluttered language implementation, and that's not the case for this more divergent fork. I'll call this new version of Levy... let's see... Levy#. Sure, that works.</li></ol>Other things have been written about Levy at <a href="http://math.andrej.com/2008/11/23/a-toy-call-by-push-value-language/">Andrej's blog</a> and <a href="http://lambda-the-ultimate.org/node/4314">LtU</a>.
<br />
<br /><h2>Basic concepts</h2>The big idea of call-by-push-value, and, in turn, Levy, is that it is a programming language that is incredibly picky about the difference between code that builds data (I'll call this <i>value code</i> as shorthand) and code that does stuff (I'll call this <i>computation code</i>). Value code tends to be really boring: <tt><span style="color:#0000ff; font-weight:bold">3</span></tt> builds an integer value and <tt><span style="color:#0000ff; font-weight:bold">true</span></tt> builds a boolean value. Levy also builds in a couple of total binary operations on integers, so <tt><span style="color:#0000ff; font-weight:bold">3 + x</span></tt> and <tt><span style="color:#0000ff; font-weight:bold">y < 12</span></tt> are both viewed by Levy as value code. Variables are always value code.
<br />
<br />Computation code is more interesting. If-statements are computation code: <tt><span style="color:#cc4020; font-weight:bold">if <span style="color:#0000ff; font-weight:bold">v</span> then c1 else c2</span></tt> examines the value built by <tt><span style="color:#0000ff; font-weight:bold">v</span></tt> and then either does the stuff described by <tt><span style="color:#cc4020; font-weight:bold">c1</span></tt> or else does the stuff described by <tt><span style="color:#cc4020; font-weight:bold">c2</span></tt>. Functions are computation code: <tt><span style="color:#cc4020; font-weight:bold">fun <span style="color:#0000ff; font-weight:bold">x</span>: <span style="color:#0000ff; font-weight:bold">ty</span> -> c</span></tt> receives a value and then does the stuff described by <tt><span style="color:#cc4020; font-weight:bold">c</span></tt>, possibly using the value it received.
<br />
<br />The computation code <tt><span style="color:#cc4020; font-weight:bold">return <span style="color:#0000ff; font-weight:bold">v</span></span></tt> does a very simple thing: it takes the value built by <tt><span style="color:#0000ff; font-weight:bold">v</span></tt> and says "look, I made you a value." If the value code <tt><span style="color:#0000ff; font-weight:bold">v</span></tt> had value type <tt><span style="color:#0000ff; font-weight:bold">ty</span></tt>, then the computation code <tt><span style="color:#cc4020; font-weight:bold">return <span style="color:#0000ff; font-weight:bold">v</span></span></tt> has computation type <tt><span style="color:#cc4020; font-weight:bold">F <span style="color:#0000ff; font-weight:bold">ty</span></span></tt>. So yeah: I hadn't talked about types yet, but pieces of value code are given value types and pieces of computation code are given computation types.<sup><a href="#foot1x0">1</a></1></sup>
<br />
<br />Which brings us to an interesting kind of value code that I didn't mention before: <tt><span style="color:#0000ff; font-weight:bold">thunk <span style="color:#cc4020; font-weight:bold">c</span></span></tt> wraps up (and suspends) the computation code <tt><span style="color:#cc4020; font-weight:bold">c</span></tt> in a value; if the computation code <tt><span style="color:#cc4020; font-weight:bold">c</span></tt> had computation type <tt><span style="color:#cc4020; font-weight:bold">ty</span></tt>, then the value code <tt><span style="color:#0000ff; font-weight:bold">thunk <span style="color:#cc4020; font-weight:bold">c</span></span></tt> has value type <tt><span style="color:#0000ff; font-weight:bold">U <span style="color:#cc4020; font-weight:bold">ty</span></span></tt>.
<br />
<br />The typical type of a function that would be written as <tt>int -> int</tt> in ML is <tt><span style="color:#0000ff; font-weight:bold">U <span style="color:#cc4020; font-weight:bold">(<span style="color:#0000ff; font-weight:bold">int</span> -> F <span style="color:#0000ff; font-weight:bold">int</span>)</span></span></tt>. It has to be a value type (ML <a href="http://existentialtype.wordpress.com/2011/04/02/functions-are-values/">functions are values</a>), hence the "<tt><span style="color:#0000ff; font-weight:bold">U</span></tt>", and the computation code in the function wants to take a value, do some stuff, and eventually return an integer. Here's an implementation of the absolute value function:<pre> <span style="color:#000000; font-weight:bold">val <span style="color:#0000ff">abs</span> = <span style="color:#0000ff">thunk <span style="color:#cc4020">fun <span style="color:#0000ff">n</span>: <span style="color:#0000ff">int</span> -> <br/> if <span style="color:#0000ff">n < 0</span><br/> then return <span style="color:#0000ff">0 - n</span><br/> else return <span style="color:#0000ff">n</span></span></span> ;;</span></pre>In order to use the absolute value function, we have to "un-thunk" the computation code inside of the value: the computation code <tt><span style="color:#cc4020; font-weight:bold">force <span style="color:#0000ff">v</span></span></tt> does this. Where we'd write <tt>(abs 4)</tt> in ML, we write <tt><span style="color:#cc4020; font-weight:bold">force <span style="color:#0000ff">abs</span> <span style="color:#0000ff">4</span></span></tt> in Levy (<tt><span style="color:#cc4020; font-weight:bold">force</span></tt> binds more tightly than anything, including application).
<br />
<br /><h2>Evaluation order</h2>A consequence of being persnickety about the difference between value code and computation code is that Levy is very explicit about sequencing. The SML/Haskell/OCaml code "<tt>foo(abs 4,abs 5)</tt>" means "call <tt>foo</tt> with the absolute values of 4 and 5</tt>." One occasionally hard-learned lesson for people who have programmed in both SML and OCaml is that, if <tt>abs</tt> contains effects, this code might do different things: SML has a specified left-to-right evaluation order, but OCaml's evaluation order is unspecified, and in practice it's right-to-left.
<br />
<br />Such underspecification is impossible in Levy, since calling a function is a computation, and computations can't be included directly in values! Instead, if I have computation code <tt><span style="color:#cc4020; font-weight:bold">c</span></tt> of type <tt><span style="color:#cc4020; font-weight:bold">F <span style="color:#0000ff; font-weight:bold">ty</span></span></tt>, we expect it to do some stuff and then say "look, I made you a value (of type <tt><span style="color:#0000ff; font-weight:bold">ty</span></tt>)." The computation code <tt><span style="color:#cc4020; font-weight:bold">c1 to <span style="color:#0000ff; font-weight:bold">x</span> in c2</span></tt> does the stuff described by <tt><span style="color:#cc4020; font-weight:bold">c1</span></tt>, and when that stuff involves saying "look, I made you a value", that value is bound to <tt><span style="color:#0000ff; font-weight:bold">x</span></tt> and used to do the stuff described by <tt><span style="color:#cc4020; font-weight:bold">c2</span></tt>. Therefore, when translate our SML/Haskell/OCaml code above to Levy, we have to explicitly write either<pre> <span style="color:#cc4020; font-weight:bold">force <span style="color:#0000ff">abs</span> <span style="color:#0000ff">4</span> to <span style="color:#0000ff">x1</span> in<br/> force <span style="color:#0000ff">abs</span> <span style="color:#0000ff">5</span> to <span style="color:#0000ff">x2</span> in<br/> force <span style="color:#0000ff">foo</span> <span style="color:#0000ff">x1</span> <span style="color:#0000ff">x2</span></span></pre>or<pre> <span style="color:#cc4020; font-weight:bold">force <span style="color:#0000ff">abs</span> <span style="color:#0000ff">5</span> to <span style="color:#0000ff">x2</span> in<br/> force <span style="color:#0000ff">abs</span> <span style="color:#0000ff">4</span> to <span style="color:#0000ff">x1</span> in<br/> force <span style="color:#0000ff">foo</span> <span style="color:#0000ff">x1</span> <span style="color:#0000ff">x2</span></span></pre>That concludes the introduction of Levy, (my slightly de-parenthesized fork of) the language implemented by Bauer and Pretnar. There are two example files <a href="https://github.com/robsimmons/levy/blob/modprec/example.levy">here</a> (discusses basics) and <a href="https://github.com/robsimmons/levy/blob/modprec/functions.levy">here</a> (discusses functions).
<br />
<br /><h2>Matching</h2>The first new feature in Levy# is the addition of a new kind of computation code, <tt><span style="color:#cc4020; font-weight:bold">match <span style="color:#0000ff">v</span> with | <span style="color:#009900">pat1</span> -> c1 ...</span></tt> that is a generalization of both let-expressions <tt><span style="color:#cc4020; font-weight:bold">let <span style="color:#0000ff">x</span> be <span style="color:#0000ff">v</span> in c</span></tt> and if-expressions <tt><span style="color:#cc4020; font-weight:bold">if <span style="color:#0000ff; font-weight:bold">v</span> then c1 else c2</span></tt> from Levy. Patterns like <tt><span style="color:#009900; font-weight:bold">pat1</span></tt> are a refinement of value code: constants are patterns, variables are patterns (variables in patterns are binding occurrences, like in most functional languages), but addition is not a pattern, and variables can only appear once in a pattern.
<br />
<br />The simplest new thing that we can do with this language construct is write computation code that mimics a <tt>switch</tt> statement in C:<pre> <span style="color:#000000; font-weight:bold">val <span style="color:#0000ff">f</span> = <span style="color:#0000ff">thunk</span> <span style="color:#cc4020">fun <span style="color:#0000ff">x</span>: <span style="color:#0000ff">int</span> -><br/> match x with<br/> | <span style="color:#009900">1</span> -> return <span style="color:#0000ff">4</span><br/> | <span style="color:#009900">2</span> -> return <span style="color:#0000ff">3</span><br/> | <span style="color:#009900">3</span> -> return <span style="color:#0000ff">2</span><br/> | <span style="color:#009900">4</span> -> return <span style="color:#0000ff">1</span><br/> | <span style="color:#009900">5</span> -> return <span style="color:#0000ff">0</span><br/> | <span style="color:#0000ff">y</span> -> return <span style="color:#0000ff">(x + y)</span></span> ;;</span></pre>This function is included as an example in the "datatype" fork of Levy# on Github <a href="https://github.com/robsimmons/levy/blob/datatype/switch.levy#L24-31">here</a>; let's look at what it actually does in the interpreter:<pre> <span style="color:#000000; font-weight:bold">bash-3.2$ ./levy.byte switch.levy <br/> ... snip ... <br/> Levy. Press Ctrl-D to exit.<br/> Levy> <span style="color:#cc4020">force <span style="color:#0000ff">f</span> <span style="color:#0000ff">1</span></span> ;;<br/> comp <span style="color:#cc4020">F <span style="color:#0000ff">int</span></span> = <span style="color:#cc4020">return <span style="color:#0000ff">4</span></span><br/> Levy> <span style="color:#cc4020">force <span style="color:#0000ff">f</span> <span style="color:#0000ff">3</span></span> ;;<br/> comp <span style="color:#cc4020">F <span style="color:#0000ff">int</span></span> = <span style="color:#cc4020">return <span style="color:#0000ff">2</span></span><br/> Levy> <span style="color:#cc4020">force <span style="color:#0000ff">f</span> <span style="color:#0000ff">5</span></span> ;;<br/> comp <span style="color:#cc4020">F <span style="color:#0000ff">int</span></span> = <span style="color:#cc4020">return <span style="color:#0000ff">0</span></span><br/> Levy> <span style="color:#cc4020">force <span style="color:#0000ff">f</span> <span style="color:#0000ff">7</span></span> ;;<br/> comp <span style="color:#cc4020">F <span style="color:#0000ff">int</span></span> = <span style="color:#cc4020">return <span style="color:#0000ff">14</span></span></span></pre>That's what I expected to happen; hopefully it's what you expected to happen too. Let's declare victory and move on.
<br />
<br /><h2>Using datatypes</h2>The main innovation of Levy# is the introduction of an inductive datatype mechanism, which is basically a bedrock component of every functional programming language (that isn't a Lisp) ever. This should have been a relatively small change, but parser generators are <i>the worst</i> and I'm bad at them so I basically had to rewrite the parser to have a new phase. Sigh.
<br />
<br />I'll start with showing how datatypes are <i>used</i>, because that's a little more standard than the way I declare them. It <i>would</i> look exactly like the uses of ML datatypes, except that I curry them in a Haskell-ey fashion. The function <tt><span style="color:#0000ff; font-weight:bold">sumlist</span></tt>, which sums up a list of integers, looks like this:<pre> <span style="color:#000000; font-weight:bold">val <span style="color:#0000ff">sumlist</span> = <span style="color:#0000ff">thunk</span> <span style="color:#cc4020">rec <span style="color:#0000ff">sumlist</span> : <span style="color:#0000ff">list</span> -> F <span style="color:#0000ff">int</span> is <br/> fun <span style="color:#0000ff">xs</span> : <span style="color:#0000ff">list</span> -><br/> match <span style="color:#0000ff">xs</span> with<br/> | <span style="color:#009900">Nil</span> -> return <span style="color:#0000ff">0</span><br/> | <span style="color:#009900">Cons <span style="color:#0000ff">x</span> <span style="color:#0000ff">xs</span></span> -> force <span style="color:#0000ff">sumlist</span> <span style="color:#0000ff">xs</span> to <span style="color:#0000ff">y</span> in return <span style="color:#0000ff">x + y</span></span> ;;</span></pre>The other thing that's new in the above example is the use of Levy's fixedpoint operator, but there's nothing interesting there from the perspective of reading the code. This and other examples are in <a href="https://github.com/robsimmons/levy/blob/datatype/datatype.levy#L36-40">datatype.levy</a>, by the way.
<br />
<br />Side note: I'm lazy but still wanted a realistic execution model *and* coverage checking (redundant/non-exhaustive matches raise warnings), so I limited the current version of Levy# to depth-1 pattern matching:<pre> <span style="color:#000000; font-weight:bold">Levy> <span style="color:#cc4020">match <span style="color:#0000ff">v4</span> with | <span style="color:#009900">Cons <span style="color:#0000ff">x1</span> (Cons <span style="color:#0000ff">x2</span> <span style="color:#0000ff">xs</span>)</span> -> return <span style="color:#0000ff">0</span></span> ;;<br/> Type error: <span style="color:#009900">Cons <span style="color:#0000ff">x2</span> <span style="color:#0000ff">xs</span></span> not allowed in pattern (depth-1 pattern matching only)</span></pre>If I was doing this for real I'd implement real coverage checking and pattern compilation as a code transformation, which would leave the depth-1 pattern matching interpreter unchanged.
<br />
<br />So that's the use of datatypes; one upshot is that my pieces of value code are maybe a bit more interesting than they used to be: I can make big piece value code like <tt><span style="color:#0000ff; font-weight:bold">Cons 4 (Cons 9 (Cons 16 (Cons (-1) Nil)))</span></tt> to create a big piece of data. I think that's kind of interesting, at least.
<br />
<br /><h2>Declaring datatypes</h2>Datatypes are just a way of telling the computer that you want to work with a particular set of inductively defined structures (at least this is what <i>strictly positive datatypes</i>, the only ones Levy# allows, are doing). A point that I've haltingly tried to make in a <a href="http://requestforlogic.blogspot.com/2011/04/tell-me-story-of-logic.html">previous post</a> was that, well, if you're going to write down a particular set of inductively defined terms, a canonical-forms based logical framework like Linear LF is a perfectly good language to write that in.
<br />
<br />So our constants look basically like the declaration of constants in a Linear LF signature, except that we make the constants uppercase, following ML conventions:<pre> <span style="color:#009900; font-weight:bold">data Nil: <span style="color:#0000ff">list</span><br/> | Cons: <span style="color:#0000ff">int -o list -o list</span></pre>It's a little point, but it's kind of a big deal: we can understand all the values in Levy# as <i>canonical forms in a simply-typed linear lambda calculus with integers and some arithmetic operations</i> - kind of like Twelf with the integer <a href="http://twelf.plparty.org/wiki/Constraint_domain">constraint domain</a> extension.<sup><a href="#foot2x0">2</a></sup>
<br />
<br />The reason it's a big deal is that we can now observe that the two kinds of "function types" we have in this language are way different. There's a <i>computational arrow</i> <tt><span style="color:#cc4020; font-weight:bold"><span style="color:#0000ff; font-weight:bold">ty1</span> -> ty2</span></tt> that we inhereted from Levy, and then there's a Linear LF linear implication arrow, a <i>value arrow</i> <tt><span style="color:#0000ff; font-weight:bold">ty1 -o ty2</span></tt> that we use in value code to construct patterns and values. I didn't invent this idea, it's actually a subset of what Licata and Harper propose in "A Universe of Binding and Computation" (ICFP 2009). Dan Licata would call the Linear LF arrow a "representational arrow," which is probably a better name than "value arrow."
<br />
<br />Of course, the values that we actually use are, without exception, at atomic type; if we try to even write down <tt><span style="color:#0000ff; font-weight:bold">Cons 4</span></tt>, which seems like it would have type <tt><span style="color:#0000ff; font-weight:bold">list -o list</span></tt>, we get a type error from Levy#: <tt><span style="color:#000000; font-weight:bold">constructor Cons expects 2 arg(s), given 1</span></tt>. So by requiring that all the Linear LF terms (i.e. value code) be canonical (beta-normal and eta-long), I've effectively excluded any values of type <tt><span style="color:#0000ff; font-weight:bold">ty1 -o ty2</span></tt> from the syntax of the language.
<br />
<br />What would it mean to introduce value code whose type was a linear implication to Levy#? Well, that's the thing I really wanted to write about.
<br />
<br /><hr><small><sup><a name="foot1x0"></a>1</sup> Focusing and polarization-aware readers will notice that this has something to do with focusing. In particular: positive types are value types, negative types are computation types, <tt><span style="color:#cc4020; font-weight:bold">F</span></tt> is "upshift," and <tt><span style="color:#0000ff; font-weight:bold">U</span></tt> is "downshift."
<br /><sup><a name="foot2x0"></a>2</sup> Technical note: Boolean values, from the perspective of the value language, are just two defined constants, as explained <a href="https://github.com/robsimmons/levy/blob/datatype/switch.levy#L58-65">here</a>. That's important: we have Boolean constants that are values, but the elimination form for Booleans is <i>computation code</i> not value code. If we had an "if" statement at the level of values, all sorts of crazy (and interesting) stuff could potentially happen. At least for the moment, I'm not considering that.</small>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-59371734416763327442011-08-11T08:23:00.000-04:002011-08-11T08:42:14.963-04:00Video Lecture on Constructive Provability LogicI've previously blogged about <a href="http://requestforlogic.blogspot.com/2010/12/principles-of-constructive-provability_08.html">the principles of constructive provability logic</a> and about <a href="http://requestforlogic.blogspot.com/2011/01/two-new-papers.html">our paper on the topic being accepted to IMLA</a>. Since then (in fact, since my last blog post), I've given three versions of a talk on the logic: first at CMU before I left for my internship at Microsoft Research, the second at the workshop on Intuitionistic Modal Logic and Applications in Nancy, France,<sup>1</sup> and the third the next day at LIX in Saclay (south of Paris) France.<sup>2</sup> After tweaking that talk three times, I decided to tweak it once more and put it on the web on my <a href="http://cpl.hyperkind.org/">page of resources about Constructive Provability Logic</a>.
<br />
<br /><iframe width="425" height="349" src="http://www.youtube.com/embed/feuVH8Saq0Q" frameborder="0" allowfullscreen></iframe>
<br />
<br />This was actually quite a bit harder than I'd expected! It didn't seem quite as difficult when I did it for the <a href="http://www.cs.cmu.edu/~rjsimmon/papers/simmons08linlogalg.html">Linear Logical Algorithms</a> presentation a few years ago, but I guess I was forcing myself to be less perfectionist there (requiring one take instead of 4 definitely forces you to be less perfectionist). Especially in the first two videos, the number of repeated takes comes across in my rather affected tone of voice - which is what you get when you over-rehearse something, at least if "you" is a person who (like me) has forgotten anything he ever learned about the "<a href="http://books.google.com/books/about/The_illusion_of_the_first_time_in_acting.html?id=GS0LAAAAIAAJ">illusion of the first time</a>".
<br />
<br />The effort required was such that I don't realistically see myself doing this too often again, but hopefully this will be at least moderately helpful to someone.
<br />
<br /><hr><small><sup>1</sup> While in Nancy, I took a picture out of the window of the dorm where I was staying, and thereby <a href="http://andrewsullivan.thedailybeast.com/2011/08/the-view-from-your-window-contest-winner-61.html">whittled away a few more of my 15 minutes of fame</a>.
<br /><sup>2</sup> Nothing famous happened there, but I did have excellent fondue.</small>Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-82626122164593417722011-05-28T09:55:00.000-04:002011-05-28T23:18:18.299-04:00Party like it's 1995It is humbling but enlightening when you're struggling to figure out how to describe something and you realize someone already nailed the exposition two decades ago. That's how I felt when I stumbled across this invited paper by Dale Miller, "<a href="ftp://ftp.cis.upenn.edu/pub/papers/miller/GulpProde95.html">Observations about using logic as a specification language</a>." Still a good paper!<br /><br />I wonder if the late eighties and early nineties felt as exciting at the time as they look in retrospect for the logic community. I guess the answer is "yes" - I recall <a href="http://existentialtype.wordpress.com/">Bob Harper</a> talking about one informal European workshop in that period where he was presenting LF, Girard was presenting linear logic, and maybe one of the uniform proofs people were presenting uniform proofs too? Maybe someone who's spending the summer in Pittsburgh can prod Bob into blogging about that meeting, it's a good story. (This reminds me, I need to add Bob's blog to the sidebar!).<br /><br /><h3>Also: completeness of focusing via cut and identity</h3><br />Speaking of uniform proofs: I've been fiddling for some time with this view of that the completeness of a focused sequent calculus can be proven more-or-less directly from the standard metatheory of the focused sequent calculus: cut admissibility and identity expansion. (The completeness of focusing, if you recall, is good for many reasons. It gives us logic programming, and it allows us to think in terms of <a href="http://requestforlogic.blogspot.com/2010/09/focusing-and-synthetic-rules.html">synthetic inference rules</a> in our logical frameworks.) Frank first observed this way back when, I cleaned up and Twelfed up his proof a long time ago and wrote up a version of the argument for ordered linear logic in a tech report recently. the main weakness of our strategy is that, in its current version, it doesn't deal with inversion on positive types, which is why I call it "weak focusing" instead of "focusing." <br /><br />A recent MathOverflow question asked (in a slightly convoluted way) about the completeness of focusing with respect to a Prolog-like language. Since Prolog-like languages don't run into the problem with positive types, "our" strategy* works fine, so I used this as an excuse to finally <a href="http://mathoverflow.net/questions/65776/how-establish-conversion-of-cut-free-proof-into-uniform-proof/65854#65854">write out the full argument</a> in a setting that didn't have a lot of other extraneous stuff going on.<br /><br />* I'd like to know if someone came up with this strategy before us!Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-47379220819630773492011-05-07T16:14:00.000-04:002011-05-08T00:17:55.423-04:00Hacky post: binary, where do you live?This is a problem that's come up various times in the past year, and I wanted to write it up quickly. Much more about hacking (Standard ML in particular, though I don't think it's a problem unique to Standard ML), I thought about putting it on my more personal blog, but it seemed out of place there too. <br /><br />Problem description: oftentimes when you have a binary file, you want to distribute it along with some other resource files (think images and whatnot). You expect that some collection of files containing these images will be distributed together, but you want to be able to move them around on your own system or to a friend's system as one atomic block without breaking everything. This means, effectively, that you have to be able to have the program start with any current directory and understand how to change directories to the directory where "it lives". The easiest way of doing this in Standard ML is to change directories <tt>(OS.FileSys.chdir)</tt> to whichever directory the commandline argument that ran the program appeared to live in <tt>(OS.Path.dir (CommandLine.name ()))</tt>.<br /> <br /><h2>So far so good</h2>We can illustrate how this works and fails to work with a one-line test program that we compile in MLton.<pre>structure Foo = struct val () = print ("I'm " ^ CommandLine.name () ^ "\n") end</pre>Let's save this in /tmp, compile it, and run it:<pre>$ cd /tmp<br />$ mlton foo.sml</pre>Okay, now wherever we go, the directory part of <tt>(CommandLine.name ())</tt> is going to be proper for getting from the current directory of the user into the directory where the binary lives:<pre>$ ./foo<br />I'm ./foo<br />$ cd ..<br />$ /tmp/foo<br />I'm /tmp/foo<br />$ cd /tmp/baz<br />$ ../foo<br />I'm /tmp/baz</pre><br /><h2>Symlinks and paths and lying, lying command lines</h2>The problem's gonna come when we want to execute our file from the path or from a symlink: then, the apparent working directory given by <tt>(CommandLine.name ())</tt> will *not* necessarily be the place where the executable itself lives.<pre>$ pwd<br />/tmp/baz<br />$ export PATH=$PATH:/tmp<br />$ foo<br />I'm foo<br />$ cd /tmp<br />$ mkdir baz<br />$ cd baz<br />$ ln -s ../foo foo2<br />$ ./foo2<br />I'm ./foo2</pre>The solution adopted by the group of us that wrote the C0 compiler is to notice that if you call something from a shell script it gets its path expanded. Therefore, if we have a shell script doing nothing but calling our program, the program will get called with its absolute path and therefore the program's attempts to chdir into the directory where its binary (and associated resources) live will succeed.<pre>$ pwd<br />/tmp/baz<br />$ mv ../foo ../foo.exe<br />$ echo '$0.exe $*' > ../foo<br />$ chmod a+x ../foo<br />$ foo<br />I'm /tmp/foo.exe<br /></pre>However, this solution only works for things on the $PATH, not for symlinks:<pre>$ ./foo2<br />./foo2:1: ./foo2.exe: No such file or directory</pre><br /><h2>Wrapping up</h2>Can anyone think of a better way to do this? The shell script is simple, but it seems unnecessary. Perhaps I really should be asking this on StackOverflow, but the Standard ML conversations on StackOverflow are depressing to say the least.<br /><br />P.S. - If you're distributing your program by way of a SML/NJ heap image, <tt>(CommandLine.name ())</tt> will always be <tt>sml</tt>. To defeat this, when you issue the command that reloads the heap image, add the argument <tt>@SMLcmdname=$0</tt>. That is, of course assuming you're reloading the heap image from within a shell script, which is the common way to make an executable using SML/NJ. So, in effect, with SML/NJ this problem is "automatically solved" because you're already redirecting through a shell script.<br /><br />P.P.S. - After I wrote this post, I realized it was probably a language-independent concern. Sure enough, there's StackOverflow posts about this problem from the perspective of <a href="http://stackoverflow.com/questions/758018/path-to-binary-in-c">C</a> and <a href="http://stackoverflow.com/questions/1023306/finding-current-executables-path-without-proc-self-exe">C++</a>. Seems like it's just a hard problem; crud.Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-71470831848380548332011-04-18T00:15:00.000-04:002011-08-11T19:02:12.220-04:00Tell me a story of logicI have a little technical note that I'm working on, in which I tell a story about linear logic that I tried to work out a year ago and finally returned to with Bernardo Toninho and Yuxin Deng last week. But as I tried to write the introduction, the two-year-old voice in my head kept asking "why?" Why do I need to tell this story? What's the point? The resulting thoughts don't really belong in the technical note I wanted to write, but I hope they're just barely coherent enough that they're worth putting here. (Note: there is a similar discussion to this one in Section 3 of my thesis proposal.)
<br />
<br />Big picture: before we can contemplate any discussion of the formal correctness of any system (a logic, type theory, programming language, cryptographic protocol, or self-driving car) - before we can even ask ourself what "correctness'' means - we have to have a way of formally specifying the system itself. Mathematicians do this by defining sets, category theorists by enumerating universal mapping properties. However, if you're writing down theorems to be checked by a computer, there's almost exactly one game in town: systems for mechanized mathematics have a built-in facility for directly giving (co)inductive specifications of systems.
<br />
<br />A tool's interface for describing inductive specifications is, therefore, a matter of some importance. In most systems, the statement of an inductive specification looks something like a datatype declaration in a programming language in ML or Haskell. One system for mechanized mathematics, Twelf, takes a rather different approach. Rather than <i>directly</i> specifying an inductive structure, Twelf users specify a signature of type families and constants in the dependent type theory LF. If you asked someone about this while they were in a hurry to catch a plane, they'd probably respond "oh, that's what adequacy theorems are for" and run away.
<br />
<br /><h3>A word on adequacy</h3>In the <a href="http://dx.doi.org/10.1145/138027.138060">first paper on LF</a>, Harper, Honsell, and Plotkin discuss <i>adequacy theorems</i>. When we write down a bunch of type families and constants in an LF signature, that means that we can form a certain (inductively-defined!) set of terms. To use a standard example, given the LF signature<pre style="color:#060"> nat: type.
<br /> z: nat.
<br /> s: nat -> nat.</pre>we can create well-typed terms of type <tt style="color:#060">nat</tt> that look like <tt style="color:#060">z</tt>, <tt style="color:#060">(s z)</tt>, <tt style="color:#060">(s (s z))</tt>, and <tt style="color:#060">((λx. s x) (s z))</tt>. That fourth example has a β-redex that, when we reduce it, turns it into the third example. We usually say that the above LF specification is adequate because there is a one-one-correspondence between natural numbers and normal terms of type <tt style="color:#060">nat</tt> in this signature.<sup>1</sup> In general, to quote Harper and Licata's 2007 "<a href="http://dx.doi.org/10.1017/S0956796807006430">Mechanizing metatheory in a logical framework</a>",<blockquote><i>An LF representation of a language is adequate iff it is isomorphic to the informal definition of the language.</i></blockquote>I'm currently thinking that this might be the wrong slogan, because the imprecision of "informal" leads to seemingly inevitable misunderstandings. Another slogan that I might consider in its place is that an LF specification is adequate iff it is isomorphic to the <i>inductive specification</i> of the language. It's just really very deeply ingrained in our collective heads that when we're writing down horizontal lines on paper, that <i>means</i> an inductive specification. But perhaps that's not giving LF enough credit! There's no reason to cede that my scratches on a whiteboard have any special spiritual relationship with inductive definitions and not LF signatures (or sets in ZFC or constructions in category theory, for that matter).
<br />
<br />Taking the view that adequacy relates the canonical forms of LF to some inductively defined structure ("on paper" or otherwise) has two advantages that I see. First, it turns adequacy into something that takes place entirely <i>within</i> formal mathematics. In fact, we have a pretty clear understanding of how to write "first-order" or "concrete" LF types that encode inductive definitions (hint: never allow left-nested arrows). But this means that we can prove adequacy theorems <i>in</i> Twelf. My <a href="">Twelf code</a> proving a bijection between de Bruijn lambda calculus terms and HOAS lambda calculus terms is, in this view, part of an adequacy theorem.<sup>2</sup> The second advantage, in my mind, is it clarifies why we never formally prove adequacy with respect to our "informal definitions" (a point that Daniel Lee and I poked a bit of fun in our SIGBOVIK 2007 paper <a href="http://sigbovik.org/2007/papers/manifestadequacy.pdf">Manifest Adequacy</a>). When I've been using Adga for awhile, I think of my on-paper notation as corresponding to inductive definitions (or, occasionally, inductive-recursive definitions, the more powerful variant of inductive definitions available in Agda). When I've been using Twelf for awhile, I think of my on-paper notation as corresponding to LF signatures that use <a href="http://en.wikipedia.org/wiki/Higher-order_abstract_syntax">higher-order abstract synatx</a>. It's only in the event that I want to know that these two ways of formalizing the same system coincide that I would actually think about adequacy in practice. To reempahsize, in this event adequacy is an entirely formal concept.
<br />
<br /><h3>Why logical frameworks?</h3>There's no contrarian impulse that lead to LF being used to encode systems. Harper, Honsell, and Plotkin made the point as well as I could hope to:<blockquote><i>The structure of the LF type system provides for the uniform extension of the basic judgment forms to two higher-order forms introduced by Martin-Löf, the <b>hypothetical</b>, representing consequence, and the <b>schematic</b>, representing generality.</i></blockquote>There are two important points to make about the above quote. The more obvious one is that LF, compared to "raw" inductive definitions, gives a clean notion of hypothetical reasoning and uniform substitution (the basis of what was later termed "higher-order abstract syntax"). The less obvious point is that, in the above quote, Harper, Honsell, and Plotkin are referencing Martin-Löf's "<a href="http://www.cs.cmu.edu/afs/cs/Web/People/crary/819-f09/Martin-Lof83.pdf">On the meaning of logical constants and the justification of the logical laws</a>," which isn't actually about LF. Rather, it's a story about how we understand the meaning of propositions and how we establish that a logic's definition makes sense in the presence of a uniform notion of hypothetical judgments.
<br />
<br />Martin-Löf's discussion, which Pfenning and Davies termed the "judgmental methodology" in their their <a href="http://dx.doi.org/10.1017/S0960129501003322">judgmental reconstruction of modal logic</a>, is important for people who write systems in Twelf because it sets up a particular world-view that maps quite well onto formalization in LF. To put it another way, the judgmental methodology is one important source of intuitions that guide the mental process of mapping between our "on-paper" notation and LF signatures. And so it's no accident that 1) we use Twelf a lot at Carnegie Mellon 2) our student reading group usually reads <i>On the meaning of logical constants and the justification of the logical laws</i> at least once every other year. The judgmental properties underlying LF and Martin Löf's reconstruction are the same, so understanding the judgmental methodology is a good way of understanding LF formalization.
<br />
<br />This continues to be the case in the Linear LF extension to LF, where linear hypothetical assumptions are understood to be <i>resources</i> that will be consumed in the service of building a goal. The underlying story here, which Wikipedia originally attributes to Lafont, is called the <i>resource interpretation</i>, and it allows us to understand how to do things like represent programming languages with state conveniently, because "a bunch of memory" is a concept that can be represented as "a bunch of resources." The story that we tell about linear logic - the resource interpretation - is important in part because it gives us intuitions for understanding LLF formalizations.
<br />
<br />I don't know that I can speak for where my intuitions about inductive definitions come from - as I've said before, computer scientists tend to treat them as pretty foundational. On the other hand, it's difficult to <i>really</i> use the "raw" inductive definitions provided natively by most non-Twelf provers to reason about programming languages, which is why there has been much recent (and, from an outside perspective, seemingly quite successful) work aimed at providing the right tool support for specifying and reasoning about programming languages in highly-developed proof assistants like Coq. The only danger here is, as soon as you start adding layers of interface between yourself and inductive definitions, the question of adequacy (the informal version, at least) again raises its head: there's not necessarily the same Martin-Löfian story guiding you, so where are your intuitions about the correctness of your system coming from? How would you actually prove that it corresponds to an ostensibly equivalent LF signature?
<br />
<br />This, in turn, is one reason why <i>two-level approach</i> have been recently popularized by Felty's <a href="http://hybrid.dsi.unimi.it/">Hybrid</a> system and Gacek's <a href="http://abella.cs.umn.edu/">Abella</a>. In these systems, there's a specially supported logical layer where one can use the same kind of intuitions we use in LF encodings. LF or LLF could, in fact be used for this logical layer, though these two systems use different variations on that theme. The second layer is a separate facility for reasoning about these encodings.
<br />
<br />I'll stop here, though one argument I might be interested in trying to make in the future is that Twelf itself is actually the <i>original</i> two-level approach. There's an even stronger argument to be made in the case of HLF, Jason Reed's implementation of meta-reasoning for LLF on top of Twelf. However
<br />
<br /><sup>1</sup> We usually say <i>canonical terms</i> or <i>canonical forms</i>, not <i>normal terms</i>. The canonical forms are the β-normal η-long terms. However, η-expansion doesn't mean anything for this example.
<br />
<br /><sup>2</sup> It would be a full adequacy theorem if I'd shown that the bijection was compositional - that substitution on the de Bruijn terms means the same thing as substitution that LF gives us for free - but I didn't prove that part. Perhaps I should.Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com0tag:blogger.com,1999:blog-827419514217130218.post-49293306134257677892011-03-08T15:09:00.000-05:002011-03-10T11:37:47.716-05:00Request for Typestate, Part 2: Linear Types, Now!<i>Part 2 in a <a href="http://requestforlogic.blogspot.com/search/label/request%20for%20typestate">series</a>. <a href="http://requestforlogic.blogspot.com/2011/01/request-for-typestate-part-1.html">Part 1</a> began with a discussion hypothetical discussion of a ML-like language with linear types.</i><br /><br />So in my previous post, I used the REPL loop of an "Imaginary Future Linear ML." In the following month, I've learned that there are no less than three different proposals of Imaginary Future Linear ML. Two of them, <a href="http://www.ccs.neu.edu/home/tov/pubs/alms/">Alms</a> and <a href="https://github.com/pikatchu/LinearML">LinearML</a>, are implemented systems that you can presumably download and run onto the device that you're using to read this very blog post!<sup>1</sup> The third, <a href="http://portal.acm.org/citation.cfm?id=1708016.1708027 ">F-Pop</a> by Mazurak et al, is, I believe, less seriously implemented but is formalized in Coq (which is also awesome: three cheers for mechanized metatheory!). <br /><br />I was actually pleased with how much I "guessed correctly" in my discussion. For instance, Alms is a serious implementation of a language with affine, not linear, types: if you recall, affine functions can't use their arguments twice, but they can avoid using them at all. One argument from the Alms designers, which I had not considered, is that requiring a linear type discipline - where all variables get used <i>exactly</i> once along every "control path" - can make it rather difficult to work with handled exceptions. <br /><br />Another thing that I "guessed" was in Take 3, when I suggested that we might want some syntactic sugar to make code look more like "methods" - the <tt>print</tt> method takes a string and a linear reference to a string and returns a new linear reference to that string, so why don't we have syntax that supports writing (<tt>print "Hello World" outstream</tt>) instead of (<tt>let outstream = print "Hello World" outstream</tt>). The F-Pop paper explicitly discusses this kind of notation in Section 5.2. I like that this is something they considered, though I'm still a little unsure how I feel about it "in practice." What it's doing is capturing a common intuition, that <i>an argument, passed by reference, is both an input and an output.</i> But how much of this intuition is fundamental, and how much is it based on path dependence - we happened to start out with languages like C and Java that force functions to have only one "official" output, so are we just used to being forced into using this unnatural design pattern?<br /><br /><sup>1</sup> Caveat 1: well, as long as you're not using iOS. (I just got an iPad. I am not proud of this but yes, it is shiny.) Caveat 2: I couldn't get either Alms or LinearML to run on my laptop.Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com1tag:blogger.com,1999:blog-827419514217130218.post-76439183214681698202011-02-08T15:16:00.000-05:002011-03-10T00:58:59.750-05:00Request for Typestate, Part 1<i>Part 1 of a <a href="http://requestforlogic.blogspot.com/search/label/request%20for%20typestate">series</a>. If you're interested in Imaginary Future ML, a number of proposals exist: see <a href="http://requestforlogic.blogspot.com/2011/03/request-for-typestate-part-2-linear.html">Part 2</a>.</i><br /><br />Recently I have been trying to understand typestate, since a lot of my friends, hallmates, and fellow distance runners have written implementations, papers, Ph.D. theses, and SIGBOVIK presentations on the topic. A conversation with Ron Garcia helped me get the first idea of what is going on and how it relates to linear type systems, a topic I do feel like I understand.<br /><br />Some background, as I understand it. The word "typestate" was introduced by Strom and Yemini way back a long time ago in 1984 in a paper entitled "Typestate: A Programming Language Concept for Enhancing Software Reliability" (Jonathan Aldritch has posted the PDF <a href="http://www.cs.cmu.edu/~aldrich/papers/classic/tse12-typestate.pdf">here</a>). It was an attempt to capture the fact that function calls often have a "do this, then do that, then don't do that anymore" kind of flavor. The most obvious and simple example is one that most every programmer has encountered, thanks to C's ubiquity:<br /><br /><center><img src="http://typesafety.net/rfl/typestate-alloc.png" /></center><br /><br />In other words, we create a pointer by mallocing it, and then we can use that pointer as long as we want to, but if we free it we can't use that pointer anymore. (Well, we can, but we're not supposed to, C sucks, whatever.) Okay, but now you're scratching your head thinking that this is basically why modern humans use garbage collection. Well, sure, but while this solves a particular problem, the problem is really a pattern of problems, and the pattern just re-asserts itself in different ways. For instance, take this interaction with SML/NJ's toplevel:<pre style="color:#060"> $ sml<br /> Standard ML of New Jersey v110.70 [built: Wed Jun 17 16:24:00 2009]<br /> - val FILE = TextIO.openOut("foo.txt");<br /> val FILE = - : TextIO.outstream<br /> - TextIO.output(FILE, "Hello world\n");<br /> val it = () : unit<br /> - TextIO.closeOut(FILE);<br /> val it = () : unit<br /> - TextIO.output(FILE, "I've got a bad feeling about this\n");<br /><br /> uncaught exception Io [Io: output failed on "foo.txt", closed stream]<br /> raised at: Basis/Implementation/IO/text-io-fn.sml:443.14-443.56</pre>What's happened here is that we've run afoul of exactly the same pattern that we know from C.<br /><br /><center><img src="http://typesafety.net/rfl/typestate-file.png" /></center><br /><br />Because we're in a safe language, the results are less dire; a dynamic exception gets thrown rather than memory being silently corrupted. However, the problem remains: it seems like this is the sort of thing we should be able to statically preclude (using a type system), or at <i>least</i> warn programmers about ahead of time (using a static analysis).<br /><br /><h2>Typestate-oriented programming</h2>Both of these approaches (type systems and static analyses) have been investigated thoroughly by my friends in Jonathan Aldrich's group, such as Kevin Bierhoff, Nels Beckman, Ciera Jaspan, Karl Naden, Roger Wolff, and Ron Garcia. The following example is a way of taking that doomed ML snippet above and making it a piece of typestate-checked code; it's mostly copied from <a href="http://www.cs.cmu.edu/~aldrich/papers/onward2009-state.pdf">Typestate Oriented Programming</a> by Jonathan Aldrich, Joshua Sunshine, Darpan Saini, and Zachary Sparks and was presented at ONWARD 2009.<pre style="color:#006"> state File {<br /> public final String filename;<br /> }<br /><br /> state OpenFile extends File {<br /> private CFilePtr filePtr;<br /> public void write(str : String) { ... }<br /> public void close() [OpenFile>>ClosedFile]<br /> { ... }<br /> }<br /><br /> state ClosedFile extends File {<br /> public void open() [ClosedFile>>OpenFile]<br /> }</pre>The idea here is that a <tt>File</tt> is a file that might be open or closed; you can only read from an <tt>OpenFile</tt>, which is a subtype of <tt>File</tt>. That annotation on the <tt>close()</tt> and <tt>open()</tt> methods is telling us that those functions actually <i>change the type</i> of the object they're acting upon; the typesystem has to work accordingly to make sure you only close or read from an <tt>OpenFile</tt>, even if though, after you close it, the object has "become" exactly the thing - a <tt>ClosedFile</tt> - that you were making sure it wasn't before.<br /><br />I have some particular questions about the nature of the thing called "subtyping" in typestate oriented programming that I hope to get to in a separate post, but I hope this gives a decent overview of how typestate intuitively works even if it doesn't give much intuition as to how the typing rules "actually" work. As it turns out, the type theory behind typestate-oriented programming uses ideas from <i>linear logic</i>, which can also be used to directly solve the motivating problem. What I really hope to explore in this series of posts is the way in which typestate - which is mostly being investigated in object oriented settings - can (or can't) be understood as a linear type discipline in a functional programming language.<br /><br /><h2>Linear Types</h2>From a programmer's perspective, linear logic is a functional programming language where all variables only get to be mentioned <i>once</i>. There's an additional requirement, which is that every variable <i>must</i> be used once, but sometimes people ignore this requirement: the system where variables are mentioned no more than once is technically an "affine" system and not a "linear" one, but it's common to call affine systems linear anyway, because the more general term is "substructural" and "substructural" is for some reason a word that people get scared by. (This turns out to be pretty problematic for me, but that's a different blog post.)<br /><br />So, nothing to be scared of, can't use variables twice. Let's see what happens with our interaction with SML:<pre style="color:#600"> $ smlinear<br /> Imaginary Future Linear ML [built: IN THE FUTURE]<br /> - val FILE = TextIO.openOut("foo.txt");<br /> val FILE = - : TextIO.outstream<br /> - TextIO.output(FILE, "Hello world\n");<br /> val it = () : unit<br /> - TextIO.closeOut(FILE);<br /> stdIn:1.17-1.21 Error: unbound variable or constructor: FILE<br /> Variable "FILE" was previously in scope as a linear resource,<br /> but that resource already consumed</pre>Oh dear, that doesn't work at all - we can't close the file or even write to it twice! This is our fault; when we ported the SML Basis Library from SML/NJ to Imaginary Future Linear ML, we needed to make some changes to the way we deal with <tt>TextIO</tt>. Here's (part of) the old signature for <tt>TextIO</tt>, with the arrows </tt>-></tt> turned into lollipops <tt>-o</tt> to emphasize that we're in linear logic now. (Again, nothing scary, it just means that variables can only be mentioned once.)<pre style="color:#006"> signature OLD_TEXT_IO = sig<br /> type outstream<br /> val openOut : string -o outstream<br /> val output : outstream * string -o unit<br /> val close : outstream -o unit<br /> end</pre>This signature had just the right type when we could reuse the variable <tt>FILE</tt> that we got from <tt>openOut</tt>, but now, in Imaginary Future Linear ML, we can't.<br /><br /><h3>Take 1: Restoring the old behavior</h3>It's ridiculous, however, to make the output stream go away forever whenever we write to it! So what do we do? Well, all we need is for the functions that currently return <tt>unit</tt> to give us <i>back</i> a new object of type <tt>outstream</tt>, and then we can use it again!<pre style="color:#006"> signature NEW_TEXT_IO_1 = sig<br /> type outstream<br /> val openOut : string -o outstream<br /> val output : outstream * string -o outstream<br /> val close : outstream -o outstream<br /> end</pre>With this new signature for <tt>TextIO</tt>, we could have exactly the same (failed) interaction with the toplevel that we had before, the difference is that, now, we always bind a new variable when we run a function so that we have an <tt>outstream</tt> object to use in the future.<pre style="color:#600"> $ smlinear<br /> Imaginary Future Linear ML [built: IN THE FUTURE]<br /> - val FILE1 = TextIO.openOut("foo.txt");<br /> val FILE1 = - : TextIO.outstream<br /> - val FILE2 = TextIO.output(FILE1, "Hello world\n");<br /> val FILE2 = - : TextIO.outstream<br /> - val FILE3 = TextIO.closeOut(FILE2);<br /> val FILE3 = - : TextIO.outstream<br /> - TextIO.output(FILE3, "I've got a bad feeling about this\n");<br /><br /> uncaught exception Io [Io: output failed on "foo.txt", closed stream]<br /> raised at: Basis/Implementation/IO/text-io-fn.sml:443.14-443.56</pre>Okay, well, why did I show an example that didn't solve the problem <i>at all</i>? Because it's important to realize that we haven't really lost any expressiveness in the language: even though we have a more restrictive system that prevents us from using variables twice, we can still express most of the things we could before. <br /><br />(Side note: in the case of the strictly-linear variant of Gödel’s System <i>T</i>, it's actually been proven that we lose <a href="http://dx.doi.org/10.1016/j.tcs.2009.11.014">exactly nothing</a> in terms of mathematical expressiveness by using the more restrictive type system, which is neat.)<br /><br /><h3>Take 2: Actually fixing the problem</h3>So, intuitively, we know what to do here: once we've closed an <tt>outstream</tt> we shouldn't be able to re-close it or output to it. The easiest solution is to make the function <tt>close</tt> have the type <tt>outstream -o unit</tt> - that way, closing an <tt>outstream</tt> consumes the variable representing the output stream and leaves us with no way to even mention it.<pre style="color:#006"> signature NEW_TEXT_IO_1 = sig<br /> type outstream<br /> val openOut : string -o outstream<br /> val output : outstream * string -o outstream<br /> val close : outstream -o unit<br /> end</pre>This suffices to turn our original dynamic exception into a static type error like the <tt>Variable...already consumed</tt> error I made up earlier.<br /><br /><h3>Take 3: Matching the typestate example</h3>For the sake of completeness, let's think about how our example looks different from the typestate example. The big difference is that all the functions in the typestate example are <tt>void</tt>: they don't output anything, they just change the type of their argument. But I believe that this is actually something that it would be reasonable to mimic in a linear system like "Imaginary Future Linear ML." The first step is to say: well, when I consumed <tt>FILE</tt> there was no longer a <tt>FILE</tt> hanging around, so why don't I just reuse that variable name?<pre style="color:#600"> $ smlinear<br /> Imaginary Future Linear ML [built: IN THE FUTURE]<br /> - val FILE = TextIO.openOut("foo.txt");<br /> val FILE = - : TextIO.outstream<br /> - val FILE = TextIO.output(FILE, "Hello world\n");<br /> val FILE = - : TextIO.outstream<br /> - val FILE = TextIO.closeOut(FILE);<br /> val FILE = - : TextIO.outstream<br /> - TextIO.output(FILE, "I've got a bad feeling about this\n");<br /> stdIn:1.17-1.21 Error: unbound variable or constructor: FILE<br /> Variable "FILE" was previously in scope as a linear resource,<br /> but that resource already consumed</pre>The second step is to consider that the typestate-oriented examples really do have "outputs" - their outputs are the new states of the <tt>this</tt> object. This would be easy syntactic sugar to whip up, just let <tt>val output : [outOpen >> outOpen] * string -o unit</tt> mean <tt>val output : outOpen * string -o outOpen</tt> and give the following signature<pre style="color:#006"> signature NEW_TEXT_IO_2 = sig<br /> openOut : string -o outOpen<br /> output : [outOpen >> outOpen] * string -o unit<br /> closeOut : [outOpen >> outClosed] -o unit<br /> reopenOut : [outClosed >> outOpen] -o unit<br /> end</pre>With this use of syntactic sugar, we could support the following interaction with our pretend ML:<pre style="color:#600"> $ smlinear<br /> Imaginary Future Linear ML [built: IN THE FUTURE]<br /> - val FILE = TextIO.openOut("foo.txt");<br /> val FILE = - : TextIO.outOpen<br /> - TextIO.output(FILE, "Hello world\n");<br /> val it = () : unit<br /> - TextIO.closeOut(FILE);<br /> val it = () : unit<br /> - TextIO.output(FILE, "I've got a bad feeling about this\n");<br /> stdIn:6.1-6.59 Error: operator and operand don't agree [tycon mismatch]<br /> operator domain: TextIO.openOut * string<br /> operand: TextIO.closeOut * string<br /> in expression:<br /> TextIO.output (FILE,"I've got a bad feeling about this\n")</pre>I think that's pretty nice.<br /><br /><h2>Conclusion</h2>I think the lesson of this first discussion is that, when relating linear functional programming languages to imperative programming languages, it's useful to think of object passed to functions as being "lost" to that function and then automatically "re-bound" with the same name when the function returns. Once we make this step, the notion that the re-bound variable can have a different type - which is one key insight of typestate-oriented programming - seems quite natural.<br /><br />It's important to note that none of the stuff I said about linearity and state is at all novel; for further reading see Wadler's paper <a href="http://homepages.inf.ed.ac.uk/wadler/papers/linear/linear.ps">Linear Types Can Change The World!</a> (warning: .ps) or David Walker's chapter "Substructural Type Systems" in Advanced Topics in Types and Programming Languages. Those two sources just about cover the background reading; if anyone has other suggestions please feel free to put them in the comments.Robhttp://www.blogger.com/profile/05106663398227635415noreply@blogger.com2