Saturday, May 7, 2011

Hacky post: binary, where do you live?

This is a problem that's come up various times in the past year, and I wanted to write it up quickly. Much more about hacking (Standard ML in particular, though I don't think it's a problem unique to Standard ML), I thought about putting it on my more personal blog, but it seemed out of place there too.

Problem description: oftentimes when you have a binary file, you want to distribute it along with some other resource files (think images and whatnot). You expect that some collection of files containing these images will be distributed together, but you want to be able to move them around on your own system or to a friend's system as one atomic block without breaking everything. This means, effectively, that you have to be able to have the program start with any current directory and understand how to change directories to the directory where "it lives". The easiest way of doing this in Standard ML is to change directories (OS.FileSys.chdir) to whichever directory the commandline argument that ran the program appeared to live in (OS.Path.dir (CommandLine.name ())).

So far so good

We can illustrate how this works and fails to work with a one-line test program that we compile in MLton.
structure Foo = struct val () = print ("I'm " ^ CommandLine.name () ^ "\n") end
Let's save this in /tmp, compile it, and run it:
$ cd /tmp
$ mlton foo.sml
Okay, now wherever we go, the directory part of (CommandLine.name ()) is going to be proper for getting from the current directory of the user into the directory where the binary lives:
$ ./foo
I'm ./foo
$ cd ..
$ /tmp/foo
I'm /tmp/foo
$ cd /tmp/baz
$ ../foo
I'm /tmp/baz

Symlinks and paths and lying, lying command lines

The problem's gonna come when we want to execute our file from the path or from a symlink: then, the apparent working directory given by (CommandLine.name ()) will *not* necessarily be the place where the executable itself lives.
$ pwd
/tmp/baz
$ export PATH=$PATH:/tmp
$ foo
I'm foo
$ cd /tmp
$ mkdir baz
$ cd baz
$ ln -s ../foo foo2
$ ./foo2
I'm ./foo2
The solution adopted by the group of us that wrote the C0 compiler is to notice that if you call something from a shell script it gets its path expanded. Therefore, if we have a shell script doing nothing but calling our program, the program will get called with its absolute path and therefore the program's attempts to chdir into the directory where its binary (and associated resources) live will succeed.
$ pwd
/tmp/baz
$ mv ../foo ../foo.exe
$ echo '$0.exe $*' > ../foo
$ chmod a+x ../foo
$ foo
I'm /tmp/foo.exe
However, this solution only works for things on the $PATH, not for symlinks:
$ ./foo2
./foo2:1: ./foo2.exe: No such file or directory

Wrapping up

Can anyone think of a better way to do this? The shell script is simple, but it seems unnecessary. Perhaps I really should be asking this on StackOverflow, but the Standard ML conversations on StackOverflow are depressing to say the least.

P.S. - If you're distributing your program by way of a SML/NJ heap image, (CommandLine.name ()) will always be sml. To defeat this, when you issue the command that reloads the heap image, add the argument @SMLcmdname=$0. That is, of course assuming you're reloading the heap image from within a shell script, which is the common way to make an executable using SML/NJ. So, in effect, with SML/NJ this problem is "automatically solved" because you're already redirecting through a shell script.

P.P.S. - After I wrote this post, I realized it was probably a language-independent concern. Sure enough, there's StackOverflow posts about this problem from the perspective of C and C++. Seems like it's just a hard problem; crud.

No comments:

Post a Comment