Hard links

The web site of our lab is generated using a template engine called Velocity: the site is described by a set of files that are compiled into the final web site. This is convenient because the web site can be tested on another machine, for instance my laptop. While many pages are generated from templates, like the BibTeX files, many others are static, including a large volume of PDF, JPEG, and QuickTime files.

During the build process, those files are simply copied from the source directory into the destination directory. This works, but it means that I’m wasting around 160 megabytes of disk space on my laptop just to have two copies of some files (sometimes three: when I wrote the paper or created the slides, there is also a copy in my home hierarchy).

The solution to this problem is as simple as it is old: use hard links. For each static file in the source directory, I created a hard link in the destination directory. As the copy task of Ant sees no difference between source and destination, it leaves the destination alone. The files are present twice, once in the source and once in the destination, but they use the disk space only once.
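The idea can be sketched in Python with os.link, which creates a hard link just like ln does. This is only a minimal sketch, not the actual Ant build: the helper link_static_file and the file names are hypothetical, and the code assumes a file system that supports hard links.

```python
import os
import tempfile


def link_static_file(src, dst):
    """Make dst a hard link to src; return True if a link was created."""
    if os.path.exists(dst):
        if os.path.samefile(src, dst):
            return False      # dst is already the same file as src
        os.remove(dst)        # dst is a separate copy: drop it first
    os.link(src, dst)         # add a second name for the same data
    return True


def demo():
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "paper-source.pdf")
        dst = os.path.join(root, "paper-destination.pdf")
        with open(src, "wb") as f:
            f.write(b"x" * 1024)
        created = link_static_file(src, dst)   # first call creates the link
        again = link_static_file(src, dst)     # second call has nothing to do
        same_inode = os.stat(src).st_ino == os.stat(dst).st_ino
        return created, again, same_inode, os.stat(src).st_nlink


if __name__ == "__main__":
    print(demo())
```

The link count (st_nlink) of 2 is what tells you that the data is stored once but reachable under two names.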

This approach works because the files are almost never touched, so I don’t have to worry about updates. Updates are the main problem of hard links. If the file is edited in place, the changes show up under every name. But if the editor is smart and first saves the changes into a new file and then swaps it in, the file is not really updated but replaced by another one with the same name. This is a sane way of doing updates, because it ensures atomicity, but it breaks hard links. Replacing a file by hand with another one of the same name yields the same result. So in my case, if I replace one of the static files, the destination link is not affected: it becomes outdated and is then replaced by the Ant copy task. This is OK for me, because the behaviour is safe: the source and the destination hierarchy stay consistent, and only some disk space is wasted.
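The difference between the two kinds of update can be made visible with a small Python sketch. The function names edit_in_place and edit_atomically are hypothetical, and the second one only imitates what a careful editor might do (write a temporary file, then rename it over the original).

```python
import os
import tempfile


def edit_in_place(path, data):
    # Rewrites the existing file: every hard link sees the new content.
    with open(path, "wb") as f:
        f.write(data)


def edit_atomically(path, data):
    # Writes a new file, then renames it over the old name.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)     # the name now points to a different file


def read(path):
    with open(path, "rb") as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src.txt")
        dst = os.path.join(root, "dst.txt")
        edit_in_place(src, b"v1")
        os.link(src, dst)

        edit_in_place(src, b"v2")
        seen_after_in_place = read(dst)
        linked_after_in_place = os.path.samefile(src, dst)

        edit_atomically(src, b"v3")
        seen_after_replace = read(dst)
        linked_after_replace = os.path.samefile(src, dst)
        return (seen_after_in_place, linked_after_in_place,
                seen_after_replace, linked_after_replace)


if __name__ == "__main__":
    print(demo())
```

After the in-place edit the destination shows the new content and is still the same file; after the atomic replacement it keeps the old content and is no longer linked to the source.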

There are other limitations to hard links: they don’t work across file systems, and you cannot hard-link directories. The first restriction is easy to understand: a hard link is a second name for the same storage, and storage is difficult to share between different file systems. The second is more delicate. Technically, nothing in the internals of Unix prevents the hard-linking of a directory. The problem is rather about avoiding cycles in the file-system hierarchy, that is, about not allowing a directory to contain itself. Permitting cycles has the potential to create a lot of chaos, as many tools (like the tree command) rely on the fact that the file system does not contain cycles. Even if a hard-linked directory does not create a cycle at creation time, it might do so later, so permitting hard links to directories implies that a later move of a directory might fail (because it would create a cycle).
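Both restrictions surface as errors when the link is attempted. The following Python sketch (the helper try_hard_link and its messages are made up for illustration) assumes a typical Linux system, where linking a directory is refused by the kernel. The cross-file-system case is reported through the EXDEV error code, but the demo does not exercise it because that would require a second mounted file system.

```python
import errno
import os
import tempfile


def try_hard_link(src, dst):
    """Attempt a hard link and describe the outcome as a short string."""
    try:
        os.link(src, dst)
        return "linked"
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            return "refused: different file systems"
        if os.path.isdir(src):
            return "refused: directory"
        raise                 # anything else is unexpected here


def demo():
    with tempfile.TemporaryDirectory() as root:
        directory = os.path.join(root, "slides")
        os.mkdir(directory)
        plain = os.path.join(root, "talk.pdf")
        with open(plain, "wb") as f:
            f.write(b"data")
        return (
            try_hard_link(plain, os.path.join(root, "talk-link.pdf")),
            try_hard_link(directory, os.path.join(root, "slides-link")),
        )


if __name__ == "__main__":
    print(demo())
```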

So how do you create hard links? You can use the ln command. The GNU version of cp also supports the --link option, which links files instead of copying them. This version of cp is the default on Linux, and can be installed on Mac OS X, where it is usually called gcp.
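If neither tool is at hand, a whole tree can be linked from Python as well. This is a rough counterpart to linking instead of copying, not a replacement for the commands above: shutil.copytree recreates the directories and is told to call os.link for each file. The name link_tree and the directory layout are hypothetical, and the destination directory is assumed not to exist yet.

```python
import os
import shutil
import tempfile


def link_tree(src_dir, dst_dir):
    """Recreate src_dir under dst_dir, hard-linking every file."""
    # Directories are created anew; files are linked instead of copied.
    shutil.copytree(src_dir, dst_dir, copy_function=os.link)


def demo():
    with tempfile.TemporaryDirectory() as root:
        src_dir = os.path.join(root, "source")
        os.makedirs(os.path.join(src_dir, "papers"))
        original = os.path.join(src_dir, "papers", "thesis.pdf")
        with open(original, "wb") as f:
            f.write(b"data")

        dst_dir = os.path.join(root, "destination")
        link_tree(src_dir, dst_dir)

        linked = os.path.join(dst_dir, "papers", "thesis.pdf")
        return os.path.samefile(original, linked), os.stat(original).st_nlink


if __name__ == "__main__":
    print(demo())
```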

One thought on “Hard links”

  1. True, one cannot create hard-linked directories using ln, but
    I have managed to do so by accident while using “zic” to update my DST settings on a Solaris box.

    The zic -l (name) option creates a “localtime” directory which can in some instances be a hard link to the (name) directory (as I regretfully found out) and is then (so far) impossible to delete.

    Do you have any suggestions?

