About XML, bundles and hard-links

Increasingly, file-formats are not defined on single files, but on sets of files, usually bound together by an XML manifest. One typical example of such a file format is the OpenDocument format supported by OpenOffice. While OpenOffice uses a Zip archive to bundle the different files together, many programs on Mac OS X use the notion of a bundle. A bundle is technically a directory with a special bit set that tells the file explorer (the Finder) to treat the directory as a single file.

While such bundles are less compact than a Zip archive, they have the advantage that all the logical entities (files) live directly on the file-system. This means in particular that you can use file-system tools like hard-links. I typically use this trick when filling in forms. Many forms are distributed as PDF files, but without using PDF’s form functionality: they are just a page description of the form. To fill them in legibly, I import the PDF as a background image into OmniGraffle and then add the text for the different fields. I like to keep both a copy of the filled-in form and the clean form. So I keep the OmniGraffle document and create, outside the bundle, a hard-link to the PDF file embedded inside it. This way I have two files, a PDF document (the form) and an OmniGraffle document (the filled-in form), but the disk space for the PDF is only used once.
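As a rough illustration of the hard-link step, here is a minimal Python sketch. It does not touch a real OmniGraffle document: it builds a fake bundle (a plain directory) in a scratch folder, and the helper name `link_out_of_bundle` is made up for this example. The file names simply mirror the ones used in this post.

```python
import os
import tempfile

def link_out_of_bundle(bundle_dir, inner_name, outer_path):
    """Create outer_path as a hard link to the file bundle_dir/inner_name.

    Returns the link count of the file afterwards.
    """
    inner_path = os.path.join(bundle_dir, inner_name)
    # A hard link only works if both names live on the same volume.
    os.link(inner_path, outer_path)
    return os.stat(outer_path).st_nlink

# Demonstration with a stand-in bundle in a temporary directory.
with tempfile.TemporaryDirectory() as scratch:
    bundle = os.path.join(scratch, "Departure Form.graffle")
    os.mkdir(bundle)
    embedded = os.path.join(bundle, "image1.pdf")
    with open(embedded, "wb") as f:
        f.write(b"%PDF-1.4 placeholder content")

    outside = os.path.join(scratch, "Departure Form.pdf")
    link_count = link_out_of_bundle(bundle, "image1.pdf", outside)
    # Both names now refer to the same file on disk.
    same_file = os.path.samefile(embedded, outside)

print(link_count, same_file)
```

One caveat, stated as an assumption rather than something I have checked for every program: if an application saves by writing a fresh copy of the bundle's files instead of modifying them in place, the two names will stop pointing at the same data, so it is worth checking the link count again after saving.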

The resulting files look something like this. Notice that the PDF file has a link count of two. In this particular case the saving is negligible, but some PDFs are quite large.

Perm.      Lks Owner    Group    Flags Size Modification Name
dr-x------   5 wiesmann wiesmann uchg  170B Nov 16 16:36 Departure Form.graffle
-r--------   2 wiesmann wiesmann uchg    9K Nov 21 13:37 Departure Form.pdf

The content of the bundle is the following:

Perm.      Lks Owner    Group    Flags Size Modification Name
-r--------   1 wiesmann wiesmann         0B Nov 16 16:36 Icon
-r--------   1 wiesmann wiesmann        49K Nov 16 16:36 data.plist
-r--------   2 wiesmann wiesmann        23K Nov 16 16:36 image1.pdf
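
The link counts shown in these listings can also be read programmatically. The following sketch walks a bundle directory and reports the files whose link count is greater than one, that is, files that also have a name somewhere else. The function name `multiply_linked` is hypothetical, and the demonstration again uses a stand-in bundle in a temporary directory rather than a real document.

```python
import os
import tempfile

def multiply_linked(bundle_dir):
    """Return the relative paths of files in bundle_dir with a link count above 1."""
    found = []
    for root, _dirs, files in os.walk(bundle_dir):
        for name in files:
            path = os.path.join(root, name)
            # st_nlink is the number of names (hard links) the file has.
            if os.lstat(path).st_nlink > 1:
                found.append(os.path.relpath(path, bundle_dir))
    return sorted(found)

# Demonstration: a fake bundle with two files, one of them linked outside.
with tempfile.TemporaryDirectory() as scratch:
    bundle = os.path.join(scratch, "Departure Form.graffle")
    os.mkdir(bundle)
    for name in ("data.plist", "image1.pdf"):
        with open(os.path.join(bundle, name), "wb") as f:
            f.write(b"placeholder")
    os.link(os.path.join(bundle, "image1.pdf"),
            os.path.join(scratch, "Departure Form.pdf"))

    shared = multiply_linked(bundle)

print(shared)
```

A link count above one only says that the file has another name; it does not say where that other name lives, so this is a quick check rather than a way to locate the outside copy.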

This trick can be used for all file-formats that rely on bundles, in particular Keynote files.
