One of the fascinating things about the demo-scene is how much can fit in 4K of code. On the other hand, when one compiles a simple program under a Unix system, the resulting binary is usually very large for its functionality. Simple programs like true use more than 12 kilobytes. What the hell is the reason? Under Mac OS X, we need to look at the Mach-O binary file format for an answer.
A handy tool to look into binaries is otool. What can it tell us? First let us look into the header:

    otool -h /usr/bin/true
    /usr/bin/true:
    Mach header
          magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
     0xfeedface       7          3  0x00           2    13        908 0x00000085
A Mach-O file is a sequence of commands that tell the operating system how to set up a process for starting. The file starts with a header, followed by a sequence of commands that can reference positions later in the file.
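To make the header layout concrete, here is a minimal Python sketch that decodes the seven 32-bit fields otool printed. It assumes a thin (non-fat) 32-bit little-endian file, which is what the 0xfeedface magic indicates on i386; the function name `parse_mach_header` and the synthetic sample bytes are mine, for illustration only.

```python
import struct

# 32-bit Mach-O header: seven little-endian uint32 fields, 28 bytes in total.
# otool's "caps" column is the top byte of cpusubtype, not a separate field.
MACH_HEADER_32 = struct.Struct("<7I")
FIELDS = ("magic", "cputype", "cpusubtype", "filetype",
          "ncmds", "sizeofcmds", "flags")


def parse_mach_header(data: bytes) -> dict:
    """Decode the first 28 bytes of a thin 32-bit little-endian Mach-O file."""
    header = dict(zip(FIELDS, MACH_HEADER_32.unpack_from(data, 0)))
    if header["magic"] != 0xFEEDFACE:
        raise ValueError("not a 32-bit little-endian Mach-O file")
    return header


# Synthetic header built from the values in the otool -h output,
# so the sketch runs without a real binary at hand.
sample = MACH_HEADER_32.pack(0xFEEDFACE, 7, 3, 2, 13, 908, 0x85)
header = parse_mach_header(sample)
print(header["ncmds"], header["sizeofcmds"])  # prints: 13 908
```

To try it on a real file, pass the first 28 bytes read in binary mode; a universal binary would need its fat header unwrapped first, which this sketch does not do.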
The header shows that setting up true requires 13 commands that take up 908 bytes. Let’s see what those commands are.
    otool -l /usr/bin/true
    /usr/bin/true:
    Load command 0
          cmd LC_SEGMENT
      cmdsize 56
      segname __PAGEZERO
       vmaddr 0x00000000
       vmsize 0x00001000
      fileoff 0
     filesize 0
      maxprot 0x00000000
     initprot 0x00000000
       nsects 0
        flags 0x0
    Load command 1
          cmd LC_SEGMENT
      cmdsize 124
      segname __TEXT
       vmaddr 0x00001000
       vmsize 0x00001000
      fileoff 0
     filesize 4096
      maxprot 0x00000007
     initprot 0x00000005
       nsects 1
        flags 0x0
    Section
      sectname __text
       segname __TEXT
          addr 0x00001f98
          size 0x00000066
        offset 3992
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x80000400
     reserved1 0
     reserved2 0
    Load command 2
          cmd LC_SEGMENT
      cmdsize 192
      segname __DATA
       vmaddr 0x00002000
       vmsize 0x00001000
      fileoff 4096
     filesize 4096
      maxprot 0x00000007
     initprot 0x00000003
       nsects 2
        flags 0x0
    Section
      sectname __data
       segname __DATA
          addr 0x00002000
          size 0x00000014
        offset 4096
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0
    Section
      sectname __dyld
       segname __DATA
          addr 0x00002014
          size 0x0000001c
        offset 4116
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0
    Load command 3
          cmd LC_SEGMENT
      cmdsize 124
      segname __IMPORT
       vmaddr 0x00003000
       vmsize 0x00001000
      fileoff 8192
     filesize 4096
      maxprot 0x00000007
     initprot 0x00000007
       nsects 1
        flags 0x0
    Section
      sectname __jump_table
       segname __IMPORT
          addr 0x00003000
          size 0x00000005
        offset 8192
         align 2^6 (64)
        reloff 0
        nreloc 0
         flags 0x04000008
     reserved1 0 (index into indirect symbol table)
     reserved2 5 (size of stubs)
    Load command 4
          cmd LC_SEGMENT
      cmdsize 56
      segname __LINKEDIT
       vmaddr 0x00004000
       vmsize 0x00002000
      fileoff 12288
     filesize 5296
      maxprot 0x00000007
     initprot 0x00000001
       nsects 0
        flags 0x0
    Load command 5
         cmd LC_SYMTAB
     cmdsize 24
      symoff 12288
       nsyms 2
      stroff 12316
     strsize 32
    Load command 6
                cmd LC_DYSYMTAB
            cmdsize 80
          ilocalsym 0
          nlocalsym 0
         iextdefsym 0
         nextdefsym 1
          iundefsym 1
          nundefsym 1
             tocoff 0
               ntoc 0
          modtaboff 0
            nmodtab 0
       extrefsymoff 0
        nextrefsyms 0
     indirectsymoff 12312
      nindirectsyms 1
          extreloff 0
            nextrel 0
          locreloff 0
            nlocrel 0
    Load command 7
          cmd LC_LOAD_DYLINKER
      cmdsize 28
         name /usr/lib/dyld (offset 12)
    Load command 8
         cmd LC_UUID
     cmdsize 24
        uuid 0x7d 0xf6 0x04 0x33 0x2c 0xb5 0xc7 0x5c
             0x3a 0x53 0xe1 0xd2 0x4b 0x5b 0xa3 0xac
    Load command 9
          cmd LC_UNIXTHREAD
      cmdsize 80
       flavor i386_THREAD_STATE
        count i386_THREAD_STATE_COUNT
          eax 0x00000000 ebx    0x00000000 ecx 0x00000000 edx 0x00000000
          edi 0x00000000 esi    0x00000000 ebp 0x00000000 esp 0x00000000
          ss  0x00000000 eflags 0x00000000 eip 0x00001f98 cs  0x00000000
          ds  0x00000000 es     0x00000000 fs  0x00000000 gs  0x00000000
    Load command 10
                  cmd LC_LOAD_DYLIB
              cmdsize 52
                 name /usr/lib/libgcc_s.1.dylib (offset 24)
           time stamp 2 Thu Jan 1 01:00:02 1970
      current version 1.0.0
    compatibility version 1.0.0
    Load command 11
                  cmd LC_LOAD_DYLIB
              cmdsize 52
                 name /usr/lib/libSystem.B.dylib (offset 24)
           time stamp 2 Thu Jan 1 01:00:02 1970
      current version 111.0.0
    compatibility version 1.0.0
    Load command 12
          cmd LC_CODE_SIGNATURE
      cmdsize 16
      dataoff 12352
     datasize 5232
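Every load command starts with the same two 32-bit fields, cmd and cmdsize, so a loader can hop from one command to the next without understanding each of them. The sketch below walks a synthetic command area built from the cmdsize values in the dump and checks that they add up to the 908 bytes announced in the header. The function name `walk_load_commands` is mine, and the cmd value 0 is only a placeholder, not a real command type.

```python
import struct


def walk_load_commands(data: bytes, offset: int, ncmds: int):
    """Yield (cmd, cmdsize) pairs from a little-endian load command area."""
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", data, offset)
        yield cmd, cmdsize
        offset += cmdsize  # cmdsize covers the whole command, header included


# cmdsize values copied from the dump, load commands 0 to 12.
CMDSIZES = [56, 124, 192, 124, 56, 24, 80, 28, 24, 80, 52, 52, 16]

# Synthetic command area: each command is its two-field header plus zero
# padding up to cmdsize. The cmd value 0 is a placeholder.
blob = b"".join(
    struct.pack("<2I", 0, size).ljust(size, b"\0") for size in CMDSIZES
)

sizes = [size for _, size in walk_load_commands(blob, 0, len(CMDSIZES))]
total = sum(sizes)
print(len(sizes), total)  # prints: 13 908
```

In a real 32-bit file the command area starts right after the 28-byte header, so the walk would begin at offset 28.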
So what do those commands do?
- The first command (0) sets up a memory segment called __PAGEZERO at address zero, with no access permissions at all (maxprot and initprot are both 0). This is basically a facility that makes sure that dereferencing a NULL pointer results in an error.
- The second command sets up the __TEXT segment with the read-only binary code. The command tells the system which part of the binary file will be mapped at which address.
- The next command sets up the __DATA segment, which will contain the program’s initial writable memory. As every segment needs to be aligned on a memory page boundary, this segment’s representation has to be stored on a different page in the file, i.e. start at offset 4096 even if there was space left before.
- The next command sets up the __IMPORT segment, which contains the data structure for importing symbols from shared libraries. As this is a different segment, it has to be on yet another page: we are now at offset 8192.
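- The next command sets up the __LINKEDIT segment, which holds the tables used by the dynamic linker and, at the end, the code signature. It starts on a new page at offset 12288.
- The next command sets up the symbol table. It does not add a segment of its own: it points at 2 symbols at offset 12288 and 32 bytes of strings at offset 12316, both inside __LINKEDIT (these entries name the symbol that the jump table in the __IMPORT segment refers to).
- This command specifies the dynamic symbol table, again without a segment of its own; its single indirect symbol entry sits at offset 12312.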
- This command specifies what dynamic linker is used, here it is /usr/lib/dyld.
- This command specifies a unique identifier for the binary.
- This command specifies the register state of the initial thread. Most of the registers are set to zero, except the instruction pointer (eip), which contains the entry point of the program: 0x00001f98, the start of the __text section.
- This command specifies a library to load, /usr/lib/libgcc_s.1.dylib; this is the supporting library for gcc-generated code.
- This command specifies another library, /usr/lib/libSystem.B.dylib; this is the giant framework that contains most of the Unix libraries of OS X.
- The last command specifies the signature for the code.
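The page alignment mentioned in the list can be put into numbers. The following sketch rounds the payload of each mapped segment up to the next 4096-byte boundary, using the section sizes from the dump. The helper `round_up_to_page` and the `payload` table are my own illustration, and counting the header and load commands as part of __TEXT follows from that segment starting at file offset 0.

```python
PAGE_SIZE = 4096  # page size implied by the 0x1000 steps in the dump


def round_up_to_page(n: int) -> int:
    """Smallest multiple of PAGE_SIZE that is greater than or equal to n."""
    return (n + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE


# Bytes actually carried by each segment, read off the dump.
payload = {
    "__TEXT": 28 + 908 + 0x66,  # header + load commands + __text
    "__DATA": 0x14 + 0x1C,      # __data + __dyld
    "__IMPORT": 0x05,           # __jump_table
}

for name, used in payload.items():
    on_disk = round_up_to_page(used)
    print(f"{name:9} {used:5} of {on_disk} bytes, {on_disk - used} of padding")
```

The same rounding explains the vmsize of __LINKEDIT: its 5296 bytes in the file round up to 0x2000 bytes in memory.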
The conclusion here is that the binary file is mostly filled with zeros so that the various segments fit within memory pages. I wrote a small tool that measures how much space is really used, and in the case of true, only 1151 bytes out of 17584 are used, that is 6.55%. I’ll talk about that tool another time…
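I have not described how the tool counts, but the 1151 bytes can be reproduced from the otool output alone. The accounting below is an assumption: it counts the header, the load commands, the section contents and the symbol data in __LINKEDIT, and leaves out the code signature. The 12-byte symbol entry and 4-byte indirect entry sizes are those of the 32-bit format.

```python
# All numbers are read off the otool -h and otool -l output.
MACH_HEADER = 28                      # seven uint32 fields
LOAD_COMMANDS = 908                   # sizeofcmds
SECTIONS = 0x66 + 0x14 + 0x1C + 0x05  # __text, __data, __dyld, __jump_table
NLIST_SIZE = 12                       # one 32-bit symbol table entry
SYMBOLS = 2 * NLIST_SIZE              # nsyms 2
INDIRECT_SYMBOLS = 1 * 4              # nindirectsyms 1, one uint32 each
STRINGS = 32                          # strsize

used = (MACH_HEADER + LOAD_COMMANDS + SECTIONS
        + SYMBOLS + INDIRECT_SYMBOLS + STRINGS)

# The file ends where __LINKEDIT ends: fileoff 12288 + filesize 5296.
file_size = 12288 + 5296

print(f"{used} of {file_size} bytes, {used / file_size:.2%}")
# prints: 1151 of 17584 bytes, 6.55%
```

Under this accounting the 5232 bytes of code signature are not counted as used, so the file is not only padding: the signature alone is larger than everything else put together.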
In the case of a very simple program like true, would it be possible to strip most of this information?
I’m working on this. You can remove the page zero segment by passing -pagezero_size 0 to the linker. The dynamic linking information could be removed, as true does not actually use any linked function. Still, the binary would be two pages (8K): one with the code and the header, and one with the stack. I’m trying to mesh them into the same page of the file (quite close); then you would need to minimize the code. A lot could be removed, and some register initialization could be merged into the initial register state declaration. If you look at the assembly code, most of it is the __start initialization method inherited from the System framework.