Cluster Maintenance

Today I’m doing the cluster maintenance. It is difficult and tedious work.

As minor updates do not seem to solve my performance problem, today I’m upgrading the cluster to Fedora Core 4. I always find this kind of task tedious: there are a lot of phases where things can break, but at the same time those upgrades take a lot of time, so the result is a strange mix of nervousness and boredom.

The primary source of problems is quirks, that is, small things that are never critical but consume a lot of time. These are made worse when you don’t know the system very well, typically because somebody else set it up. When you are used to a system, you know its weaknesses and instinctively avoid things that you know don’t work well or tend to jam, just as in a familiar house you would know that a given door tends to stick.

Simply preparing the media took me some time. First, DVD burning does not work on the main Linux server, so I had to transfer the disk image to a laptop and burn it there, only to discover that the cluster nodes actually cannot read DVD-R disks. So I had to find some blank CDs. I found some in the dying-media corner, next to the floppy disks. Of course, doing the install from 4 CDs is much less convenient, as it means you have to stay around to swap them. Then came the “yum update” phase, which takes hours and gives no estimate of the time beforehand.

The next phase will be replicating the filesystem of the first node to the others, and I’m still pondering which technique to use. Péter suggested creating a tar archive and distributing it over NFS, but I’m wondering whether I would be better off creating disk images of the two root filesystems and the boot filesystem and restoring them on the other nodes from a portable hard drive.
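Péter’s tar-over-NFS suggestion could look roughly like the following. This is a minimal Python sketch, assuming the archive is written to a directory that every node can see over NFS; the function names `archive_root` and `restore_root` and the `EXCLUDED` list of pseudo-filesystems are hypothetical choices for illustration, not part of the actual cluster setup. On a live system it would need to run as root to read every file and to keep ownership on extraction.

```python
import os
import tarfile

# Hypothetical list of top-level directories whose *contents* should not be
# copied from a running node (pseudo-filesystems, mount points, scratch space).
# This is an assumption; adjust it to the real layout of the nodes.
EXCLUDED = {"proc", "sys", "dev", "tmp", "mnt", "media"}


def archive_root(root, archive_path, excluded=EXCLUDED):
    """Pack the tree under `root` into a gzipped tar at `archive_path`.

    Excluded directories are stored as empty entries so that the mount
    points still exist after restoring; everything else is added recursively.
    Symlinks are stored as links (tarfile does not dereference by default).
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(root)):
            path = os.path.join(root, name)
            # recursive=False adds only the directory entry itself.
            tar.add(path, arcname=name, recursive=name not in excluded)


def restore_root(archive_path, target):
    """Unpack an archive created by `archive_root` into `target`."""
    # The archive is self-made and trusted, so on Python versions that support
    # extraction filters we ask for the permissive one to keep special files,
    # links and modes intact.
    kwargs = {"filter": "fully_trusted"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as tar:
        # numeric_owner keeps uid/gid numbers instead of mapping by user name,
        # which matters when the target has no /etc/passwd yet.
        tar.extractall(target, numeric_owner=True, **kwargs)
```

On the first node one might call `archive_root("/", "/mnt/nfs/root.tar.gz")` (a hypothetical NFS mount point, skipped automatically because `mnt` is excluded), then run `restore_root` on each target node against its freshly mounted root partition. Per-node settings such as the hostname or network configuration would likely still need adjusting by hand afterwards.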

