Migration

So I have migrated this blog to another host. The DNS entry thias.absyrde.net should eventually be updated to point to the new machine, but in the meantime, please update your bookmarks to wiesmann.codiferes.net. This move was long overdue: the previous hosting solution had an old version of MySQL, which meant I simply could not upgrade WordPress to a new version. The blog is now hosted on a machine that I rent with some friends, which should hopefully be a more flexible solution.

Migrating the blog to a new host should have been trivial; of course it was not. The core problem was the usual weak point of open-source software: support for internationalization. While my WordPress instance was using the UTF-8 encoding, the underlying database, MySQL 3, is set up by default to use the Latin-1 encoding and a Swedish sort ordering. Between my posts with Kanji, spam in Cyrillic and one plugin saving caches in binary, the database dumps were simply not valid UTF-8 files.

The solution to this problem was the usual: surf the web for hours trying to find a solution, wading through useless forum posts and dead links that might have contained the answer. In the end I fixed the problem with the tools on my laptop: grep and TextWrangler. I purged all the spam, removed all the entries from falbum (this involved doing a binary search on the dump file to find which lines were breaking the encoding), forced the destination MySQL instance to create the database in UTF-8, and loaded the database dump by specifying the encoding again. You would think that all these tools would honor a LANG setting like en_US.UTF-8; you would be wrong.
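For anyone facing the same hunt for the lines that break the encoding, here is a minimal sketch of how it could be automated. This is not what I actually ran: it swaps the manual search for a plain linear scan, and the function name and the sample bytes are made up for illustration.

```python
from typing import List


def find_invalid_utf8_lines(data: bytes) -> List[int]:
    """Return the 1-based numbers of the lines in `data` that are not valid UTF-8."""
    bad_lines = []
    # Splitting on the newline byte is safe: 0x0A never occurs inside a
    # multi-byte UTF-8 sequence, so valid lines stay intact.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append(number)
    return bad_lines


# Hypothetical three-line "dump": valid accented text, a line with raw
# binary bytes (standing in for a plugin cache), and valid Kanji.
sample = (
    "caf\u00e9\n".encode("utf-8")
    + b"cache \xe9\xff\n"
    + "\u6f22\u5b57\n".encode("utf-8")
)

print(find_invalid_utf8_lines(sample))  # → [2]
```

On a real dump you would open the file in binary mode and pass its contents to the function; the reported line numbers are the ones to inspect or delete before loading.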

3 thoughts on “Migration”

  1. Well, we were speaking about MySQL 3 here, so it was an ancient database, with mostly default settings. Fun. When you say “The core problem was the usual weak point of open-source software”, I would like to correct you: this is not a problem specific to open-source software. You would probably not be too surprised to learn that Windows uses its own encoding by default, which shows up in SQL Server too, and that if you do not set the locale in Oracle, it will not support Unicode by default and will limit you to the American character subset, etc.

    Character encoding is _THE_ problem that computer science strives to solve but cannot, because there is always a “better solution” or “NIH syndrome”.

  2. Computer science is like dogs: it ages faster than humans.
