Compatibility is a recurring theme in computer science: a myth that would be reality if programmers followed the rules. I was playing around with an emulated Mac running System 7.5, and as soon as I had a running TCP/IP stack, I launched an old web browser and pointed it to my blog. The web browser was Mosaic 2.0.1, created in 1995. So there are 17 years separating the client and the server.
The good news is, the page loads. The bad news is, the page content is pretty broken. The HTML language was designed so that the display would gracefully degrade in browsers that do not support all features, but this page involves two incompatible changes:
- PNG images
- UTF-8 encoding
The switch from the GIF to the PNG image format was caused by patent issues; not much to say here, except that patents are a nuisance. The change of character encoding was necessary because the initial design of HTML was broken: it was largely inspired by the LaTeX typesetting system and so represented non-ASCII characters as escape sequences (at least in HTML the name-spaces are clearly separated). Still, this required each and every non-ASCII character to be named and escaped, which made a system that was supposed to be human-readable unreadable for most languages on the planet.
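The escape-sequence scheme survives today in HTML's named entities. A quick sketch (Python here, using the standard `html` module, just for illustration) shows the difference between naming each character and simply encoding it:

```python
# The old scheme: every non-ASCII character gets a named escape.
import html

escaped = "caf&eacute;"          # how early HTML spelled "café"
print(html.unescape(escaped))    # → café

# The UTF-8 way: the character is just encoded as bytes, no name needed.
print("café".encode("utf-8"))    # → b'caf\xc3\xa9'
```

One named escape per accented letter is tolerable in French; it is hopeless for a page written in Greek or Japanese.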
As people soon figured out that this system would not work, support for the various platform- and country-specific encodings was added. We have not finished fixing this mess, and I regularly encounter bugs caused by people who still use one of those encodings (usually ISO Latin-1). That is really a shame, because the first draft of Unicode was published in 1988.
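The kind of bug this causes is easy to reproduce (again a Python sketch): take UTF-8 bytes and decode them as ISO Latin-1, and you get the familiar mojibake:

```python
# "café" encoded as UTF-8, then wrongly decoded as ISO Latin-1.
raw = "café".encode("utf-8")     # b'caf\xc3\xa9'
print(raw.decode("latin-1"))     # → cafÃ©  (the classic symptom)
```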
The interesting thing is that while things broke at the display level, everything ran fine at the connection level; Mosaic could even connect to the proxy that runs on my NAS. Does this mean that the low-level stuff was better designed than the presentation layer? I’m not really sure. HTTP 1.1 is a very simple protocol. While simplicity is a nice property in a protocol, if a protocol is simple because all the complexity was pushed to the upper layers, then it is not elegantly simple, it is just not doing its job. Some people call this the New Jersey design style.
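To see how simple: here is a complete, valid HTTP/1.1 request as it goes over the wire (the host name is a placeholder). The whole thing is plain, human-readable ASCII, which is why a 1995 browser can still produce one:

```python
# A full HTTP/1.1 request: four lines of ASCII text.
request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"      # the only mandatory header in 1.1
    "Connection: close\r\n"
    "\r\n"                       # blank line ends the headers
)
print(request)
```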
HTTP is simple because it is a thin, stateless wrapper around TCP/IP. So all the issues that are not handled in the lower layers (where they should be) have to be handled in the application layer: state (cookies), multiplexing, inline content (the data: protocol), asynchronous exchanges (XMLHttpRequest), cryptographic exchange, application-level routing, etc. The protocol makes it very easy to write a web server in three lines of shell script, and awfully hard to write a scalable web server.
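A sketch of that asymmetry (the canned response body is made up for the example): a toy server that answers every request identically fits in a few lines, and it also shows why the naive version cannot scale, since it handles one connection at a time:

```python
# A toy HTTP server: accept one connection, send a canned reply.
import socket
import threading

srv = socket.socket()
srv.bind(("127.0.0.1", 0))   # port 0 = let the OS pick a free one
srv.listen()

def serve_one():
    conn, _ = srv.accept()
    conn.recv(4096)          # read (and ignore) the request
    conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\r\n")
    conn.close()

threading.Thread(target=serve_one).start()

# Any client speaking HTTP gets the same answer back.
with socket.create_connection(srv.getsockname()) as client:
    client.sendall(b"GET / HTTP/1.0\r\n\r\n")
    reply = b""
    while chunk := client.recv(4096):
        reply += chunk

print(reply.split(b"\r\n", 1)[0])   # → b'HTTP/1.0 200 OK'
srv.close()
```

Everything beyond this point — concurrency, state, multiplexing — has to be bolted on above the protocol.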
Recently there have been talks about a new version of the HTTP protocol, and I agree with Poul-Henning Kamp that the current proposals are not very interesting: they fix a few edge cases and improve performance by just hacking the protocol further. But at least an old web browser will still be able to connect…