The poisonous onion…

An Onion

Abstraction levels is one of those nice ideas that is thrown around a lot in computer science. The idea is seducing: implement your system in layers, each one implementing some abstraction which simplifies the one above. The concept also resonates with other engineering branches: a car is built of abstract elements, the engine, the chassis, and all the details of the engine are not known to the chassis builder.

This is true to a certain extent, but an engine leaks many implementation details: its shape, weight, the position of the input, exhausts, cams, how it vibrates and how much heat it produces. Change one core assumption (that it is a combustion engine) and you end up with a pretty bad design.

Adding abstractions layers in computer science is very easy, so easy that it is the very first thing many programmers do: if you do not like the way the underlying platform looks, change it into something you like. Problem is, very often this does not achieve much in terms of functionality, while in theory a more abstract system permits to change the underlying implementation, this only makes sense if you are the abstraction can and will be changed in some way – i.e. you not only need two implementations, but also two use-cases or applications that need different implementations. Very often this is not the case.

Because of this, software tends to be composed of multiple strata of forgotten and hidden code, with a given feature implemented multiple time, in different layers. One common story you find in computer science is one of a some code that uses a database connection to retrieve items, with filtering applied to output of the database, instead of doing the filtering using the database’s language. This is typical because it highlight one core problem of abstractions, it hides whatever is underneath, but you still need to understand the abstraction; if you don’t understand an abstraction, you are bound to re-implement it, one layer above.

This happens to individual programmers, but also in complete systems. For instance various features linked to data representation have bubbled up in the abstraction stack. In a Unix system, you can open a file in two modes, binary or text. Binary files are what you would expect: sequences of bytes, while text files were supposed to be a level of abstraction: the operating system would handle the translation from and to whatever format it used to store text, giving you the illusion the system would handle ASCII. This abstraction is dead: the mode parameter under any modern Unix makes no difference whatsoever, even though the files might use different text encodings: ISO-latin, UTF-8, Shift-JIS. These differences are not supported by the abstraction.

The irony is that traditional Unix uses text-file for everything, yet even though ASCII contains control characters to separate records and fields, these characters are dead, because text editors cannot handle them. So a good fraction of ASCII is dead. Instead record separation was inside a text format, with textual symbols used to separate fields and records, so we have the mess that is CSV.

Oignon © AntoineCreative Commons Attribution-Share Alike 3.0 Unported

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.