守破離

Shuhari and Computer Science


Shuhari is a term that describes the phases of learning in Japanese arts and is considered part of some martial arts like Aikidō. Learning can be decomposed into three phases:

(SHU) – Conformance
Learning while conforming to the rules.
(HA) – Deviation
Developing one’s own style, breaking some rules.
(RI) – Transcendence
Building one’s own rules.

While reading about the subject on the internet, I found an interesting blog post called The Fallacy of Shu-Ha-Ri, which discusses the application of that principle to computer science. It is not so much a criticism of the decomposition as a realisation that it cannot really be applied in a chaotic medium like computer science: what is the point of mastering some technique if you are not sure that said tool is the solution?

To me, shuhari looks like a good model once you have accepted a given path. Unsurprisingly, Japanese arts are divided into ways (道, dō): early on, you choose a way and you follow it until you transcend it. Not exactly the way things work in computer science.

Reflecting on my situation, there are very few tools that have stayed around since I started dabbling with computers, having learnt the trade on the Commodore 64 with BASIC and 6510 assembly. I learnt C programming in 1992 at the University of Geneva and I’m still coding in C++ these days, but the difference between the two languages is pretty large. In C++, I would say I’m somewhere between SHU and HA: I can’t say I have fully mastered it, but I sometimes bend some rules and use advanced features.


An Onion

The poisonous onion…


Abstraction levels are one of those nice ideas that get thrown around a lot in computer science. The idea is seductive: implement your system in layers, each one implementing an abstraction that simplifies the work of the layer above. The concept also resonates with other engineering branches: a car is built of abstract elements such as the engine and the chassis, and the chassis builder does not need to know all the details of the engine.

This is true to a certain extent, but an engine leaks many implementation details: its shape, its weight, the position of the intakes and exhausts, the cams, how it vibrates and how much heat it produces. Change one core assumption (that it is a combustion engine) and you end up with a pretty bad design.

Adding abstraction layers in computer science is very easy, so easy that it is the very first thing many programmers do: if you do not like the way the underlying platform looks, change it into something you like. The problem is that very often this does not achieve much in terms of functionality. In theory, a more abstract system lets you change the underlying implementation, but this only makes sense if the implementation can and will be changed in some way. You not only need two implementations, but also two use cases or applications that need different implementations. Very often this is not the case.

Because of this, software tends to be composed of multiple strata of forgotten and hidden code, with a given feature implemented multiple times in different layers. One common story in computer science is that of code that uses a database connection to retrieve items and then filters the database’s output, instead of doing the filtering in the database’s own query language. This is typical because it highlights one core problem of abstractions: an abstraction hides whatever is underneath, but you still need to understand it. If you don’t understand an abstraction, you are bound to re-implement it one layer above.
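
A minimal sketch of that story in Python, using the standard sqlite3 module. The items table, its columns and the price threshold are hypothetical and chosen only for illustration. Both versions return the same result, but the first one re-implements the WHERE clause one layer above the database.

```python
import sqlite3

# In-memory database with a hypothetical "items" table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("apple", 0.5), ("truffle", 120.0), ("bread", 3.0)])

# Re-implementing the filter one layer above: fetch every row, filter in Python.
expensive_py = [name for name, price in conn.execute("SELECT name, price FROM items")
                if price > 10]

# Using the abstraction as intended: let the database do the filtering.
expensive_sql = [name for (name,) in conn.execute(
    "SELECT name FROM items WHERE price > ?", (10,))]

print(expensive_py, expensive_sql)  # ['truffle'] ['truffle']
conn.close()
```

With three rows the difference is invisible. With millions of rows, the first version ships every row across the connection only to discard most of them.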

This happens to individual programmers, but also to complete systems. For instance, various features linked to data representation have bubbled up the abstraction stack. In a Unix system, you can open a file in two modes, binary or text. Binary files are what you would expect: sequences of bytes. Text files were supposed to be a level of abstraction: the operating system would handle the translation from and to whatever format it used to store text, giving you the illusion that the system handled ASCII. This abstraction is dead. Under any modern Unix the mode parameter makes no difference whatsoever, even though files might use different text encodings such as ISO-Latin, UTF-8 or Shift-JIS. The abstraction does not handle these differences.
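
A minimal sketch in Python, assuming a POSIX-like system. At the system-call level (os.open / os.read) there is no text or binary mode at all: a file is just bytes, and the reader has to guess the encoding. The temporary file and the word "café" are illustrative only.

```python
import os
import tempfile


def read_raw(path):
    """Read a whole file through the low-level os API, which has no text/binary mode."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:  # an empty bytes object means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


# Store "café" as UTF-8; on disk this is just the bytes b'caf\xc3\xa9'.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("café".encode("utf-8"))
    path = tmp.name

raw = read_raw(path)
os.unlink(path)

print(raw)                    # b'caf\xc3\xa9'
print(raw.decode("utf-8"))    # café
print(raw.decode("latin-1"))  # cafÃ©  (same bytes, wrong guess)
```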

The irony is that traditional Unix uses text files for everything. Yet even though ASCII contains control characters to separate records and fields, these characters are dead because text editors cannot handle them. So a good fraction of ASCII is dead. Instead, record separation was done inside the text format, with printable symbols used to separate fields and records, which gave us the mess that is CSV.
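
A small Python sketch of the contrast. ASCII's Unit Separator (0x1F) and Record Separator (0x1E) make splitting trivial, while CSV must quote and escape because commas and quotes are ordinary text. The helper names and sample rows are hypothetical, and the sketch assumes the fields never contain 0x1E or 0x1F themselves.

```python
import csv
import io

UNIT_SEP = "\x1f"    # ASCII US: separates fields
RECORD_SEP = "\x1e"  # ASCII RS: separates records

rows = [["name", "comment"], ["Smith, John", 'said "hi"']]


def to_ascii_delimited(rows):
    """Join fields with US and records with RS (assumes no field contains them)."""
    return RECORD_SEP.join(UNIT_SEP.join(fields) for fields in rows)


def from_ascii_delimited(text):
    """Split back into a list of records, each a list of fields."""
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]


decoded = from_ascii_delimited(to_ascii_delimited(rows))  # no quoting needed

# The CSV equivalent has to quote the comma and double the embedded quotes.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_text = buf.getvalue()
print(repr(csv_text))
```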

Oignon © Antoine, Creative Commons Attribution-Share Alike 3.0 Unported


Setting Titles

./title.py 'badger badger 🍄'

While playing around with terminal escape sequences, I realised that it is possible to set the window title both in xterm-like terminals (Mac OS X’s terminal falls into that category) and in screen-like terminal emulators (tmux, which I’m currently switching to, falls into that category). Sadly, the escape sequences are different (nothing is simple), and they don’t seem standard enough to be part of Python’s curses library.

So I wrote a very simple command that sets the title using the escape sequence matching the current terminal, as reported by the TERM environment variable.

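#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Set the terminal window title using xterm or screen escape sequences."""

import sys
import os

def xterm_title(title):
    # OSC 0 ; title BEL sets the window (and icon) title in xterm-like terminals.
    if not sys.stdout.isatty():
        return
    sys.stdout.write('\x1b]0;' + title + '\a')

def screen_title(title):
    # ESC k title ESC backslash sets the window title in screen-like terminals.
    if not sys.stdout.isatty():
        return
    sys.stdout.write('\x1bk' + title + '\x1b\\')

def main(argv):
    # TERM may be unset (e.g. under cron); default to '' to avoid a TypeError.
    term = os.getenv('TERM', '')
    text = ' '.join(argv)
    if 'xterm' in term:
        xterm_title(text)
    elif 'screen' in term:
        screen_title(text)
    sys.stdout.flush()

if __name__ == "__main__":
    main(sys.argv[1:])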

If you wonder why I’m using \x1b to represent the escape character, the answer is simple: Python does not define \e.
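
In Python, the escape character can equally be spelled "\x1b", "\033" or chr(27). The sketch below keeps both title sequences in one place and returns the string instead of writing it, which makes the logic easy to test. title_sequence is a hypothetical helper, not part of the script above, but it builds the same two sequences.

```python
ESC = "\x1b"  # same character as "\033" or chr(27); Python has no "\e" escape
BEL = "\a"    # same character as "\x07"


def title_sequence(term, title):
    """Hypothetical helper: return the title-setting sequence for TERM, or '' if unknown."""
    if "xterm" in term:
        return ESC + "]0;" + title + BEL          # xterm: OSC 0 ; title BEL
    if "screen" in term:
        return ESC + "k" + title + ESC + "\\"     # screen: ESC k title ESC backslash
    return ""


print(repr(title_sequence("xterm-256color", "badger badger")))
```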


wiesmann@wagamama~> ping 8.8.8.8 | big.py PING 8.8.8.8 (8.8.8.8): 56 data bytes PING 8.8.8.8 (8.8.8.8): 56 data bytes 64 bytes from 8.8.8.8: icmp_seq=0 ttl=54 time=23.645 ms 64 bytes from 8.8.8.8: icmp_seq=0 ttl=54 time=23.645 ms 64 bytes from 8.8.8.8: icmp_seq=1 ttl=54 time=13.098 ms 64 bytes from 8.8.8.8: icmp_seq=1 ttl=54 time=13.098 ms 64 bytes from 8.8.8.8: icmp_seq=2 ttl=54 time=14.101 ms 64 bytes from 8.8.8.8: icmp_seq=2 ttl=54 time=14.101 ms 64 bytes from 8.8.8.8: icmp_seq=3 ttl=54 time=13.718 ms 64 bytes from 8.8.8.8: icmp_seq=3 ttl=54 time=13.718 ms 64 bytes from 8.8.8.8: icmp_seq=4 ttl=54 time=16.194 ms 64 bytes from 8.8.8.8: icmp_seq=4 ttl=54 time=16.194 ms 64 bytes from 8.8.8.8: icmp_seq=5 ttl=54 time=13.336 ms 64 bytes from 8.8.8.8: icmp_seq=5 ttl=54 time=13.336 ms 64 bytes from 8.8.8.8: icmp_seq=6 ttl=54 time=13.188 ms 64 bytes from 8.8.8.8: icmp_seq=6 ttl=54 time=13.188 ms 64 bytes from 8.8.8.8: icmp_seq=7 ttl=54 time=28.381 ms 64 bytes from 8.8.8.8: icmp_seq=7 ttl=54 time=28.381 ms 64 bytes from 8.8.8.8: icmp_seq=8 ttl=54 time=13.067 ms 64 bytes from 8.8.8.8: icmp_seq=8 ttl=54 time=13.067 ms

Python Woes – Stdin


One important aspect of a good programming language is that the effect of code should be predictable. I wanted to write a small utility program that takes its input from stdin and prints it bigger on stdout, using the double-size text escape sequences. Typically the kind of quick hack you would do in Python. So I wrote the following code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys

def main(argv):
  for line in sys.stdin:
    line = line.rstrip()
    sys.stdout.write('\x1b#3')
    sys.stdout.write(line)
    sys.stdout.write('\n\x1b#4')
    sys.stdout.write(line)
    sys.stdout.write('\n')

if __name__ == "__main__":
    main(sys.argv[1:])

This worked fine with commands that terminate (and close the file descriptor) but failed with continuous commands like ping: the code blocks and seems to wait for the stdin descriptor to be closed. It did not look like a plain buffering issue, because ping would eventually fill up any buffer.

I tried to turn off Python’s input buffering with the -u command-line flag: no change. Switching from the implicit iterator to readlines(80), i.e. setting an explicit size hint, did not solve the issue either. So I ended up rewriting the code the following way:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys

def main(argv):
  while True:
    line = sys.stdin.readline()
    if not line:
      break  # readline() returns '' only at EOF; a blank input line is '\n'
    line = line.rstrip()
    sys.stdout.write('\x1b#3')
    sys.stdout.write(line)
    sys.stdout.write('\n\x1b#4')
    sys.stdout.write(line)
    sys.stdout.write('\n')
    sys.stdout.flush()

if __name__ == "__main__":
    main(sys.argv[1:])

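The fact that there is a functional difference between iterating over a file (or calling readlines()) and repeatedly calling readline() while checking for end of file is really not intuitive. It is typically the kind of black magic that languages should avoid. The Python 2 documentation of the -u flag does mention it: file iteration (for line in sys.stdin) and readlines() use internal buffering that -u does not influence, and the suggested workaround is precisely readline() inside a while loop. If Python files behaved like generators yielding each line as soon as it arrives, this mess would not be there.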
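
A minimal Python 3 sketch of that generator-like behaviour. It uses the iter(callable, sentinel) idiom, which keeps calling stream.readline() and stops when it returns '' (end of file), so each line is handled as soon as readline() returns it. The function names lines and big are hypothetical. The escape sequences are the same ones as in the script above.

```python
import io
import sys


def lines(stream):
    """Yield lines one readline() call at a time, stopping at EOF ('')."""
    return iter(stream.readline, "")


def big(stream, out=sys.stdout):
    for line in lines(stream):
        line = line.rstrip()
        out.write("\x1b#3" + line + "\n")  # upper half of the double-size line
        out.write("\x1b#4" + line + "\n")  # lower half of the double-size line
        out.flush()                        # push each line out immediately


# In a script this would be big(sys.stdin); here an in-memory stream stands in.
buf = io.StringIO()
big(io.StringIO("one\ntwo\n"), buf)
print(repr(buf.getvalue()))
```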


The mad programmer strikes again

Control Codes


I have a fascination for old computing standards, in particular the ones that are not completely dead: still present in today’s computers, but largely forgotten. One such standard is ANSI escape sequences, which enable rich features in terminals, like coloured text. One aspect of these standards I did not understand was C1 control codes, i.e. control characters above code 0x7F, that is, 8-bit control codes.

Even in today’s Unicode standard, character codes 0x00 to 0x1F (C0) and 0x7F to 0x9F (DEL and C1) are reserved for control codes. RFC 5198 makes exceptions for four code points (space, carriage return, line feed and form feed), but otherwise recommends against using C0 control codes and explicitly forbids the C1 codes. If you consider how UTF-8 works, this is a shame: 22% of the code points that can be expressed using a single byte are unused, and so are 1.6% of the code points that can be expressed using two bytes.

While most C0 codes are related to controlling devices and line parameters, the C1 ones seem more esoteric: they mostly relate to specifications that have died out. The only one that feels vaguely relevant is CSI (Control Sequence Introducer), which sounds like what ANSI would use to send control sequences. Then I realised that there is a thing called the 7-Bit Code Extension Technique, which lets a C1 code be encoded using 7-bit characters: the sequence ␛ followed by the C1 code minus 0x40. So CSI (0x9B) becomes ␛[, which is basically the preamble of most ANSI sequences.
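
The mapping described above is simple arithmetic. Here is a minimal Python sketch of it (c1_to_7bit is a hypothetical name). The same rule explains the title sequences in "Setting Titles": OSC (0x9D) becomes ␛], and ST, the String Terminator (0x9C), becomes ␛\.

```python
CSI = 0x9B  # Control Sequence Introducer
ST = 0x9C   # String Terminator
OSC = 0x9D  # Operating System Command


def c1_to_7bit(code):
    """Return the 7-bit equivalent of a C1 control: ESC followed by (code - 0x40)."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError("not a C1 control code: %#x" % code)
    return "\x1b" + chr(code - 0x40)


for name, code in (("CSI", CSI), ("ST", ST), ("OSC", OSC)):
    print(name, hex(code), repr(c1_to_7bit(code)))
```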

On Mac OS X, with Terminal.app set up in Unicode/UTF-8 mode, C1 codes are not active. This makes sense: the only advantage of C1 codes over their 7-bit equivalents is that they save a byte, and in UTF-8 both forms use two bytes.
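
The byte counts are easy to check in Python. In an 8-bit encoding such as Latin-1, the C1 form of CSI is a single byte. In UTF-8, U+009B takes two bytes, exactly as many as the 7-bit form ␛[.

```python
c1_form = "\x9b"         # CSI as a single C1 code point
seven_bit_form = "\x1b["  # CSI via the 7-bit code extension: ESC [

c1_latin1 = c1_form.encode("latin-1")        # 8-bit encoding: one byte
c1_utf8 = c1_form.encode("utf-8")            # UTF-8: two bytes
seven_bit_utf8 = seven_bit_form.encode("utf-8")  # two ASCII bytes

print(len(c1_latin1), len(c1_utf8), len(seven_bit_utf8))  # 1 2 2
```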

Still, I was curious to see which ANSI features the Mac OS X terminal supports. One good way to find out is to download vttest and run its various tests. When I was at university, I used some real VT-220 terminals, so I expected the graphical features of these devices to mostly work. I found that the OS X terminal supports the following features:

  • Basic terminal text styling: bold, blink, underline, reverse video.
  • 256 colours.
  • Double sized text (first time I realised this feature was present).
  • dtterm window manipulation: move, resize, change title.
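
Here is a small Python sketch of how the first two features are driven. The text styles are SGR (Select Graphic Rendition) parameters sent after ␛[ and terminated by m. 256-colour foregrounds use the xterm-style 38;5;n form. The helper names sgr and fg256 are hypothetical, but the parameter numbers are the usual ones: 1 bold, 4 underline, 5 blink, 7 reverse video, 0 reset.

```python
CSI = "\x1b["  # 7-bit form of the C1 Control Sequence Introducer

# SGR parameters for the basic text styles listed above.
STYLES = {"bold": 1, "underline": 4, "blink": 5, "reverse": 7}


def sgr(*params):
    """Build an SGR escape sequence, e.g. sgr(1) gives ESC [ 1 m."""
    return CSI + ";".join(str(p) for p in params) + "m"


def fg256(n):
    """Foreground colour n (0-255) from the 256-colour palette: ESC [ 38 ; 5 ; n m."""
    return sgr(38, 5, n)


RESET = sgr(0)

if __name__ == "__main__":
    for name, code in STYLES.items():
        print(sgr(code) + name + RESET)
    # A row of colour swatches from the 6x6x6 colour cube (indices 16-231).
    print("".join(fg256(n) + "##" for n in range(16, 232, 6)) + RESET)
```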

The way double-size text is implemented is pretty neat: ␛#3 starts a mode where the upper half of the double-size text is printed, and ␛#4 starts a mode where the lower half is printed. This means that if the code is not supported, you get the text twice instead of a single big line of text. There is also a double-height, single-width mode with codes 1 and 2, but it is not supported by OS X’s terminal or by xterm.


Whiny Apps

Nag screen – Facebook Messenger

A UK regulatory body has declared that calling the new Dungeon Keeper app free was misleading: indeed, the game cannot really be played (in the sense of having fun) without paying. The people who develop software want to be paid in the end, so if the app is free, the money is going to come from somewhere else: support services, in-app purchases, ads or just monetizing the data.

Basically, free apps behave like people who have no money: they beg and nag. The first rule of begging is to never take no for an answer. If this sounds very close to harassment, it is. The user can’t say no; the best she can do is say “not now”, or just close and ignore the interstitial.

Facebook Messenger is particularly annoying in this respect, as it regularly shows me how to enable notifications. Guess what: I know how to do it, and I have chosen to block the permissions for this app. Corporate communication that pretends to be helpful annoys me more than anything. Imagine a guy explaining to a girl how her phone works because she won’t give him her number…

Nag Screen – Twitter

Unsurprisingly, users hate this, and nagging is a good way to torpedo usability. I was a huge fan of Dungeon Keeper and would have paid money to play it on a tablet (I paid money to play it again on Good Old Games), yet I uninstalled Electronic Arts’ horrible sequel within a few minutes. Same story for Plants vs. Zombies: I loved the first game and bought it for both Mac OS and iOS, but I played the sequel on my phone for a bit and uninstalled it. Too much nagging, too much changing: the game felt like a TV show, and for people outside of the US, this is bad.

I recently switched fitness apps because of nagging and bugs, and I increasingly prefer silent apps that cost a bit of money to free, noisy ones. Most of the apps I run now have their permissions restricted in one way or another. Most of them don’t really need mobile data, location or contacts; they would just like to get at the data and do a bit of tracking. Tough…
