Python Logo

Python Woes – Boolean Casting


The C language has the strange feature that it does not really distinguish between integers, pointers and booleans: true is any non-zero integer value and false is zero. Nowadays it is considered bad form to (over)use that feature.

Python has a similar feature: most built-in types can be evaluated as booleans and, as in C, a non-zero number is considered true. The feature extends to collections: any non-empty collection is true, an empty collection is false. Any class can implement this behaviour by implementing __nonzero__ (__bool__ in Python 3), or __len__ if it is a collection. It is considered pythonic to rely on this implicit boolean evaluation instead of explicitly checking whether an integer is zero or a collection is empty.
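As a minimal sketch of that protocol (the class names Bag and Switch are made up for illustration), a container only needs __len__, while a non-collection can define __bool__ and alias it to __nonzero__ so that the same class behaves identically in Python 2.7 and Python 3:

```python
class Bag(object):
    """Hypothetical container: its truth value follows its length."""

    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        # With no __bool__/__nonzero__ defined, bool() falls back to __len__.
        return len(self._items)


class Switch(object):
    """Hypothetical non-collection with an explicit truth value."""

    def __init__(self, on):
        self.on = on

    def __bool__(self):  # used by Python 3
        return bool(self.on)

    __nonzero__ = __bool__  # the same method under its Python 2 name
```

With these definitions, `if Bag():` takes the false branch and `if Bag([1]):` the true one, without the caller ever calling len().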

Another important aspect of Python is duck typing, i.e. writing code that does not make strong assumptions about the types it gets as input: if it quacks like a duck, treat it as a duck. This typically means that a function that takes a collection does not need to worry about which type of collection it gets: it can be a tuple, a list or a set. All built-in collections have a constructor that takes another collection, so you can freely convert between them.

Python also has objects that behave like collections: files, iterators, generators. You can use them instead of a collection in a for loop or with the in operator, and of course convert them into a list. There is one hitch: while such a conversion preserves the boolean value for real collections, it does not for these quasi-collections.

import itertools

e = itertools.repeat(0, 0)
bool(e) → True
bool(list(e)) → False

The same thing happens with an existing but empty file. This is pretty annoying, because it means that the boolean operator does not answer the question “does this thing contain elements?”, but a more convoluted one: “is this thing a pseudo-collection or, if it is a real collection, does it contain any elements?”. So to know what the operator means, you need to know the type of its argument, which goes against the whole idea of duck typing, and against the intuition that the boolean value represents some high-level property…
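If you need to know whether an iterator has any elements without converting it to a list, one workaround is to pull the first element and chain it back in front of the rest. This is a sketch, not something the standard library provides under that name; peek_nonempty is a made-up helper:

```python
import itertools

_MISSING = object()  # sentinel that cannot collide with a real element


def peek_nonempty(iterable):
    """Return (has_elements, iterator) without losing the element we peeked at."""
    it = iter(iterable)
    first = next(it, _MISSING)
    if first is _MISSING:
        return False, iter(())
    # Put the consumed element back in front of the remaining ones.
    return True, itertools.chain([first], it)
```

The returned iterator has to be used in place of the original one, since the original has been advanced by one element. Comparing against a sentinel object, rather than testing the element itself, keeps falsy elements such as 0 from being mistaken for "empty".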

It’s worth noting that you have a similar problem with numbers:

bool(1) → True
bool(int(True)) → True
bool(int(False)) → False
bool(0.5) → True
bool(int(0.5)) → False


Tmux with a window title set to 'badger badger 🍄'

Setting Titles

./ 'badger badger 🍄'

While playing around with terminal escape sequences, I realised that it is possible to set the window title both in xterm-like terminals (Mac OS X’s Terminal falls into that category) and in screen-like terminal emulators (tmux, which I’m currently switching to, falls into that category). Sadly the escape sequences are different (nothing is simple), and they don’t seem standard enough to be part of Python’s curses library.

So I wrote a very simple command that sets the title according to the current terminal.

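# -*- coding: utf-8 -*-

import sys
import os

def xterm_title(title):
  # Only emit escape sequences when writing to a terminal.
  if not sys.stdout.isatty():
    return
  # ESC ] 0 ; title BEL sets the window title in xterm-like terminals.
  sys.stdout.write('\x1b]0;' + title + '\a')

def screen_title(title):
  if not sys.stdout.isatty():
    return
  # ESC k title ESC \ sets the window title in screen-like terminals.
  sys.stdout.write('\x1bk' + title + '\x1b\\')

def main(argv):
  term = os.getenv('TERM', '')  # TERM may be unset
  text = ' '.join(argv)
  if 'xterm' in term:
    xterm_title(text)
  elif 'screen' in term:
    screen_title(text)

if __name__ == "__main__":
  main(sys.argv[1:])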

If you wonder why I’m using \x1b to represent the escape character, the answer is simple: Python does not define \e.
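Separating the choice of escape sequence from the writing makes the TERM logic easy to check. The sketch below is a variant of the script above, not the original; the function name title_sequence is hypothetical, and treating TERM values that start with "tmux" as screen-like is an assumption (tmux can be configured to advertise e.g. tmux-256color instead of screen):

```python
def title_sequence(term, title):
    """Return the escape sequence that sets the window title, or '' if TERM is unknown.

    Assumption: TERM values containing 'xterm' are xterm-like, and values
    starting with 'screen' or 'tmux' are screen-like.
    """
    term = term or ''  # TERM may be unset (None)
    if 'xterm' in term:
        # ESC ] 0 ; <title> BEL: set the icon name and window title.
        return '\x1b]0;' + title + '\a'
    if term.startswith(('screen', 'tmux')):
        # ESC k <title> ESC \ : set the window title in screen-like terminals.
        return '\x1bk' + title + '\x1b\\'
    return ''
```

The main function then only has to write title_sequence(os.getenv('TERM'), text) to stdout when stdout is a TTY.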


Screenshot: a terminal session piping the output of ping into the script.

Python Woes – Stdin


One important aspect of a good programming language is that the effect of code should be predictable. I wanted to write a small utility program that takes its input from stdin and transforms it to make it bigger on stdout: typically the kind of quick hack you would do in Python. So I wrote the following code:

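# -*- coding: utf-8 -*-

import sys

def main(argv):
  for line in sys.stdin:
    line = line.rstrip()
    # Transform the line and write it out (the actual transformation is not shown).
    sys.stdout.write(line + '\n')

if __name__ == "__main__":
  main(sys.argv)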

This worked fine with commands which terminate (and close the file descriptor) but failed with continuous commands like ping: the code blocks and seems to wait for the stdin descriptor to be closed. It is not just a buffering issue, because eventually ping would fill up whatever buffer.

I tried to turn off Python’s input buffering with the -u command-line flag: no change. Switching from the implicit iterator to readlines(80), i.e. setting an explicit buffer size, did not solve the issue either. So I ended up rewriting the code the following way:

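# -*- coding: utf-8 -*-

import sys

def main(argv):
  while True:
    line = sys.stdin.readline()
    if not line:
      break  # readline() returns '' only at end of file
    line = line.rstrip()
    # Transform the line and write it out (the actual transformation is not shown).
    sys.stdout.write(line + '\n')

if __name__ == "__main__":
  main(sys.argv)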

The fact that there is a functional difference between iterating over a file (or calling readlines()) and repeatedly calling readline() until end of file is really not intuitive, and is typically the kind of black magic that languages should avoid. If Python files were generators, this mess would not exist.
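For what it is worth, the Python 2 documentation of the -u option does mention this internal buffering of file iteration, and recommends a readline() loop as the workaround. That loop can be packaged back into something that reads like the original for loop; this is a sketch using the two-argument form of iter(), and lines is a made-up helper name:

```python
def lines(stream):
    """Return an iterator over the lines of stream, reading one line at a time.

    iter(callable, sentinel) calls stream.readline() repeatedly and stops as
    soon as it returns '' (end of file on a text stream; a binary stream
    would need b'' as the sentinel).
    """
    return iter(stream.readline, '')

# Usage in the script:
#   for line in lines(sys.stdin):
#       line = line.rstrip()
#       ...
```

Because each step is an explicit readline() call, this behaves like the while loop above rather than like the read-ahead file iterator.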


Python Logo



Bugs in software have many sources, but the ones caused by erroneous assumptions are among the most difficult to find. The problem is that these assumptions are baked into the code, so readers tend to absorb them along with the code.

Consider the following code, which builds a simple histogram of input values. It seems reasonable to assume that the resulting dictionary will have a limited size, with at most steps entries. This assumption is incorrect.

import collections
import math

def bucket(values, min, max, steps):
  r = max - min
  assert r > 0
  steps = int(steps)
  assert steps > 0
  d = float(steps) / r
  result = collections.defaultdict(int)
  for v in values:
    # Clamp v to [min, max].
    if v < min:
      v = min
    elif v > max:
      v = max
    # Round v down to the start of its bucket.
    k = math.floor((v - min) * d) / d + min
    result[k] += 1
  return result

This code can return dictionaries with far more than steps entries. The problem lies in the clamping logic, which assumes that a value that is neither smaller than min nor larger than max lies in the range [min, max].

This assumption is incorrect in Python (and probably in many other languages), because floating point has NaN, and NaN in Python has several properties that break the assumptions of the code above:

  • Nearly every arithmetic operation that has NaN as an argument returns NaN.
  • float('NaN') < X → False, for any X
  • float('NaN') > X → False, for any X
  • float('NaN') == float('NaN') → False

The last property means that using floating point values as dictionary keys is a bad idea, as the equality operator is used to determine whether two keys are the same. While you can add a value with the key NaN, you cannot look it up with another NaN, because NaN is not equal to anything, not even another NaN (only reusing the very same NaN object works, as dictionaries check identity before equality). So you can also add multiple values with the “same” key, and each will get a different entry in the dict:

d = {}
d[float('NaN')] = 4
d[float('NaN')] = 5
d → {nan: 4, nan: 5}

Python will let you use any hashable object as a dictionary key, even one that is mutable (instances of user-defined classes are hashable by default) or that does not implement the equality operator properly. So if you feed the following input to the bucket function above, you will get back a dictionary with 1000 entries, regardless of the value of the steps parameter:

import itertools

v = itertools.repeat(float('nan'), 1000)
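One way to restore the assumption is to deal with NaN explicitly before clamping. The sketch below is a variant, not the original code: it renames min and max to lo and hi so that the built-in min and max remain usable for clamping, raises ValueError instead of using assert, and counts NaN values separately (discarding them would be just as valid). Note also that the 1000-entry result is Python 2.7 behaviour: in Python 3, math.floor returns an int and raises ValueError when given NaN, so the original function fails loudly instead.

```python
import collections
import math


def bucket(values, lo, hi, steps):
    """Histogram of values clamped to [lo, hi]; returns (buckets, nan_count).

    The buckets dict has at most steps + 1 keys (a value equal to hi gets
    its own bucket).
    """
    r = hi - lo
    if not r > 0:  # also rejects NaN bounds
        raise ValueError('hi must be greater than lo')
    steps = int(steps)
    if steps <= 0:
        raise ValueError('steps must be positive')
    d = float(steps) / r
    result = collections.defaultdict(int)
    nan_count = 0
    for v in values:
        if math.isnan(v):
            nan_count += 1
            continue
        v = min(max(v, lo), hi)  # clamping is safe now that v is not NaN
        k = math.floor((v - lo) * d) / d + lo
        result[k] += 1
    return result, nan_count
```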


Swift Logo

A first look at Swift


Apple’s announcement of the Swift language was quite a surprise: while the company had extended Objective-C in the past, it had not dabbled in programming languages since Dylan and AppleScript, so I was quite curious to see what this was about.

Swift is presented as the successor of Objective-C, but drops C compatibility while keeping a C-like syntax, in a way similar to Go or Rust. Swift borrows ideas from many existing languages: Objective-C and Python, but also Rust and C♯. Compared to, say, Python, the language is actually pretty complex, as it has many features. Gone are the days when languages tried to be minimalist; here the goal is clearly to implement features for common programming patterns.

Given that these days I mostly code in C++11 and Python 2.7, I found the language to have some interesting features:

  • Compiled language, with the goal of being faster than Objective-C. Uses LLVM as compilation back-end.
  • References and value types. Swift borrows from C♯ the idea that classes are reference types while structs are value types, passed by copy to functions and methods. This means that complex data can live on the stack and be stored contiguously in collections. This makes a big performance difference and is also a big theme in C++11 and C++14.
  • Strong typing with type inference, with a distinction between variables, declared with var, and constants, declared with let. There is no duck typing for functions, but the language supports generics (similar to templates) and protocols (interfaces). Types can be extended outside of their original declaration, and this can be used to make them adopt new protocols.
  • Enum types can have raw values of types other than integers (strings, characters, floating point numbers), and can have associated data depending on the enum case, so they also cover the functionality of unions.
  • Switch statements work on any type, with rich matching rules, so you can branch on ranges or on conditional expressions, something the Rust language also has.
  • Optional types everywhere. Swift has no pointers and no magic None/nil value like Python has. Optional types express the idea of “something of type X or nothing”, a feature you would otherwise implement with a pointer and a null check. This is the same as the boost::optional template in C++, but more integrated into the language, as there is an optional chaining operator.
    Consider an object A which has a field b which can contain a value of type B, which in turn has a field c which can contain a value of type C. If you have a value a of type A and just want to read the field a.b.c, you have to add a lot of if statements, or try accessing it and handle exceptions. In Swift you just write a.b?.c, which returns an optional of type C. This is really nice because this kind of access is pretty common (for instance in protocol buffers).
  • Lots of syntactic sugar: nice loops, tuple decomposition, i.e. let (x, y) = getCoordinates(), clean range expressions [1..9], integers that can contain underscores to be more legible, like 1_000_000, and string interpolation that can contain arbitrary expressions (including function calls): "sin \(x) = \(sin(x))"
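For comparison, here is roughly what reading a.b.c looks like in Python when each link may be None. This is a hypothetical helper (chain and Record are names made up for the example), and it only mimics the None-propagating behaviour of optional chaining, not Swift’s static typing:

```python
class Record(object):
    """Hypothetical object with arbitrary fields, standing in for A, B and C."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


def chain(obj, *names):
    """Follow attribute names from obj, returning None as soon as a link is None."""
    for name in names:
        if obj is None:
            return None
        obj = getattr(obj, name)
    return obj


a = Record(b=Record(c=42))
print(chain(a, 'b', 'c'))               # 42
print(chain(Record(b=None), 'b', 'c'))  # None
```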

We will see how the language performs in practice. I suspect its impact will mostly be determined by the way Apple releases it: if it is free, it might get wider adoption. Regardless of Swift’s success, I think many of the ideas in the language are good ones, and I hope this will lend support to the languages that already have those features and encourage other languages to adopt them.


Check IO Capture / You are Here



I have spent the last few days playing on CheckIO, enough to reach level eight (you need to be logged in and have reached a given level to see another user’s page). While one could consider this site to be a game, its puzzles are basically small programming problems that have to be solved with code written in Python. Some of them are classics, like sorting numbers or producing Roman numerals; others are more esoteric. I’m always impressed by the stories people come up with so they can use the Fibonacci sequence in code.

CheckIO’s web interface contains a small editor with syntax highlighting and the ability to run Python code, in both the 2.7 and 3.3 variants. So you can run your solution and then check it against the official validation tests. Once you solve a puzzle, you can publish your solution and discuss other people’s solutions. Thanks to this, I have already learnt a few Python tricks I did not know.

So if you have minimal Python skills and like coding puzzles, this is a really good site.


Data URI Script

Sometimes you want to provide a small data file for example purposes, but uploading it somewhere is a hassle. One way around this is to use the data URI scheme defined in . I have written a quick Python script that converts a short file into a data URI. You can download the program from this .


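import base64
import mimetypes
import sys

def main():
  if len(sys.argv) < 2:
    sys.stderr.write('usage: %s <file>\n' % sys.argv[0])
    sys.exit(1)
  path = sys.argv[1]
  # Read as bytes so that binary files are encoded correctly.
  with open(path, 'rb') as input_handle:
    data = input_handle.read()
  # Guess the MIME type from the file extension, with a generic fallback.
  mime_type = mimetypes.guess_type(path)[0] or 'application/octet-stream'
  # Data URIs use the standard base64 alphabet, not the URL-safe one.
  encoded = base64.b64encode(data).decode('ascii')
  print('data:%s;charset=utf-8;base64,%s' % (mime_type, encoded))

if __name__ == '__main__':
  main()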


Python Logo

A script to change network location automatically


As I mentioned earlier in this blog, I run a Squid proxy in my home network. I’m using Mac OS X’s location feature to have two settings: a default one without proxy, and one for my home network with some customisations, including the proxy setting. Of course, when I move around I always forget to switch. As I always have a shell terminal open, I wanted a command that just fixes the issue by looking at the current Wi-Fi network. There are numerous apps on the web trying to solve the same problem, but they are usually way too complicated for my taste: adding icons to the menu bar, running permanently and generally making a nuisance of themselves. I was somewhat surprised this feature had not been added to Mac OS X, as iOS solves the problem more elegantly: proxy settings are associated with the SSID.

The good news is that there are two command-line tools that provide the needed information. The first is /System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport, which can be used to get information about the current 802.11 network. The second is /usr/sbin/scselect, which can be used to select the current location. So I wrote a small Python script that uses the information from the first to drive the second. The configuration is simply the map at the beginning of the file, which maps network names to location names. You can download it here
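The downloadable script is not reproduced here, but the idea can be sketched as follows. This is a reconstruction under assumptions, not the original script: it assumes that airport -I prints a line of the form "SSID: name" and that scselect accepts a location name as its argument; the LOCATIONS map, its entries and the 'Automatic' default are placeholders to adapt.

```python
import subprocess

AIRPORT = ('/System/Library/PrivateFrameworks/Apple80211.framework/'
           'Versions/Current/Resources/airport')
SCSELECT = '/usr/sbin/scselect'

# Hypothetical configuration: network name (SSID) -> location name.
LOCATIONS = {
    'HomeNetwork': 'Home',
}
DEFAULT_LOCATION = 'Automatic'  # assumption: the location without proxy


def parse_ssid(airport_output):
    """Return the SSID from `airport -I` output, or None if there is none."""
    for line in airport_output.splitlines():
        key, sep, value = line.partition(':')
        # Match 'SSID' exactly so that the 'BSSID' line is ignored.
        if sep and key.strip() == 'SSID':
            return value.strip()
    return None


def choose_location(ssid):
    """Map a network name to a location name, falling back to the default."""
    return LOCATIONS.get(ssid, DEFAULT_LOCATION)


def main():
    output = subprocess.check_output([AIRPORT, '-I']).decode('utf-8', 'replace')
    subprocess.check_call([SCSELECT, choose_location(parse_ssid(output))])
```

Keeping the parsing and the mapping in plain functions means they can be checked without touching the system configuration; only main() runs the two tools.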


Floating point considered harmful

#include <stdbool.h>

bool foo() {
  float a = 0.6;
  float b = 6.0;
  return (a == (b * 0.1));
}

bool bar(float c) {
  return (c != c);
}

When I learned to code on the C64, the two main tools were BASIC and direct memory access. Anything advanced was done in assembly. So basically, coding involved pretty much only pointer arithmetic and goto statements. Of course this was a long time ago; such constructs are now shunned, and many newer languages even prevent the code from doing such things, which is probably a change for the better.

Strangely, there is one type of operation that is very dangerous, but which most languages do not restrict: floating point operations. Consider the snippet above: will foo return true? Maybe, but there is no guarantee that the bit representations of a and b * 0.1 will be equal. Will bar ever return true? Yes, if c is NaN. Generally speaking, comparing two floating point numbers bit for bit is a bad idea, and providing that operation to the programmer is a disservice. Floating point semantics are complicated, and only partially visible in the programming language: rounding modes and extended precision are typically not accessible.
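The same pitfalls exist in Python, whose floats are C doubles. A sketch of the safer alternatives: compare with a tolerance using math.isclose (available since Python 3.5; the tolerance below is an arbitrary choice), and test for NaN with math.isnan rather than relying on the c != c trick:

```python
import math

a = 0.6
b = 6.0

# Exact comparison: 6.0 * 0.1 rounds to 0.6000000000000001, not to 0.6.
print(a == b * 0.1)                            # False

# Comparison with a relative tolerance (1e-9 here, an arbitrary choice).
print(math.isclose(a, b * 0.1, rel_tol=1e-9))  # True

# Explicit NaN test instead of c != c.
c = float('nan')
print(math.isnan(c), c != c)                   # True True
```

The right tolerance depends on the magnitudes and the computation involved, which is precisely why a plain == operator is a poor default.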

A lot of the code I have seen just happily assumes that floats behave basically like integers; why would it not? Floating point numbers are the only types which, in each and every language, are allowed to have the same operators as integers. In most cases, using a fixed point (currencies) or fractional representation (computing averages and such) would be more appropriate, but such constructs are typically missing from the language, or are second-class citizens: Python, for instance, has only had fractions since version 2.6, and there is no shorthand notation to create them.
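For illustration, Python’s standard library does offer both kinds of representation, just without literal syntax: decimal.Decimal, a decimal arithmetic type well suited to currency amounts, and fractions.Fraction for exact ratios such as averages. A small sketch (exact_mean is a made-up helper, and the values are arbitrary):

```python
from decimal import Decimal
from fractions import Fraction

# Adding ten cents three times: binary floats drift, Decimal does not.
print(0.1 + 0.1 + 0.1 == 0.3)                         # False
print(sum([Decimal('0.10')] * 3) == Decimal('0.30'))  # True


def exact_mean(values):
    """Mean of integers as an exact fraction rather than a rounded float."""
    return Fraction(sum(values), len(values))


print(exact_mean([1, 2, 2]))                          # 5/3
```

Note that Decimal values are built from strings: Decimal(0.1) would faithfully capture the binary rounding error of the float literal.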

The only languages I have used that put some serious thought into their numerical representations were Ada and Smalltalk. The others seem happy to just clone C’s operations.


Roman Numeral Efficiency

How efficient are roman numerals?

Roman numerals are one of those legacy systems that hang around without ever really disappearing. They are difficult to parse, and their length varies a lot. The mystery sequence in my previous post was just that: the length of the Roman numeral for each value of n. The question I was wondering about is: how inefficient is the Roman numeral system? The decimal system needs roughly log10(n) symbols to represent the number n; the same number in binary needs log2(n). My intuition was that the Roman system would be less efficient (in terms of symbols needed to represent a value) than the decimal system, but better than binary.

So I wrote a small program to compute how many characters are needed to represent a given number.

The graph shows the various number systems: each point is the mean number of characters needed to represent the values in the range $[2^i, 2^{i+1})$. Obviously the binary representation is exactly linear. The decimal notation is also linear, with steps each time a power of 10 is crossed. The Roman notation is also linear up to $2^{18}$, where the curve goes up. The reason is simple: there is no symbol for values above 100000 (ↈ). Up until that point, Roman numerals seem to be as efficient as a ternary notation.
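The measurement behind the graph can be sketched as follows: for each i, average the length of the representation over all integers in [2^i, 2^(i+1)). This is a minimal sketch, not the original program; mean_length and the two formatting functions are made-up names, and any function mapping an integer to a string, such as the roman function below, can be passed in the same way.

```python
def mean_length(fmt, i):
    """Mean number of characters fmt(n) uses for n in the range [2**i, 2**(i + 1))."""
    start, stop = 2 ** i, 2 ** (i + 1)
    total = sum(len(fmt(n)) for n in range(start, stop))
    return float(total) / (stop - start)


def binary_digits(n):
    return format(n, 'b')


def decimal_digits(n):
    return str(n)


for i in range(8):
    print('%2d %6.2f %6.2f' % (i, mean_length(binary_digits, i),
                               mean_length(decimal_digits, i)))
```

For i = 3, for instance, the range is 8 to 15: every value takes 4 binary digits, while the decimal mean is (2 × 1 + 6 × 2) / 8 = 1.75 characters.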

To generate those numbers, I created a small Python script to generate Roman numerals. For the sake of consistency, I chose to only generate characters from the Unicode Roman numeral range (U+2160 to U+2188). One interesting thing in this range is that it contains hybrid characters, like Ⅷ (U+2167), which represents the Roman numeral for eight in one single character. Using that range we get a slightly different curve: the asymptotic behaviour is not changed, but numbers are generally shorter by two characters, and for small numbers the system is now as efficient as Arabic numerals.

Here is the python script that generates the shorter roman numerals using the hybrid characters.

# -*- coding: utf-8 -*-

# (value, numeral) pairs, largest first, using the Unicode Roman numeral characters.
NUMERALS = (
  (100000, u'ↈ'),
  (90000, u'ↂↈ'),
  (50000, u'ↇ'),
  (40000, u'ↂↇ'),
  (10000, u'ↂ'),
  (9000, u'Ⅿↂ'),
  (5000, u'ↁ'),
  (4000, u'Ⅿↁ'),
  (1000, u'Ⅿ'),
  (900, u'ⅭⅯ'),
  (500, u'Ⅾ'),
  (400, u'ⅭⅮ'),
  (100, u'Ⅽ'),
  (90, u'ⅩⅭ'),
  (50, u'Ⅼ'),
  (40, u'ⅩⅬ'),
  (20, u'ⅩⅩ'),  # to avoid 24 = 'ⅫⅫ'
  (12, u'Ⅻ'),
  (11, u'Ⅺ'),
  (10, u'Ⅹ'),
  (9, u'Ⅸ'),
  (8, u'Ⅷ'),
  (7, u'Ⅶ'),
  (6, u'Ⅵ'),
  (5, u'Ⅴ'),
  (4, u'Ⅳ'),
  (3, u'Ⅲ'),
  (2, u'Ⅱ'),
  (1, u'Ⅰ'),
)

def _roman(n):
  # Greedily emit the largest numerals first, stopping once nothing is left.
  for value, text in NUMERALS:
    times, n = divmod(n, value)
    yield text * times
    if not n:
      break

def roman(n):
  return u''.join(_roman(n))
