
A first look at Swift


Apple’s announcement of the Swift language was quite a surprise. While the company had extended Objective-C in the past, it had not dabbled in programming languages since Dylan and AppleScript, so I was quite curious to see what this was about.

Swift is presented as the successor of Objective-C, but it drops C compatibility while keeping a C-like syntax, in a way similar to Go or Rust. Swift borrows ideas from many existing languages: Objective-C, Python, but also Rust and C♯. Compared to, say, Python, the language is actually pretty complex, as it has many features. Gone are the days when languages tried to be minimalistic; here the goal is clearly to implement features for common programming patterns.

Given that these days I mostly code in C++11 and Python 2.7, I found the language to have interesting features:

  • Compiled language, with the goal of being faster than Objective-C. It uses LLVM as its compilation back-end.
  • Reference and value types. Swift borrows from C♯ the idea that classes are reference types and structs are value types, passed by copy to functions and methods. This means that complex data can live on the stack and be contiguous in collections. This makes a big performance difference and is also a big theme in C++11 and C++14.
  • Strong typing with type inference, with a distinction between variables and constants: the former are declared with var, the latter with let. There is no duck typing for functions, but the language supports generics (templates) and protocols (interfaces). Types can be extended outside of their original declaration, and this can be used to make them adopt new protocols.
  • Enum types can be based on any raw type (not only integers), and can have associated data depending on the enum value, so they also implement the functionality of a union.
  • Switch statements work on any type, with complex matching rules, so you can branch on ranges or conditional expressions. This is something the Rust language also has.
  • Optional types everywhere. Swift has no pointers, and no magic None/Nil value like Python has. Optional types express the idea of something of type X or nothing, a feature you would otherwise implement using a pointer and a check for null. This is the same as the boost::optional template in C++, but more integrated into the language, as there is an optional access operator.
    Consider an object A which has a field b which can contain a value of type B, which has a field c, which can contain a value of type C. If you have a value a of type A and you just want to read the field a.b.c, you normally have to add a lot of if statements, or try accessing it and handle exceptions. In Swift you just write a.b?.c, which returns an optional of type C. This is really nice because this kind of access is pretty common (for instance in protocol buffers).
  • Lots of syntactic sugar: nice loops, tuple decomposition, e.g. let (x, y) = getCoordinates(), clean range expressions [1..9], integer literals that can contain underscores to be more legible, like 1_000_000, and string interpolation that can contain arbitrary expressions (including function calls): "sin \(x) = \(sin(x))"

We will see how the language performs in practice. I suspect its impact will mostly be determined by the way Apple releases the language: if it is free, it might get wider adoption. Regardless of the success of Swift, I think many of the ideas in the language are good ones, so I hope this will lend support to the languages which already have those features, and encourage other languages to adopt them.


Check IO Capture / You are Here


I have spent the last few days playing on CheckIO, enough to reach level eight – you need to be logged in and to have reached a given level to see another user’s page. While one could consider this site to be a game, its puzzles are basically small programming problems one has to solve using code written in Python. Some of them are classics, like sorting numbers or producing roman numerals; others are more esoteric. I’m always impressed by the stories that people create so they can use the Fibonacci sequence in code.

CheckIO’s web interface contains a small editor with syntax highlighting and the ability to run Python code, in both 2.7 and 3.3 variants. So you can run your solution and then check it against the official validation tests. Once you solve a puzzle, you can publish your solution and discuss the solutions of other people. Thanks to this, I have already learnt a few Python tricks I did not know.

So if you have minimal python skills and like coding puzzles, this is a really good site.


Data URI Script

Sometimes you want to provide a small data file for example purposes, but uploading it somewhere is a hassle. One way around this is to use the data URI protocol defined in RFC 2397. I have written a quick Python script that converts a short file into a data URI. You can download the program from this .

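#!/usr/bin/python

import base64
import mimetypes
import os
import sys

def main():
  if len(sys.argv) < 2:
    sys.stderr.write('usage: %s <file>\n' % sys.argv[0])
    sys.exit(os.EX_USAGE)
  # Read raw bytes so that binary files survive the encoding.
  with open(sys.argv[1], 'rb') as input_handle:
    data = input_handle.read()
  # guess_type returns None for unknown extensions, so fall back to a generic type.
  mime_type = mimetypes.guess_type(sys.argv[1])[0] or 'application/octet-stream'
  # Data URIs use the standard base64 alphabet, not the URL-safe variant.
  encoded = base64.b64encode(data).decode('ascii')
  print('data:%s;charset=utf-8;base64,%s' % (mime_type, encoded))

if __name__ == '__main__':
  main()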
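
To check that such a URI really carries the file, one can decode it again. The following is a minimal sketch assuming Python 3.4 or later, where urllib.request can open data: URLs directly; the helper names to_data_uri and from_data_uri and the sample payload are made up for the illustration.

```python
import base64
import urllib.request


def to_data_uri(data, mime_type="text/plain"):
    """Build a base64 data URI from raw bytes (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)


def from_data_uri(uri):
    """Decode a data URI back to bytes with the standard library opener."""
    with urllib.request.urlopen(uri) as response:
        return response.read()


# Sample payload, invented for the example.
payload = b"name,value\nanswer,42\n"
uri = to_data_uri(payload, "text/csv")
restored = from_data_uri(uri)
print(uri)
```

The round trip only works with the standard base64 alphabet, which is the one the data URI scheme expects.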


A script to change network location automatically


As I mentioned earlier in this blog, I run a squid proxy in my home network. I’m using Mac OS X’s location feature to have two settings: a default one without proxy, and my home network with some customisations, including the proxy setting. Of course, when I move around I always forget to switch. As I always have a shell terminal open, I wanted a command that just fixes the issue by looking at the current wifi setting. There are numerous apps on the web trying to solve the same problem, but they are usually way too complicated for my taste: adding icons to the menu bar, running on a permanent basis and generally making a nuisance of themselves. I was kind of surprised this feature was not added to Mac OS, as iOS solves the problem more elegantly: proxy settings are associated with the SSID.

The good news is that there are two command-line tools that provide the needed functionality. The first is
/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport which can be used to return information about the current 802.11 network. The second is
/usr/sbin/scselect which can be used to select the current location. So I wrote a small Python script that uses the information from the first to drive the second. The configuration is simply the map at the beginning of the file, which contains the relationship between network name and location name. You can download it here
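
The general shape of such a script can be sketched as follows. This is not the downloadable script, only an illustration written for Python 3: the network and location names in the map are hypothetical, and it assumes that airport -I prints a line of the form "SSID: name" and that scselect accepts a location name as its argument.

```python
import re
import subprocess

AIRPORT = ("/System/Library/PrivateFrameworks/Apple80211.framework"
           "/Versions/Current/Resources/airport")
SCSELECT = "/usr/sbin/scselect"

# Hypothetical map from network name (SSID) to location name.
LOCATIONS = {"HomeNetwork": "Home"}
DEFAULT_LOCATION = "Automatic"  # assumed name of the default location


def parse_ssid(airport_output):
    """Extract the SSID from the text printed by `airport -I`."""
    match = re.search(r"^\s*SSID: (.+)$", airport_output, re.MULTILINE)
    return match.group(1) if match else None


def choose_location(ssid):
    """Map a network name to a location, falling back to the default."""
    return LOCATIONS.get(ssid, DEFAULT_LOCATION)


def main():
    """Query the current network and switch location (Mac OS X only)."""
    raw = subprocess.check_output([AIRPORT, "-I"])
    ssid = parse_ssid(raw.decode("utf-8", "replace"))
    subprocess.check_call([SCSELECT, choose_location(ssid)])
```

Calling main() performs the switch; keeping the parsing and the lookup in separate functions makes them easy to check without a wifi card.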


Floating point considered harmful

bool foo() {
  float a = 0.6;
  float b = 6.0;
  return (a == (b * 0.1));
}

bool bar(float c) {
  return (c != c);
}

When I learned coding on the C64, the two main tools were Basic and direct memory access. Any advanced stuff was done in assembly. So basically, coding involved pretty much only pointer arithmetic and goto statements. Of course this was a long time ago and such constructs are now shunned, and many newer languages even prevent the code from doing such things, which is probably a change for the better.

Strangely, there is one type of operation that is very dangerous, but which most languages do not restrict: floating point operations. Consider the snippet above: will foo return true? Maybe, but there is no guarantee that the bit representations of a and b * 0.1 will be equal. Will bar ever return true? Yes, if c is NaN. Generally speaking, comparing two floating point numbers bit for bit is a bad idea, and providing that operation to the programmer is a disservice. Floating point semantics are complicated, and only partially visible in the programming language: rounding and extended precision are typically not accessible.
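
The same two traps can be reproduced in Python. This is a small sketch assuming Python 3.5 or later (for math.isclose); note that Python floats are double precision, so the intermediate values differ from the C float version even though the conclusion is the same.

```python
import math

a = 0.6
b = 6.0

# Bit-for-bit comparison: the product is rounded to a neighbouring double.
exact_equal = (a == b * 0.1)

# Comparison with a relative tolerance, which is usually what is meant.
close_enough = math.isclose(a, b * 0.1)

# NaN is the one value that compares different from itself.
nan = float("nan")
nan_differs_from_itself = (nan != nan)

print(exact_equal, close_enough, nan_differs_from_itself)
```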

A lot of the code I have seen just happily assumes that floats basically behave like integers, and why would it not? Floating point numbers are the only types which in each and every language are allowed to have the same operators as integers. In most cases, using a fixed point (currencies) or fractional representation (computing averages and such) would be more appropriate, but such constructs are typically missing in the language, or are second class citizens – Python for instance has only had fractions since version 2.6, and there is no shorthand notation to create them.
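
As an illustration of those alternatives, here is a short sketch using the decimal and fractions modules from the Python standard library (written for Python 3); the amounts are invented for the example.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal arithmetic for a currency amount: six items at 0.10 each.
price = Decimal("0.10")
total = price * 6

# Exact rational arithmetic for an average of three thirds.
values = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
mean = sum(values) / len(values)

print(total, mean)
```

Both results are exact, but every value has to be built through an explicit constructor call, which is the second class citizen treatment the paragraph above complains about.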

The only languages I have used which had put some serious thought into their numerical representations were Ada and Smalltalk. The others seem to be happy to just clone C operations.


How efficient are roman numerals?

Roman numbers are one of those legacy systems that hang around without ever really disappearing. Roman numbers are difficult to parse, and their length varies a lot. The mystery sequence in my previous post was just that: the length of roman numeral n for each value of n. The question I was wondering about is: how inefficient is the roman numeral system? The decimal system basically needs log10(n) symbols to represent the number n; the same number in binary format will need log2(n). My intuition would be that the roman system is less efficient (in terms of symbols needed to represent a value) than the decimal system, but better than the binary representation.

So I wrote a small program to compute how many chars would be needed to represent a given number.

The graph shows the various number systems; each point is the mean number of characters needed to represent the values in the range [2^i … 2^(i+1)[. Obviously the binary representation is exactly linear. The decimal notation is also linear, with steps each time a power of 10 is crossed. The roman notation is also linear up to 2^18, where the curve goes up. The reason is simple: there is no symbol for values above 100000 (ↈ). Up until that point roman numerals seem to be as efficient as a ternary notation.
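
The measurement itself can be sketched in a few lines. This is a simplified reconstruction for Python 3, not the program used for the graph: it only knows the classical symbols up to M (so its roman curve diverges much earlier than 2^18), and the function names are made up for the illustration.

```python
# Classical symbols and subtractive pairs only, largest value first.
CLASSIC = (
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
)


def roman_length(n):
    """Number of symbols in the classical roman numeral for n (n >= 1)."""
    length = 0
    for value, text in CLASSIC:
        times, n = divmod(n, value)
        length += times * len(text)
    return length


def binary_length(n):
    """Number of binary digits of n."""
    return n.bit_length()


def decimal_length(n):
    """Number of decimal digits of n."""
    return len(str(n))


def mean_length(measure, i):
    """Mean of measure(n) for n in the range [2**i, 2**(i + 1))."""
    low, high = 2 ** i, 2 ** (i + 1)
    return sum(measure(n) for n in range(low, high)) / (high - low)


for i in range(10):
    print(i, mean_length(binary_length, i),
          mean_length(decimal_length, i), mean_length(roman_length, i))
```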

To generate those numbers, I created a small Python script to generate roman numerals. For the sake of consistency, I chose to only generate characters from the Unicode roman numeral range (U+2160 to U+2188). One interesting thing in this range is that it contains hybrid characters, like Ⅷ (U+2167), which represents the roman numeral for eight in one single character. Using that range we get a slightly different curve: the asymptotic behaviour is not changed, but numbers are generally shorter by two characters, and for small numbers the system is now as efficient as arabic numerals.

Here is the Python code that generates the shorter roman numerals using the hybrid characters.

NUMERALS = (
  (100000, u'ↈ'),
  (90000, u'ↂↈ'),
  (50000, u'ↇ'),
  (40000, u'ↂↇ'),
  (10000, u'ↂ'),
  (9000, u'Ⅿↂ'),
  (5000, u'ↁ'),
  (4000, u'Ⅿↁ'),
  (1000, u'Ⅿ'),
  (900, u'ⅭⅯ'),
  (500, u'Ⅾ'),
  (400, u'ⅭⅮ'),
  (100, u'Ⅽ'),
  (90, u'ⅩⅭ'),
  (50, u'Ⅼ'),
  (40, u'ⅩⅬ'),
  (20, u'ⅩⅩ'),  # to avoid 24 = 'ⅫⅫ'
  (12, u'Ⅻ'),
  (11, u'Ⅺ'),
  (10, u'Ⅹ'),
  (9, u'Ⅸ'),
  (8, u'Ⅷ'),
  (7, u'Ⅶ'),
  (6, u'Ⅵ'),
  (5, u'Ⅴ'),
  (4, u'Ⅳ'),
  (3, u'Ⅲ'),
  (2, u'Ⅱ'),
  (1, u'Ⅰ'),
)


def _roman(n):
  for value, text in NUMERALS:
    times, n = divmod(n, value)
    yield text * times
    if not n:
      return

def roman(n):
  return u''.join(_roman(n))


Python woes – range


One of the most annoying claims about python is that it is a high-level language. There is no clear definition of what a high-level language is, but the general idea is that the programmer does not need to think about the low-level details of the code that is generated and can work with more abstract types and concepts. The idea being that the programmer writes his intent in a clear way, and the language does the right thing.

for v in range(large_value):
  doStuff()

A typical example of an operator that does not do the right thing is range. Range is very convenient and used a lot in the Python code I have read; it is the pythonic way of avoiding index counters when looping over ranges. Range has strange semantics: the meaning of the first argument changes if there is a second one, so you cannot call it with keyword arguments, like range(start=0, stop=10). But the true problem of range is that you should never use it, as it is broken by design: range just builds a list with said range, potentially using a huge amount of memory. The code in the snippet above is correct, but will just kill your memory.

if v in xrange(a, large_value):
  doStuff()

People used to Python will tell you to just use xrange, which does the right thing, kind of: xrange does not build a list, but produces the values lazily when iterated. In Python 3, range behaves like xrange. Consider the code above. Readable, semantically clear, will kill your machine: the search is done in O(n), where n is the size of the range. People who know Python will scoff and tell you one should not do this. Why not? If you think about it, a range is both a set (a collection of unique items) and a sequence (a collection of ordered items). What does Python think about it?

x = xrange(10)
isinstance(x, collections.Set) → False
isinstance(x, collections.Sequence) → True
x[1:3] → TypeError: sequence index must be integer, not 'slice'

Walks like a set, but python does not acknowledge that it is a set. Why is xrange not a set? If it is a sequence, why can’t I read a slice of it? How hard would it be to write a proper range class for Python?

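import collections

class IntRange(collections.Set, collections.Sequence):
  """Integer range with O(1) membership test and slicing (Python 2)."""

  __slots__ = ('start', 'stop', 'step')

  def __init__(self, *args):
    if len(args) > 3 or len(args) < 1:
      raise ValueError('IntRange expects 1 to 3 arguments')
    if len(args) == 3:
      self.step = args[2]
    else:
      self.step = 1
    if self.step == 0:
      raise ValueError('step must not be zero')
    if len(args) > 1:
      self.start = args[0]
      self.stop = args[1]
    else:
      self.start = 0
      self.stop = args[0]

  def __iter__(self):
    return iter(xrange(self.start, self.stop, self.step))

  def __len__(self):
    span = self.stop - self.start
    # Empty when the step points away from stop.
    if span * self.step <= 0:
      return 0
    # Round up: a partial last step still contributes one element.
    return (abs(span) + abs(self.step) - 1) // abs(self.step)

  def __contains__(self, value):
    if not isinstance(value, (int, long)):
      return False
    if (value - self.start) % self.step:
      return False
    if self.step > 0:
      return value >= self.start and value < self.stop
    else:
      return value <= self.start and value > self.stop

  def __getitem__(self, index):
    # Negative indices count from the end, as for any sequence.
    if index < 0:
      index += len(self)
    if not 0 <= index < len(self):
      raise IndexError(index)
    return self.start + index * self.step

  def __getslice__(self, start, end):
    # Clamp the bounds: Python 2 passes sys.maxint for an open-ended slice.
    length = len(self)
    start = min(max(start, 0), length)
    end = min(max(end, start), length)
    return IntRange(self.start + start * self.step,
                    self.start + end * self.step, self.step)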

Voilà: a class that is functionally equivalent to xrange, but can tell whether it contains a value in O(1), and supports slicing. What would be cool to add is all the set operations defined in the frozenset class: intersection, union. What is a mystery to me is why xrange is so crippled by design. Must be a pythonic thing.
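
For comparison, here is a short check of how the built-in range behaves, assuming Python 3.3 or later, where the object tests integer membership arithmetically and supports slicing, but is still not registered as a set.

```python
import collections.abc

big = range(0, 10 ** 12, 2)

# Membership of an int is computed from start, stop and step, no iteration.
contains_even = (10 ** 11 in big)
contains_odd = (10 ** 11 + 1 in big)

# Slicing returns another range object.
sliced = range(10)[1:3]

# Still a sequence, still not a set.
is_sequence = isinstance(big, collections.abc.Sequence)
is_set = isinstance(big, collections.abc.Set)

print(contains_even, contains_odd, sliced, is_sequence, is_set)
```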


Python woes – Libraries


One might argue whether issues in the libraries associated with a programming language are part of the language. I would argue they are, simply because in practice nobody uses a language without libraries (except for writing one-liners to show how cool a language is). One of my annoyances with Python is that although the language has a rich set of libraries doing various stuff, they are often inconsistent and often feel like they have been written for an old version of the language, using neither newer features nor other recent libraries.

One of the very elegant features of Python is generators, which avoid horrible hacks like callbacks. Yet, as of Python 2.6, the library still offers a way to go over a file-system that takes a callback as an argument: os.path.walk. A generator-based os.walk does exist next to it, which leaves two functions with nearly the same name and opposite styles. Another nice addition of Python 2.6 is collections.namedtuple, which lets one define light-weight classes that behave like tuples, but whose fields can be accessed by name. This is a nice way to maintain backward compatibility while moving to a more readable model. Some Python classes implement their own ad-hoc named tuple classes, for instance time.struct_time or urlparse.ParseResult, while some functions still return un-named tuples, like socket.gethostbyname_ex or os.popen3. Code that manipulates tuple fields just using their position is very unreadable and error-prone.
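
Both features can be sketched in a few lines of Python 3. The HostInfo type, its field names and the sample values are made up for the illustration; they only mirror the shape of the plain tuple that socket.gethostbyname_ex returns.

```python
import collections
import os

# A light-weight record type: behaves like a tuple, fields readable by name.
HostInfo = collections.namedtuple(
    "HostInfo", ["hostname", "aliases", "addresses"])

# Illustrative values only.
info = HostInfo("example.org", [], ["192.0.2.1"])
hostname, aliases, addresses = info  # still unpacks like a plain tuple


def list_files(root):
    """Yield the path of every file below root, using the os.walk generator."""
    for directory, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            yield os.path.join(directory, filename)
```

The caller of list_files decides when to stop iterating, which is exactly what a callback interface makes awkward.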

The classical example of the baroque structure of Python’s libraries is the set of those related to time. The package to use when manipulating dates is datetime. You typically create instances of datetime.datetime by using either the class method now() or fromtimestamp(). Now both those methods have a UTC variant, which builds an instance in the UTC time-zone. First problem: the instance does not store which time-zone it was created in – not even a boolean that tells if the data is UTC or local. Basically if you want time-zone support, you need a package that is not part of the default installation. The other problem with datetime is that the methods provided by this class are not symmetric: in Python 2.6, there is a fromtimestamp() method, but no totimestamp() method, so the way to get this is time.mktime(d.timetuple()), which is not exactly readable. Similarly, the datetime class has an isoformat() method to display the date in ISO 8601 format, but there is no method to parse dates in that format.
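
The missing inverse can be wrapped in two small helpers. This is a sketch for Python 3 with hypothetical function names; it relies on time.mktime for naive local values, as described above, and on calendar.timegm as the counterpart for naive UTC values.

```python
import calendar
import time
from datetime import datetime


def to_timestamp_local(d):
    """Inverse of datetime.fromtimestamp for a naive local datetime."""
    return time.mktime(d.timetuple())


def to_timestamp_utc(d):
    """Inverse of datetime.utcfromtimestamp for a naive UTC datetime."""
    return calendar.timegm(d.timetuple())


stamp = 1000000000
local = datetime.fromtimestamp(stamp)

# The instance itself carries no hint of the time-zone it was built in.
print(local.tzinfo, to_timestamp_local(local))
```

Nothing stops a caller from passing a UTC value to the local helper, which is the first problem described above.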

Python 2.6 Serialisation
Format   Serialise    De-serialise
Pickle   dump         load
Json     dump         load
Plist    writePlist   readPlist

Another example of inconsistency is the set of serialisation libraries. Python 2.6 basically supports three serialisation formats out of the box: pickle, json and plist. The last one was added with Python 2.6. Here are the methods used to manipulate those formats, can you spot the problem?
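
For comparison, a sketch assuming Python 3.4 or later, where plistlib gained dump/load and dumps/loads functions named like those of the other two modules; the sample record is invented.

```python
import json
import pickle
import plistlib

# Sample record, made up for the example.
record = {"name": "example", "values": [1, 2, 3]}

# The three modules now share the same names for in-memory round trips.
from_pickle = pickle.loads(pickle.dumps(record))
from_json = json.loads(json.dumps(record))
from_plist = plistlib.loads(plistlib.dumps(record))

print(from_pickle == from_json == from_plist == record)
```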

os.system
os.spawn*
os.popen*
popen2.*
commands.*
subprocess.*

Probably the most inconsistent set of libraries in Python are those used to execute sub-processes: there are four families of them. The subprocess module claims that its aim is to replace the other libraries, but none of them admits in its inline documentation that it might be deprecated. The same goes for command-line argument parsing: there are getopt, optparse and argparse. Again, the online documentation hints that the first two are deprecated and that one should use the last one, but neither inline documentation mentions it.

Unsurprisingly, there are two libraries to open URLs: urllib and urllib2. You might think that urllib2 would be the more advanced one, but the one that supports RFC 2397 data URLs is, of course, urllib.


Python Woes – Duck typing

I keep seeing articles on the web claiming that language X is ugly, and that Python is beautiful. While I can’t dispute that some languages like PHP or JavaScript are pretty much insane in their syntax, I’m somewhat reluctant to say that Python is beautiful, or elegant. It is better than Bash or Perl, but that’s as far as I will go.

a = 4.0
print a.is_integer()
True
'ha' * a
TypeError: can't multiply sequence by non-int of type 'float'
unichr(a)
TypeError: integer argument expected, got float

One thing that really annoys me in Python is duck typing. Not the idea, mind you, but the way it is implemented by the library: the fact that something walks and quacks like a duck does not mean that you can use it instead of a duck. Exhibit one: I have a variable that claims it is an integer, but I cannot use it as an integer.

The core problem is that the system used in Python for numbers is a total mess. See the is_integer() method I used above? It is only implemented by the type float, so the only way for a callee to check if some number is actually an integer is to call a method that is not defined on integers. Even a numerical type for which the return value of is_integer() could actually vary, like the new Fraction type introduced in Python 2.6, does not define it. This was supposed to be usable as a drop-in replacement for floats.

def foo(x):
  return x % 10 * 4 + x % 15 * 2

The other problem is that Python overloads operators in smart ways, and this, coupled with duck typing, results in completely non-intuitive behaviour. The function above can return the string 'aaaaff', for instance when called with the format string '%x'. Still, one would hope that overloading and duck typing would ensure that the caller never needs to worry about calling the right function because of the type of his data. Wrong. How many Python programmers know that they should use math.fsum instead of sum when adding up float numbers?
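
Both points can be checked in a few lines. This is a sketch for Python 3; the naive total is accumulated in an explicit loop so that the rounding error is visible regardless of how the built-in sum is implemented.

```python
import math


def foo(x):
    return x % 10 * 4 + x % 15 * 2


number_result = foo(7)      # arithmetic: 7 * 4 + 7 * 2
string_result = foo("%x")   # string formatting, then string repetition

values = [0.1] * 10

# Plain left-to-right accumulation: the rounding error adds up.
naive_total = 0.0
for value in values:
    naive_total += value

# math.fsum keeps track of the lost low-order bits.
exact_total = math.fsum(values)

print(number_result, string_result, naive_total, exact_total)
```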

sum(['h', 'e', 'l', 'o'], '')
TypeError: sum() can't sum strings [use ''.join(seq) instead]
a = [[1],[2],[3]]
sum(a, [])
[1, 2, 3]

Someone might argue that doing type-checks and dispatching to the optimal back-end would be prohibitively expensive, so Python cannot do that. Yet it does exactly that, but only to be pedantic about it.


Blog theme in Commodore 64 palette


Pixel art and 8 bit graphics are all the rage these days, but while I clearly remember the block graphics, the colours in my memory were less saturated. The fact that I was using a crappy monitor is probably one explanation. After looking around on the web, I found a very informative page by Philip “Pepto” Timmermann that explains how the Vic II chip was producing colours. His resulting palette is quite dark, and seems somehow to match my memory better. As I wanted to play around with that palette, I wrote a quick Python script that reads a CSV file and outputs a Photoshop palette. I found the explanations on the ACO file format on this page.
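
Such a conversion can be sketched as follows for Python 3. This is not the original script: it assumes CSV rows of the form name,red,green,blue with 8-bit components, and it writes only the version 1 section of an ACO file as I understand the format (big-endian 16-bit fields, colour space 0 for RGB). The two sample rows are placeholders, not the measured Vic II values.

```python
import csv
import io
import struct


def read_palette(csv_text):
    """Parse rows of the form name,red,green,blue (8-bit components)."""
    rows = csv.reader(io.StringIO(csv_text))
    return [(name, int(r), int(g), int(b)) for name, r, g, b in rows]


def to_aco(palette):
    """Serialise a palette as a version 1 ACO swatch section (big-endian)."""
    chunks = [struct.pack(">HH", 1, len(palette))]  # version, colour count
    for _name, r, g, b in palette:
        # Colour space 0 is RGB; components are scaled from 8 to 16 bits.
        chunks.append(struct.pack(">HHHHH", 0, r * 257, g * 257, b * 257, 0))
    return b"".join(chunks)


# Placeholder rows, not the measured Vic II colours.
SAMPLE = "black,0,0,0\nwhite,255,255,255\n"
data = to_aco(read_palette(SAMPLE))
print(len(data))
```

Version 1 records carry no colour names, so the name column is only there to keep the CSV file readable.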

Vic II colours
black        white
red          cyan
purple       green
blue         yellow
orange       brown
light red    dark grey
grey         light green
light blue   light grey

Interestingly, the resulting colour palette seems quite close to the hues of the theme I’m currently using on this blog (a modified version of japan-style). Is there some influence at play? You can see a fragment of the banner image rendered using the Vic II’s palette in the top right part of the page. The size of the image corresponds to the full screen on a Commodore 64 in multi-colour mode: 160 × 200 pixels. The actual palette is in the table above (the font comes from ).
