How efficient are roman numerals?

Roman numbers are one of those legacy system that hang around without ever really disappearing. Roman numbers are difficult to parse, and their length varies at lot. The mystery sequence in my previous post was just that: the length of roman numeral n for each value of n. The question I was wondering is: how inefficient is the roman numeral system? The decimal basically needs log10(n) symbols to represent number n, the same number in binary format will need log2(n). My intuition would be that the roman system is less efficient (in terms of symbols needed to represent a value) than the decimal system, but better than the binary representation.

So I wrote a small program to compute how many chars would be needed to represent a given number.

The graph shows the various number system, each point is the mean number of characters needed to represent the values in the range [2i … 2i+1[. Obviously the binary representation is exactly linear. The decimal notation is also linear, with steps each time a power of 10 is crossed. The roman notation is also linear up to 218 where the curves goes up: the reason is simple, there is no symbol for values above 10000 (ↈ). Up until that point roman numeral seem to be as efficient as a ternary notation.

To generate those numbers, I created a small python script to generate roman numerals. For the sake of consistency, I chose to only generate characters from the unicode roman numeral range (x2160 to x2188). One interesting thing in this range is that it contains hybrid characters, like Ⅷ (x2167) which represent the roman numeral for eight in one single character. Using that range we get slightly different curve, the asymptotic behaviour is not changed, but numbers are generally shorter by two characters and for small numbers, the system is now as efficient as arabic numerals.

Here is the python that generates the shorter roman numerals using the hybrid characters.

NUMERALS = (
  (100000, u'ↈ'),
  (90000, u'ↂↈ'),
  (50000, u'ↇ'),
  (40000, u'ↂↇ'),
  (10000, u'ↂ'),
  (9000, u'Ⅿↂ'),
  (5000, u'ↁ'),
  (4000, u'Ⅿↁ'),
  (1000, u'Ⅿ'),
  (900, u'ⅭⅯ'),
  (500, u'Ⅾ'),
  (400, u'ⅭⅮ'),
  (100, u'Ⅽ'),
  (90, u'ⅩⅭ'),
  (50, u'Ⅼ'),
  (40, u'ⅩⅬ'),
  (20, u'ⅩⅩ'),  # to avoid 24 = 'ⅫⅫ'
  (12, u'Ⅻ'),
  (11, u'Ⅺ'),
  (10, u'Ⅹ'),
  (9, u'Ⅸ'),
  (8, u'Ⅷ'),
  (7, u'Ⅶ'),
  (6, u'Ⅵ'),
  (5, u'Ⅴ'),
  (4, u'Ⅳ'),
  (3, u'Ⅲ'),
  (2, u'Ⅱ'),
  (1, u'Ⅰ'),
)


def _roman(n):
  for value, text in NUMERALS:
    times, n = divmod(n, value)
    yield text * times
    if not n:
      return

def roman(n):
  return u''.join(_roman(n))

flattr this!

Better resolutions

Diagram of Screen Sizes

Late Douglas Adams had the knack for describing things that would resonate with geeks in general and computer scientist in particular. His explanations of the Electric Monk in Dirk Gently’s Holistic Detective Agency regularly come to my mind when talking about dysfunctional computer systems. Another device devised by him is the total perspective vortex, a machine that would give during a short moment a sense of perspective to a human, rendering him mad instantly.

One thing that is difficult to put into perspective is the tremendous growth of computers systems. After my post about Commodore 64 graphics, I wanted to get a sense of scale of the various computer screens I used over my lifetime, so I did a little graphic. Here are the resolution of various computers I owned. The encompassing box is the 30 inch screen I use at work nowadays, which has a resolution of 2560 × 1600.

Resolutions
Device Year Resolution
Commodore 64 1982 320 × 200
Mac Classic 1990 512 × 342
PowerMac 7100 1994 1024 × 768
PowerMac G4 1999 1280 × 1024
MacBook Pro 2011 1280 × 800
Sony PS3 2006 1280 × 800
Sony Ericsson 802SE 2004 176 × 220
iPhone 3G 2008 320 × 480
iPhone 4 2010 640 × 960

flattr this!

Mystery sequence

I felt like trying something out, the result is the following number sequence.
Can you guess what it represents? I also created a file with the 2000 first values.

1 2 3 2 1 2 3 4 2 1 2 3 4 3 2 3 4 5 3
2 3 4 5 4 3 4 5 6 4 3 4 5 6 5 4 5 6 7 5
2 3 4 5 4 3 4 5 6 4 1 2 3 4 3 2 3 4 5 3
2 3 4 5 4 3 4 5 6 4 3 4 5 6 5 4 5 6 7 5
4 5 6 7 6 5 6 7 8 6 2 3 4 5 4 3 4 5 6 4
1 2 3 4 3 2 3 4 5 3 2 3 4 5 4 3 4 5 6 4
3 4 5 6 5 4 5 6 7 5 4 5 6 7 6 5 6 7 8 6
3 4 5 6 5 4 5 6 7 5 2 3 4 5 4 3 4 5 6 4
3 4 5 6 5 4 5 6 7 5 4 5 6 7 6 5 6 7 8 6
5 6 7 8 7 6 7 8 9 7 3 4 5 6 5 4 5 6 7 5
2 3 4 5 4 3 4 5 6 4 3 4 5 6 5 4 5 6 7 5
4 5 6 7 6 5 6 7 8 6 5 6 7 8 7 6 7 8 9 7
4 5 6 7 6 5 6 7 8 6 3 4 5 6 5 4 5 6 7 5
4 5 6 7 6 5 6 7 8 6 5 6 7 8 7 6 7 8 9 7
6 7 8 9 8 7 8 9 10 8 4 5 6 7 6 5 6 7 8 6
3 4 5 6 5 4 5 6 7 5 4 5 6 7 6 5 6 7 8 6
5 6 7 8 7 6 7 8 9 7 6 7 8 9 8 7 8 9 10 8
5 6 7 8 7 6 7 8 9 7 4 5 6 7 6 5 6 7 8 6
5 6 7 8 7 6 7 8 9 7 6 7 8 9 8 7 8 9 10 8
7 8 9 10 9 8 9 10 11 9 5 6 7 8 7 6 7 8 9 7

flattr this!

De Cap et de Croc 10 – de la lune à la terre

Couverture de l'acte Ⅹ de la Série “De Cap et de Croc”

Je n’achète plus beaucoup de bande dessinée, surtout en français, partiellement pour cause de temps, partiellement parce que je suis devenu plutôt exigeant. Le mode d’achat a aussi bien changé, j’ai tendance à lire sur internet et à acheter l’album après coup, ce qui prétérite les bande dessinées classiques. De Cap et de Croc est un peu l’exception : je l’avais commandé bien avant sa sortie, il faut dire qu’il s’agit de l’ultime tome d’une série en 10 volumes qui a commencé quand j’étais encore à l’université.

La série suit deux personnages, un loup et un renard dans une Europe ou se mêlent mousquetaires, pirates, magiciens et physiciens. Duels, malentendus, complots, intrigues, rimes, retournement et jeux de mots douteux forment la trame du scénario assez touffu, quoique classique. La série a l’avantage d’un style de dessin que j’aime beaucoup et qui sied bien au thème : assez technique et touffu. Le texte est aussi très élégant, avec de longues parties rimées, voire en alexandrins.

De Cap et de Croc Acte Ⅹ
De la Lune à La Terre


Delcourt
ISB : 978-2-7560-1996-3

Je rejoins l’avis d’Alias concernant le dernier tome, il est légèrement moins bon que les précédents, mais une fin était nécessaire pour fermer toutes les boucles, et il y a quand même de très bon bouts : j’ai ri à haute voix plusieurs fois en le lisant, notamment au vu de l’illustration de la page 34, c’est toujours bon de savoir que le dessinateur est lui aussi un enfant des années 80. Bref, indispensable pour ceux qui connaissent la série, pour les autres, je vous conseille de d’abord lire les premiers.

flattr this!

Python woes – range

Python Logo

One of the most annoying claims about python is that it is a high-level language. There is no clear definition of what a high-level language is, but the general idea is that the programmer does not need to think about the low-level details of the code that is generated and can work with more abstract types and concepts. The idea being that the programmer writes his intent in a clear way, and the language does the right thing.

for v in range(large_value):
  doStuff()

A typical example of an operator that does not do the right thing is range. Range is very convenient and used a lot in the Python code I have read, it is the pythonic way of avoid index counters when looping over ranges. Range has strange semantics: the meaning of the first argument changes if there is a second one, so you cannot call it with keyword arguments, like range(start=0, stop=10). But the true problem of range is that should never use it, as it broken by design: range just builds a list with said range, potentially using a huge amount of memory. The code in the snippet is correct, but will just kill your memory.

if v in xrange(a, large_value):
  doStuff()

People used to Python will tell you to just use xrange which does the right thing, kind of: xrange does not build a list, but creates iterators when needed. In Python 3, range range behaves like xrange. Consider the code on the right. Readable, semantically clear, will kill your machine: the search is done in O(n), where n is the size of the range. People who know Python will scoff and tell you one should not do this. Why not? If you think about it, a range is both a set (a collection of unique items), and a sequence (a collection of ordered items). What does python think about it?

x = range(10)
isinstance(x, collections.Set) → False
isinstance(x, collections.Sequence) → True
x[1:3] → TypeError: sequence index must be integer, not 'slice'

Walks like a set, but python does not acknowledge that it is a set. Why is xrange not a set? If it is a sequence, why can’t I read a slice of it? How hard would it be to write a proper range class for Python?

class IntRange(collections.Set, collections.Sequence):

  __slots__ = ('start', 'stop', 'step')

  def __init__(self, *args):
    if len(args) > 3 or len(args) < 1:
      raise ValueException()
    if len(args) == 3:
      self.step = args[2]
    else:
      self.step = 1
    if len(args) > 1:
      self.start = args[0]
      self.stop = args[1]
    else:
      self.start = 0
      self.stop = args[0]

  def __iter__(self):
    return xrange(self.start, self.stop, self.step).__iter__()

  def __len__(self):
    return (self.stop - self.start) / self.step

  def __contains__(self, value):
    if type(value) != int:
      return False
    if (value - self.start) % self.step:
      return False
    if self.step > 0:
      return value >= self.start and value < self.stop
    else:
      return value <= self.start and value > self.stop

  def __getitem__(self, index):
    value = index * self.step + self.start
    if self.step > 0 and value >= self.stop:
      raise IndexError()
    if self.step < 0 and value <= self.stop:
      raise IndexError()
    return value

  def __getslice__(self, start, end):
    return IntRange(self[start], self[end], self.step)

Voilà: a class that is functionally equivalent to xrange, but can find if it contains a value in O(1), and supports slicing. What would be cool to add is all the set operations defined in the frozenset class: intersection, union. What is a mystery to me is why xrange is so crippled by design. Must be a pythonic thing.

flattr this!

Une elfe qui reçoit un baiser d'un aventurier

Reflets d’Ancien Zines

Observatoire avec une lunette qui pointe sur la Lune.

J’ai récemment récupéré lors d’une visite chez Roboduck un lot d’anciennes illustrations du , principalement des dessins de Philippe “Φ” Schutz. Dans le lot, il y a des dessins pour un scénario Rêve de Dragon que je suis en train de remettre en page, l’Oasis des Deux Lunes. Croyant les illustrations originales perdues, j’avais demandé à Fiorella de faire de nouvelles dessins. Je me retrouve donc à présent avec deux lots d’images dans des styles très différents. Je pense garder celle de Fiorella pour les personnages et garder les paysages de Phi. J’espère pouvoir bientôt mettre une version mise en page du scénario en ligne.

flattr this!

CSS images, image-set and strange units.

📏

A candidate draft for CSS3 image values has been proposed, and it contains interesting ideas. Basically, an image can be something else than just an image file, instead it can be a set of images in fallback order and solid colour or a gradient. As I’m increasingly using SVG images for this blog, having the ability to specify fallback bitmaps is a nice feature. What I wish they would have added to the spec was the ability to have an image being a html sequence, that is, be able to specify image fallbacks for characters in exotic fonts (typically rare kanji, or emoji).

Another very interesting idea for image handling in CSS is the notion of image-set, which was proposed on the webkit mailing list, basically instead of specifying a single image, one would specify a set of images at different resolutions and the web-browser would select the one matching the current display. This is basically a problem with devices that have so called retina displays, like the iPhone 4. Basically, those devices do not display bitmaps at one pixel per CSS pixel, as they would display as post-stamps. In practice this means that the pixel is now an abstract unit, in CSS, a pixel is just a unit of length, defined as 1/96 of an inch, or 127/480 of a millimetre, and resolution can now be expressed in points per pixel. Yes, if you are geek, that does not really make sense, that would be like if atoms could be divided into parts…

I really hope those standards get traction, as the current situation is not satisfying: either I need to put images at twice the resolution on this blog, and then let the web browser scale them down to fit the layout, with the result that I’m wasting a lot of bandwidth, making the browser download four times too many pixels, or the picture look blurry on smartphones. It is also interesting to see that content negotiation between web browser and web server has failed: there is no simple way to tell wordpress to create a fallback bitmap when the browser cannot handle SVG.

flattr this!

Un Conte d’Hiver au Kunsthaus

Gysbrecht Lytens, scène hivernale avec gitans, première moitié du 17ème siècle Kunsthistorisches Museum Wien, Gemäldegalerie

Je ne suis pas exactement un fanatique d’art, mais j’ai profité de la visite d’une amie venue tout exprès pour cette exposition pour aller voir Un conte d’hiver – L’hiver dans l’art de la Renaissance à l’impres­sion­nisme, vu la météo lamentable, c’était un bon moyen de passer le week-end de Pâques.

J’aime bien l’art est présenté par thème plutôt que bêtement par époque, quoi de mieux que le même sujet pour voir en quoi les techniques et les styles ont changé. L’exposition n’est pas grande, mais contient un nombre sur­prenant de classiques, notamment un « Bonaparte franchissant le Grand Saint-Bernard » (sine canem). Il y a évidemment un grand nombre de tableau de canaux et de baies glacées par des maître flamands. Le golf sur glace semblait populaire à l’époque. L’exposition dure jusqu’à la fin du mois, donc si la météo ne s’arrange pas et que vous avez un peu de temps, je vous la recommande.

flattr this!

Python woes – Libraries

Python Logo

One might argue if issues in the libraries associated with one programming language are part of the language. I would argue it is, simply for the fact that in practice nobody uses a language without libraries (except for writing one-liners to show how cool a language is). One of my annoyances with Python is that although the language has a rich set of libraries doing various stuff, they are often inconsistent and often feel they have been written for an old version of the language and do not use any newer features or other recent libraries.

One of the very elegant features of Python are generators, this avoids horrible hacks like callbacks, yet, as of Python 2.6, the way to go over a file-system is using a method that takes a callback as an argument: os.path.walk. Of course it would be possible to write an adapter that uses walk to implement a generator, but that should be there by default. Another nice addition of Python 2.6 is collections.nametuples which lets one define light-weight classes that behave like tuples, but whose fields can be accessed by name, this is a nice way maintain backward compatibility while moving to a more readable model. Some python classes implement their own ad-hoc named tuple classes, for instance time.struct_time or urlparse.ParseResult, some functions still return un-named tuples, like socket.gethostbyname_ex or os.popen3. Having code that manipulates tuple fields just using their position is very unreadable and error-prone.

The classical example of the baroque structure of Python’s libraries is those related to time. The package to use when manipulating dates is datetime. You typically create instances of datetime.datetime by either using the static method now() or fromtimestamp(). Now both those method have a UTC variant, which builds an instance in the UTC time-zone. First problem: the instance does not store in itself in which time-zone it was created – not even a boolean that tells if the data is UTC or local. Basically if you want time-zone support, you need a package that is not part of the default installation. The other problem with datatime is that the methods provided by this class are not symmetric, in Python 2.6, there is an fromtimestamp() method, but no totimestamp() method, so the way to get this is time.mktime(d.timetuple()), which is not exactly readable. Similarly, the datetime class has a isoformat() method to display the date in ISO 8601 format, but there is no method to parse dates in that format.

Python 2.6 Serialisation
Format Serialise De-serialise
Pickle dump load
Json dump load
Plist writePlist readPlist

Another example of inconsistency are the serialisation libraries. Python 2.6 basically supports three serialisation formats out of the box: pickle, json and plist. The last one was added with Python 2.6. Here are the method to manipulate those types, can you spot the problem?

os.system
os.spawn*
os.popen*
popen2.*
commands.*
subprocess.*

Probably the most inconsistent set of libraries of python are those to execute sub-processes, there are four families of them. The subprocess pretends its aim is to replace the other libraries, but none of them admits in its inline documentation that it might be deprecated. Same thing goes for command-line argument parsing, there is getopt, optparse, argparse. Again, the online documentation hints that the first two are deprecated and that one should be using the last one, but neither inline documentation mentions it.

Unsurprisingly there are two libraries to open urls: urllib and urllib2, you might think that urllib2 would be the most advanced one, but the one that supports RFC 2397 data protocol urls, is, of course, urllib.

flattr this!

Python Woes – Duck typing

I keep seeing articles on the web that language X is ugly, and that Python is beautiful. While I can’t argue that some languages like PHP or Javascript are pretty much insane in their syntax, I’m somehow reluctant to say that Python is beautiful, or elegant. It is better than Bash or Perl, but that’s how far I will go.

a = 4.0
print a.is_integer()
True
'ha' * a
TypeError: can't multiply sequence by non-int of type 'float'
unichr(a)
TypeError: integer argument expected, got float

One thing that really annoys me in Python is duck typing. Not the idea, mind you, but the way it is implemented by the library: the fact that something walk and quacks like a duck does not mean that you can use it instead of a duck. Exhibit one: I have a variable that claims it is an integer, but I cannot use it as an integer.

The core problem is that the system used in Python for numbers is a total mess. See the is_integer() method I used above? it is only implemented by the type float, so the only way for a callee to check if some number is actually an integer is to call a method that is not defined on integers. Even for numerical types where the return value of is_integer() could actually change, like the new Fraction type introduced in Python 2.6 does not define it. This was supposed to be usable as drop-in replacement for floats.

def foo(x):
  return x % 10 * 4 + x % 15 * 2

The other problem is that Python overload operators in smart ways, this, coupled with duck-typing results in completely non-intuitive behaviour. The function on the side can return the string 'aaaaff'. Still one, would hope that overloading and duck-typing would ensure that the caller would never need to worry about calling the right function because of the type of his data. Wrong. How many Python programmers know that they should use math.fsum instead of sum when adding up float numbers?

sum(['h', 'e', 'l', 'o'], '')
TypeError: sum() can't sum strings [use ''.join(seq) instead]
a = [[1],[2],[3]]
sum(a, [])
[1, 2, 3]

Someone might argue that doing type-checks and dispatch control to the optimal back-end would be prohibitively expensive, so Python cannot do that. But it does, but only to be pedantic about it.

flattr this!