How efficient are roman numerals?

Roman numbers are one of those legacy system that hang around without ever really disappearing. Roman numbers are difficult to parse, and their length varies at lot. The mystery sequence in my previous post was just that: the length of roman numeral n for each value of n. The question I was wondering is: how inefficient is the roman numeral system? The decimal basically needs log10(n) symbols to represent number n, the same number in binary format will need log2(n). My intuition would be that the roman system is less efficient (in terms of symbols needed to represent a value) than the decimal system, but better than the binary representation.

So I wrote a small program to compute how many chars would be needed to represent a given number.

The graph shows the various number system, each point is the mean number of characters needed to represent the values in the range [2i … 2i+1[. Obviously the binary representation is exactly linear. The decimal notation is also linear, with steps each time a power of 10 is crossed. The roman notation is also linear up to 218 where the curves goes up: the reason is simple, there is no symbol for values above 10000 (ↈ). Up until that point roman numeral seem to be as efficient as a ternary notation.

To generate those numbers, I created a small python script to generate roman numerals. For the sake of consistency, I chose to only generate characters from the unicode roman numeral range (x2160 to x2188). One interesting thing in this range is that it contains hybrid characters, like Ⅷ (x2167) which represent the roman numeral for eight in one single character. Using that range we get slightly different curve, the asymptotic behaviour is not changed, but numbers are generally shorter by two characters and for small numbers, the system is now as efficient as arabic numerals.

Here is the python script that generates the shorter roman numerals using the hybrid characters.

NUMERALS = (
  (100000, u'ↈ'),
  (90000, u'ↂↈ'),
  (50000, u'ↇ'),
  (40000, u'ↂↇ'),
  (10000, u'ↂ'),
  (9000, u'Ⅿↂ'),
  (5000, u'ↁ'),
  (4000, u'Ⅿↁ'),
  (1000, u'Ⅿ'),
  (900, u'ⅭⅯ'),
  (500, u'Ⅾ'),
  (400, u'ⅭⅮ'),
  (100, u'Ⅽ'),
  (90, u'ⅩⅭ'),
  (50, u'Ⅼ'),
  (40, u'ⅩⅬ'),
  (20, u'ⅩⅩ'),  # to avoid 24 = 'ⅫⅫ'
  (12, u'Ⅻ'),
  (11, u'Ⅺ'),
  (10, u'Ⅹ'),
  (9, u'Ⅸ'),
  (8, u'Ⅷ'),
  (7, u'Ⅶ'),
  (6, u'Ⅵ'),
  (5, u'Ⅴ'),
  (4, u'Ⅳ'),
  (3, u'Ⅲ'),
  (2, u'Ⅱ'),
  (1, u'Ⅰ'),
)


def _roman(n):
  for value, text in NUMERALS:
    times, n = divmod(n, value)
    yield text * times
    if not n:
      return

def roman(n):
  return u''.join(_roman(n))

3 thoughts on “How efficient are roman numerals?”

  1. There are some rules above 10000, like millions, but this is complicated.

    Anyway, efficiency in storing numbers is the least of the problems of this archaic system: it was only designed to store results of computing made with an abacus; it is complicated to write and decipher. Contrary to latin, there is no reason to use it anymore, except nostalgia, snobbery, will to keep some traces of history, and need of a second numeration system for the headers in big documents (but the Chinese system, would do to :-)
    (BTW, next exercise for you: add a line for the Chinese system)(and the Japanese, I suppose you’re familiar with it).

    Georges Ifrah described it well, I have a small summary there among other systems (like Chinese) :

    http://www.courtois.cc/blogeclectique/index.php?post/2007/05/11/333-l-histoire-universelle-des-chiffres-de-georges-ifrah-5

  2. Those archaic systems have some interesting error-handling properties. Remove one character from at any position of an arabic number, and the value will be off by one order of magnitude, if you take a roman number, you can get such errors only if you remove the first character, because of the way the system is built, you always know what magnitude a number is for.
    The chinese system (and the japanese) have the same error handling property, so it looks like this was a desirable property: include error correcting indications inside the number.

  3. You need floor(1 + log10(n)) symbols to represent a number ;)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.