Python woes – Libraries


One might argue about whether problems in the libraries associated with a programming language are part of the language itself. I would argue they are, simply because in practice nobody uses a language without its libraries (except to write one-liners showing how cool the language is). One of my annoyances with Python is that, although it has a rich set of libraries covering all sorts of tasks, they are often inconsistent and frequently feel as if they were written for an older version of the language, ignoring its newer features and more recent libraries.

One of Python's most elegant features is generators, which avoid horrible hacks like callbacks; yet, as of Python 2.6, the way to traverse a file system is os.path.walk, a function that takes a callback as an argument. Of course, one could write an adapter that uses walk to implement a generator, but that should be there by default.

Another nice addition in Python 2.6 is collections.namedtuple, which lets one define lightweight classes that behave like tuples but whose fields can also be accessed by name. This is a nice way to maintain backward compatibility while moving to a more readable model. Some Python modules implement their own ad-hoc named-tuple classes, for instance time.struct_time or urlparse.ParseResult, while some functions still return plain, unnamed tuples, like socket.gethostbyname_ex or os.popen3. Code that manipulates tuple fields by position alone is unreadable and error-prone.
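As a minimal sketch of the difference, the snippet below wraps the (hostname, aliaslist, ipaddrlist) triple returned by socket.gethostbyname_ex in a named tuple. The HostInfo class and its field names are illustrative assumptions, not part of the socket module, and the sample tuple is written out by hand so the example does not depend on a DNS lookup.

```python
from collections import namedtuple

# Hypothetical named view of the (hostname, aliaslist, ipaddrlist) triple
# returned by socket.gethostbyname_ex(); the field names are our own choice.
HostInfo = namedtuple("HostInfo", ["hostname", "aliases", "addresses"])

# A hand-written result in the same shape, so no network lookup is needed.
raw = ("localhost", [], ["127.0.0.1"])

# Positional access: correct, but the reader has to remember what [2] means.
first_address = raw[2][0]

# _make() builds the named tuple from any iterable of the right length.
info = HostInfo._make(raw)
named_address = info.addresses[0]
```

In real code one would write HostInfo._make(socket.gethostbyname_ex(name)). Because a named tuple is still a tuple, existing callers that index or unpack the result positionally keep working, which is the backward-compatibility argument made above.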

The classic example of the baroque structure of Python's libraries is the set of modules related to time. The package to use when manipulating dates is datetime. You typically create instances of datetime.datetime using either the class method now() or fromtimestamp(). Both of these have a UTC variant, utcnow() and utcfromtimestamp(), which build an instance in the UTC time zone. First problem: the instance does not record which time zone it was created in, not even a boolean flag telling whether the value is UTC or local time. Basically, if you want time-zone support, you need a package that is not part of the default installation.

The other problem with datetime is that the methods provided by this class are not symmetric. In Python 2.6, there is a fromtimestamp() method but no totimestamp() method, so the way to get a timestamp is time.mktime(d.timetuple()), which is not exactly readable. Similarly, the datetime class has an isoformat() method to display a date in ISO 8601 format, but no method to parse dates in that format.
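Until the standard library grew the missing halves, one had to write them oneself. The sketch below shows two hypothetical helpers, to_timestamp() and parse_iso(), that invert fromtimestamp() and isoformat() for naive local datetimes. It assumes no time-zone information is attached and handles only the two shapes isoformat() produces for such values (with and without microseconds).

```python
import time
from datetime import datetime


def to_timestamp(d: datetime) -> float:
    """Inverse of datetime.fromtimestamp() for naive local times (hypothetical helper)."""
    # timetuple() drops microseconds, so add them back explicitly.
    return time.mktime(d.timetuple()) + d.microsecond / 1e6


def parse_iso(text: str) -> datetime:
    """Parse the output of isoformat() for naive datetimes (hypothetical helper)."""
    # isoformat() only emits the fractional part when microsecond != 0.
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in text else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(text, fmt)
```

Python 3 eventually closed both gaps: datetime.timestamp() arrived in 3.3 and datetime.fromisoformat() in 3.7, so on a current interpreter those built-ins are preferable to helpers like these.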

Another example of inconsistency is the serialisation libraries. Out of the box, Python 2.6 basically supports three serialisation formats: pickle, json and plist. The last two only became part of the cross-platform standard library with Python 2.6. Here are the functions used to write and read each format; can you spot the problem?

Python 2.6 Serialisation

Format   Serialise    De-serialise
pickle   dump         load
json     dump         load
plist    writePlist   readPlist
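For the record, the naming was eventually unified: Python 3.4 added plistlib.dump() and plistlib.load(), and the old writePlist/readPlist were removed in 3.9. The sketch below, assuming Python 3.4 or later, relies on this to treat the three modules interchangeably. The FORMATS table and round_trip() helper are illustrative names, and the only difference left to paper over is that json works on text streams while pickle and plistlib work on bytes.

```python
import io
import json
import pickle
import plistlib

# Illustrative table: module to use, and the in-memory stream type it expects.
# json writes str; pickle and plistlib write bytes.
FORMATS = {
    "pickle": (pickle, io.BytesIO),
    "json": (json, io.StringIO),
    "plist": (plistlib, io.BytesIO),
}


def round_trip(fmt: str, value):
    """Serialise value in the given format and read it back (illustrative helper)."""
    module, stream_type = FORMATS[fmt]
    stream = stream_type()
    module.dump(value, stream)  # same function name for all three modules
    stream.seek(0)
    return module.load(stream)
```

The value must of course be representable in every format; a dict of strings, numbers and lists, for example, survives all three.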

os.system
os.spawn*
os.popen*
popen2.*
commands.*
subprocess.*

Probably the most inconsistent set of Python libraries is the one used to execute sub-processes: there are four families of them. The subprocess module claims that its aim is to replace the others, yet none of the others admits in its inline documentation that it might be deprecated.

The same goes for command-line argument parsing: there are getopt, optparse and argparse. Again, the online documentation hints that the first two are deprecated and that one should use the last one, but neither module's inline documentation mentions it.
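As a rough illustration of what the replacement looks like, here is a sketch of the common "run a command and capture its output" case, which older code might have written with commands.getstatusoutput() or os.popen(). It assumes Python 3.5 or later for subprocess.run(); run_and_capture() is a hypothetical helper name. Passing the command as a list avoids the shell entirely, and unlike commands.getstatusoutput() this sketch does not merge stderr into the captured output.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run args without a shell; return (exit status, decoded stdout)."""
    # stdout=PIPE captures the output; universal_newlines=True decodes it to str
    # (spelled text=True from Python 3.7 on). stderr is left uncaptured.
    completed = subprocess.run(args, stdout=subprocess.PIPE, universal_newlines=True)
    return completed.returncode, completed.stdout


# Usage: run the current interpreter, a command that is always available.
status, output = run_and_capture([sys.executable, "-c", "print('hello')"])
```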

Unsurprisingly, there are two libraries for opening URLs: urllib and urllib2. You might think that urllib2 would be the more advanced one, but the one that supports RFC 2397 data: URLs is, of course, urllib.
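Python 3 later merged the two into urllib.request, and since Python 3.4 its default opener includes a DataHandler, so urlopen() accepts data: URLs directly. The sketch below assumes Python 3.4 or later; read_data_url() is a hypothetical convenience wrapper, not a standard-library function.

```python
from urllib.request import urlopen


def read_data_url(url: str) -> bytes:
    """Return the decoded payload of an RFC 2397 data: URL (hypothetical wrapper)."""
    # urlopen() dispatches data: URLs to urllib.request.DataHandler, which
    # percent-decodes the payload and base64-decodes it when ";base64" is given.
    with urlopen(url) as response:
        return response.read()
```

Both the plain form, data:,Hello%2C%20World, and the base64 form, data:text/plain;base64,SGVsbG8sIFdvcmxk, yield the same bytes.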

2 thoughts on “Python woes – Libraries”

  1. Fwiw, there’s os.walk which is a generator (which is probably a good example too of the overall mess :)

