Getting to the Lyrics

🎶

As I’m not a native English speaker, understanding song lyrics has always been a hit-or-miss affair. Looking up those lyrics online has proven frustrating, as most popular websites try various kinds of stupid tricks to prevent one from just copy-pasting said lyrics, basically screwing around with the selection in JavaScript and hiding the lyrics’ text within HTML escape sequences. There are probably numerous tools to solve this issue, but I felt like writing a bit of code.

The code is very simple and rough, but gets the job done. On Mac OS X, assuming you have the URL of the lyrics website in the clipboard, you can invoke it in the terminal as follows:

./lyrics.py `pbpaste` | pbcopy

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re
import urllib2
import sys

# Tags that are turned into line breaks; every other tag is dropped.
tag_translate = {'br': u'\n', 'span': u'\n' }
# Named entities to decode; &quot; is the double quote character.
escape_translate = {'quot': u'"',}

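# One decoder per regex capture group, in order:
# numeric character reference, tag name, named entity.
funcs = (lambda x: unichr(int(x)), tag_translate.get, escape_translate.get,)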

def decode(url):
  handle = urllib2.urlopen(url)
  try:
    content = handle.read()
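    # Groups: (1) digits of &#NNN, (2) tag name, (3) entity name.
    # Only one group is non-empty per match; plain text is never matched.
    for match in re.findall(r'&#(\d*)|<([A-Za-z]+)[^>]*>|&(\w*);', content):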
      for key, func in zip(match, funcs):
        if key:
          result = func(key.lower())
          if result:
            yield result
  finally:
    handle.close()


def main(argv):
  result = u''.join(decode(argv[0])).strip()
  sys.stdout.write(result.encode('utf-8'))
  sys.stdout.write('\n')


if __name__ == "__main__":
  main(sys.argv[1:])
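
The script above targets Python 2 (urllib2, unichr). For readers on Python 3, here is a rough sketch of the same decoding loop working on a string instead of a URL, so it can be tried without a network connection. The names decode_text, PATTERN and sample are mine and not part of the original script, the regex is slightly tightened to require at least one digit or letter, and &quot; is mapped to a double quote; treat it as an illustration rather than a drop-in replacement.

```python
import re

# Same lookup tables as the script above, as Python 3 strings.
TAG_TRANSLATE = {'br': '\n', 'span': '\n'}
ESCAPE_TRANSLATE = {'quot': '"'}

# One decoder per capture group, in the order the groups appear in PATTERN.
FUNCS = (lambda x: chr(int(x)), TAG_TRANSLATE.get, ESCAPE_TRANSLATE.get)

# (1) digits of a numeric reference, (2) tag name, (3) named entity.
PATTERN = re.compile(r'&#(\d+);?|<([A-Za-z]+)[^>]*>|&(\w+);')


def decode_text(content):
    """Yield decoded fragments from an HTML string (hypothetical helper)."""
    for match in PATTERN.findall(content):
        for key, func in zip(match, FUNCS):
            if key:
                result = func(key.lower())
                if result:
                    yield result


# Made-up sample: the text is hidden in numeric escapes.
sample = '<div>&#72;&#105;<br/>&quot;&#111;&#107;&quot;</div>'
print(''.join(decode_text(sample)).strip())
```

Note that, like the original, this only keeps characters that were written as escape sequences; any plain text on the page is dropped, which is exactly what makes it work on sites that hide the lyrics this way and useless on sites that do not.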
