I’m writing an AppleScript playlist generator. Part of the process is to read the iTunes Library XML file to get a list of all of the genres in a user’s library. This is the Python implementation, which works as I’d like:
    #!/usr/bin/env python
    # script to get all of the genres from itunes
    import re, sys, sets
    import htmlentitydefs  # needed by unescape() for named entities

    ## Boosted from the internet to handle HTML entities in Genre names
    def unescape(text):
        def fixup(m):
            text = m.group(0)
            if text[:2] == '&#':
                # character reference
                try:
                    if text[:3] == '&#x':
                        return unichr(int(text[3:-1], 16))
                    else:
                        return unichr(int(text[2:-1]))
                except ValueError:
                    pass
            else:
                # named entity
                try:
                    text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
                except KeyError:
                    pass
            return text  # leave as is
        return re.sub(r'&#?\w+;', fixup, text)

    # probably faster to use a regex than to try to walk
    # the entire xml document and aggregate the genres
    try:
        xml_path = '/Users/%s/Music/iTunes/iTunes Music Library.xml' % sys.argv[1]
    except IndexError:
        # no username argument was given
        print '\tUsage: python ' + sys.argv[0] + ' <your OSX username>'
        raise SystemExit

    pattern = '<key>Genre</key><string>([^<]+)</string>'

    try:
        xml = file(xml_path, 'r').read()
    except IOError:
        print '\tUnable to load your iTunes Library XML file'
        raise SystemExit

    matches = re.findall(pattern, xml)
    uniques = map(unescape, list(sets.Set(matches)))

    ## need to write these out somewhere so the applescript can read them
    sys.stdout.write('|'.join(uniques))
    raise SystemExit
The problem is, I’d like the AppleScript to be self-contained and not require that this additional file be present (I plan on making this available to other people). And, as far as I can tell, AppleScript doesn’t offer any type of regular expression capabilities out of the box. I could loop over each track in the library to get all of the genres, but this is a prohibitively long process that I already do once when building the playlist. So, I’m looking for alternatives.
Since AppleScript allows me to run a shell script and capture the results, I imagine that I can accomplish the same behavior using some type of shell command, be it grep, perl, or something else. My *nix command line skills are extremely rusty and I’m looking for some guidance.
So, in short, I’d like to find a way to translate the above Python code into something I can call directly from the shell and get a similar result. Thanks!
Why are you using regex to parse XML? Why not use a proper XML library? Python has some great utilities like ElementTree that make walking the DOM a lot easier, and it yields nice, friendly objects rather than untyped strings.
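To make the ElementTree suggestion concrete, here is a minimal sketch of collecting the genres by walking the tree instead of matching a regex. It assumes a current Python (it uses `print()` and `Element.iter`), and the `collect_genres` helper and the inline `SAMPLE` document are made up for illustration; the sample only imitates the key/value layout of the library file and is not a real export. A side benefit is that the parser decodes entities such as `&amp;` itself, so no separate unescape step is needed.

```python
import xml.etree.ElementTree as ET


def collect_genres(root):
    """Return the sorted, de-duplicated Genre values in a plist-style tree."""
    genres = set()
    for d in root.iter('dict'):
        children = list(d)
        # plist dicts alternate <key> and value elements, so look at each
        # element together with the one that follows it
        for key, value in zip(children, children[1:]):
            if key.tag == 'key' and key.text == 'Genre' and value.tag == 'string':
                if value.text:
                    genres.add(value.text)
    return sorted(genres)


# Hypothetical, heavily trimmed stand-in for the library file
SAMPLE = """<plist version="1.0"><dict>
<key>Tracks</key><dict>
<key>1</key><dict><key>Name</key><string>A</string><key>Genre</key><string>R&amp;B</string></dict>
<key>2</key><dict><key>Name</key><string>B</string><key>Genre</key><string>Rock</string></dict>
<key>3</key><dict><key>Name</key><string>C</string><key>Genre</key><string>Rock</string></dict>
</dict></dict></plist>"""

# Same pipe-delimited output format as the script in the question
print('|'.join(collect_genres(ET.fromstring(SAMPLE))))
```

For a real library you would build the root with `ET.parse(path).getroot()` instead of `ET.fromstring(SAMPLE)`. I haven't timed this against the regex on a large library, so treat the speed trade-off as an open question.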
Here are some ways of parsing XML using AppleScript:
Applescript XML Parser (Available since Tiger apparently)
XML Tools you can also use with Applescript
Remember, just as AppleScript can hook into iTunes, it can hook into other installed utilities like these.
Lastly, why not just write the whole thing in Python? It has far better tools for development and debugging, and it will likely run a lot faster. If you’re running Leopard, you have Python 2.5.1 pre-installed.