I’m learning Python and PyGTK now, and have created a simple Music Organizer: http://pastebin.com/m2b596852 But when it edits songs whose tags contain the Norwegian letters æ, ø, and å, it replaces them with a weird character.
So is there a good way of reading the names, or encoding them, as utf-8 characters?
Two relevant places from the above code:
Read info from a file:
    def __parse(self, filename):
        'parse ID3v1.0 tags from MP3 file'
        self.clear()
        self['artist'] = 'Unknown'
        self['title'] = 'Unknown'
        try:
            fsock = open(filename, 'rb', 0)
            try:
                fsock.seek(-128, 2)
                tagdata = fsock.read(128)
            finally:
                fsock.close()
            if tagdata[:3] == 'TAG':
                for tag, (start, end, parseFunc) in self.tagDataMap.items():
                    self[tag] = parseFunc(tagdata[start:end])
        except IOError:
            pass
Print info to sys.stdout:
    for info in files:
        try:
            os.rename(info['name'], os.path.join(self.dir, info['artist'])+' - '+info['title']+'.mp3')
            print 'From: '+ info['name'].replace(os.path.join(self.dir, ''), '')
            print 'To: '+ info['artist'] +' - '+info['title']+'.mp3'
            print
            self.progressbar.set_fraction(i/num)
            self.progressbar.set_text('File %d of %d' % (i, num))
            i += 1
        except IOError:
            print 'Rename fail'
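You want to start by decoding the input FROM the charset it is in TO unicode. In Python, decode means ‘turn bytes in some charset into a unicode string’, and encode means ‘turn a unicode string into bytes in some charset, such as utf-8’. Once the tag values are unicode, you can encode them to utf-8 wherever you need bytes again.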
Some googling suggests the Norwegian charset is plain-ole ‘iso-8859-1’… I hope someone can correct me if I’m wrong on this detail. Regardless, whatever the name of the charset, the approach is the same: decode the tag bytes with that charset right after reading them.
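Here is a minimal sketch of that decode step, assuming the tag really is iso-8859-1 and that the fixed-width fields are padded with NUL bytes or spaces. The helper name decode_tag_field and the sample title are made up for illustration; they are not from the pastebin code.

```python
def decode_tag_field(raw, charset='iso-8859-1'):
    """Decode one fixed-width tag field (raw bytes) into a unicode string.

    Hypothetical helper; 'iso-8859-1' is only a guess at the charset.
    """
    text = raw.decode(charset)
    # Keep only what precedes the first NUL, then drop padding spaces.
    return text.split(u'\x00', 1)[0].strip()


# A 30-byte field holding "Bl\xe5b\xe6rsyltet\xf8y" (with å, æ, ø) in
# iso-8859-1, padded with NUL bytes the way a tag field might be.
raw_title = b'Bl\xe5b\xe6rsyltet\xf8y'.ljust(30, b'\x00')

title = decode_tag_field(raw_title)   # unicode string
utf8_title = title.encode('utf-8')    # bytes, utf-8 encoded
```

In the question's code, something like this could take the place of (or wrap) the parseFunc call, so that self[tag] holds unicode rather than raw bytes.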
In a real-world app, I realize you can’t guarantee that the input is in the Norwegian charset, or in any other particular charset. In that case, you will probably want to proceed through a series of likely charsets and use the first one that decodes successfully. Both SO and Google have some suggestions on algorithms for doing this effectively in Python. It sounds scarier than it really is.
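One way to try a series of charsets could look like the sketch below. The function name decode_with_fallback and the default charset list are assumptions for illustration, not a recommendation from any library. Note that iso-8859-1 maps every possible byte to a character, so it never fails and should come last in the list.

```python
def decode_with_fallback(raw, charsets=('utf-8', 'iso-8859-1')):
    """Return (text, charset) for the first charset that decodes raw cleanly.

    Hypothetical helper; the charset order is an assumption.
    """
    for charset in charsets:
        try:
            return raw.decode(charset), charset
        except UnicodeDecodeError:
            continue
    # Nothing matched: decode as utf-8 and substitute undecodable bytes.
    return raw.decode('utf-8', 'replace'), None


# Bytes that are valid utf-8 are recognised as such...
text_a, used_a = decode_with_fallback(b'Bl\xc3\xa5')
# ...while a lone 0xE5 byte is not valid utf-8, so iso-8859-1 is used.
text_b, used_b = decode_with_fallback(b'Bl\xe5')
```

This only checks whether decoding succeeds, not whether the result is the text the author intended, so a wrong charset that happens to decode will still be accepted.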