I’d like to write my own music streaming web application for my personal use, but I’m racking my brain over how to manage the files. My music files and their locations rarely change, but they can (fixing a filename, editing ID3 tags, renaming /Chemical Brothers to /The Chemical Brothers). How would the community manage all of these files? I can gather a lot of information with just an ID3 reader and my file system, but it would also be nice to keep track of how often each track is played and such. Would using iTunes’s .xml file be a good choice, i.e. keeping my music current in iTunes and basing my web application’s data on it? I was thinking of identifying each track by the MD5 of its file, but if I change the ID3 tags, will that change the MD5 value?
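On the MD5 point: ID3 tags are stored inside the MP3 file itself (an ID3v2 tag usually at the start, an ID3v1 tag in the last 128 bytes), so an MD5 over the whole file does change whenever you edit them. A minimal sketch, assuming plain MP3 files with at most a leading ID3v2 tag and a trailing ID3v1 tag, hashes only the bytes between the tags instead. The function names and the `make_tagged` test helper are hypothetical, and fake byte strings stand in for real audio. Re-encoding the file would still change this hash.

```python
import hashlib


def _syncsafe_to_int(b: bytes) -> int:
    # ID3v2 tag sizes are "syncsafe": 4 bytes, 7 significant bits each.
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def _int_to_syncsafe(n: int) -> bytes:
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def audio_payload(data: bytes) -> bytes:
    """Strip a leading ID3v2 tag and a trailing 128-byte ID3v1 tag.

    Other tag formats (APEv2, Lyrics3, ID3v2 tags appended at the end)
    are NOT handled in this sketch.
    """
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        start = 10 + _syncsafe_to_int(data[6:10])
        if data[5] & 0x10:  # "footer present" flag: another 10 bytes
            start += 10
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def whole_file_md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def audio_md5(data: bytes) -> str:
    # MD5 of the audio bytes only, so retagging does not change it.
    return hashlib.md5(audio_payload(data)).hexdigest()


def make_tagged(artist: bytes, audio: bytes) -> bytes:
    """Hypothetical test helper: wrap fake audio bytes in a minimal
    ID3v2.3 tag (one TPE1 "lead artist" frame) and an ID3v1 trailer."""
    body = b"\x00" + artist  # text encoding byte 0 = ISO-8859-1
    frame = b"TPE1" + len(body).to_bytes(4, "big") + b"\x00\x00" + body
    header = b"ID3\x03\x00\x00" + _int_to_syncsafe(len(frame))
    # ID3v1: "TAG", 30-byte title, 30-byte artist, remaining 65 bytes zeroed.
    v1 = b"TAG" + b"\x00" * 30 + artist[:30].ljust(30, b"\x00") + b"\x00" * 65
    return header + frame + audio + v1


audio = bytes(range(256)) * 4  # stand-in for the MPEG audio frames
old = make_tagged(b"Chemical Brothers", audio)
new = make_tagged(b"The Chemical Brothers", audio)
print(whole_file_md5(old) == whole_file_md5(new))  # False: tags are part of the file
print(audio_md5(old) == audio_md5(new))            # True: audio bytes are unchanged
```

Hashing only the payload keeps the ID stable across tag edits and renames, which is usually what a personal library wants from a "unique identifier".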
I suppose my real question is: how do you keep track of a large music collection? Do you keep the metadata in a database? If so, how would I link each file to its database entry? Or should I just read from the filesystem whenever I need the information?
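One common way to link files and database rows, sketched here with Python's built-in `sqlite3` module: key each row on an identity derived from the file's content (for example a hash of the audio data that ignores the tags), and treat the path as an ordinary column that a periodic rescan of the music folder refreshes. The table, column and function names are hypothetical, and `"hash-1"` is a placeholder for whatever content hash you compute.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    id         INTEGER PRIMARY KEY,
    audio_hash TEXT NOT NULL UNIQUE,  -- stable identity (survives renames/retags)
    path       TEXT NOT NULL,         -- current location, refreshed by rescans
    artist     TEXT,
    title      TEXT,
    play_count INTEGER NOT NULL DEFAULT 0
);
"""


def open_library(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def upsert_track(conn, audio_hash, path, artist, title):
    """Called for every file found during a rescan. A known hash means the
    file was moved or retagged: update path/tags, keep the play count."""
    cur = conn.execute(
        "UPDATE tracks SET path = ?, artist = ?, title = ? WHERE audio_hash = ?",
        (path, artist, title, audio_hash),
    )
    if cur.rowcount == 0:  # hash not seen before: a new track
        conn.execute(
            "INSERT INTO tracks (audio_hash, path, artist, title) VALUES (?, ?, ?, ?)",
            (audio_hash, path, artist, title),
        )
    conn.commit()


def record_play(conn, path):
    # Called by the web app whenever a file is streamed.
    conn.execute(
        "UPDATE tracks SET play_count = play_count + 1 WHERE path = ?", (path,)
    )
    conn.commit()


conn = open_library()
upsert_track(conn, "hash-1", "/music/Chemical Brothers/track01.mp3",
             "Chemical Brothers", "Track 01")
record_play(conn, "/music/Chemical Brothers/track01.mp3")
# The folder is renamed and the tags fixed; a rescan finds the same hash.
upsert_track(conn, "hash-1", "/music/The Chemical Brothers/track01.mp3",
             "The Chemical Brothers", "Track 01")
print(conn.execute("SELECT path, play_count FROM tracks").fetchall())
# [('/music/The Chemical Brothers/track01.mp3', 1)]
```

The filesystem (and the ID3 tags) stay the source of truth for metadata; the database only adds what the files cannot hold, such as play counts, and survives renames because it never uses the path as the key.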
I missed part 2 of your question (the MD5 thing). I don’t think an MD5/SHA/… solution will work well. A hash over the whole file changes as soon as you edit its ID3 tags, and, more importantly, hashes only match byte-identical files, so they won’t help you find duplicates in your collection (like popular tracks that appear on many different samplers). Especially with big collections, that’s something you will want to do someday.
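A minimal sketch of what a hash-based approach can and cannot do, with made-up byte strings standing in for audio files (the function and file names are hypothetical): grouping files by MD5 finds byte-identical copies, but the same song taken from a sampler, mastered at a different level or encoded differently, has different bytes and therefore a completely different digest.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def exact_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group names whose contents have the same MD5 digest.
    Only byte-identical files end up in the same group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.md5(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


original = bytes([10, 20, 30, 40]) * 100           # stand-in for encoded audio
louder = bytes(min(255, b * 2) for b in original)  # same "song", different bytes
files = {
    "album/track.mp3": original,
    "backup/track.mp3": bytes(original),
    "sampler/track.mp3": louder,
}
print(exact_duplicates(files))
# [['album/track.mp3', 'backup/track.mp3']]
```

The sampler copy is never grouped with the album copy, which is exactly the kind of duplicate a large collection accumulates.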
There’s a technique called acoustic fingerprinting that shows a lot of promise; have a look here for a quick intro. Even if there are minor differences in recording levels (like those popular “normalized” tracks), the acoustic fingerprint should remain the same. I say should, because none of the techniques I tested is really 100% error-free. Another advantage of acoustic fingerprints is that they can help you with tagging: whereas a service like FreeDB only works on complete CDs, acoustic fingerprints can identify single tracks.
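Real fingerprinters (Chromaprint, used by the AcoustID service, is one open-source example) analyse energy across frequency bands over time and are far more robust than anything that fits here. As a toy illustration only of why level differences need not change a fingerprint, this Python sketch (function names hypothetical, not any library's API) derives one bit per pair of neighbouring frames from whether the signal's energy rises or falls. It works on raw PCM samples, so a real setup would first have to decode the MP3, and it makes no attempt to cope with re-encoding artefacts, trimming or time offsets.

```python
import math
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_size: int) -> List[float]:
    # Sum of squared samples for each complete, non-overlapping frame.
    return [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def toy_fingerprint(samples: Sequence[float], frame_size: int = 256) -> List[int]:
    """One bit per pair of neighbouring frames: 1 if the energy rises, else 0.
    A constant gain scales every energy by the same factor, so the comparisons
    (and therefore the bits) ignore overall level, up to floating-point
    rounding when two frames have nearly equal energy."""
    e = frame_energies(samples, frame_size)
    return [1 if e[i + 1] > e[i] else 0 for i in range(len(e) - 1)]


def hamming(a: List[int], b: List[int]) -> int:
    # Number of differing bits; a length mismatch counts as differences too.
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Synthetic "track": a tone with a slowly varying amplitude envelope.
track = [math.sin(0.05 * n) * (1.0 + 0.5 * math.sin(0.002 * n)) for n in range(16384)]
louder = [2.0 * s for s in track]  # same audio at a higher level (2.0 keeps the math exact)
print(hamming(toy_fingerprint(track), toy_fingerprint(louder)))  # 0
```

Matching is then done by comparing fingerprints with a distance (here Hamming distance) and a threshold rather than by exact equality, which is also why no such scheme is 100% error-free.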