For anyone not familiar with Verizon’s SongID program, it is a free application downloadable through Verizon’s VCast network. It listens to 10 seconds of a song, starting at any point in the song, and then sends this data to some all-knowing algorithmic beast that chews it up and sends you back all the ID3 tags (artist, album, song, etc.).
The first two parts and last part are straightforward, but what goes on during the processing after the recorded sound is sent?
I figure it must take the sound file (what format?), parse it (how? with what?) for some key identifiers (what are these? regular attributes of wave functions? phase/shift/amplitude/etc), and check it against a database.
Everything I find online about how this works is something generic like what I typed above.
From audiotag.info:
This service is based on a sophisticated audio recognition algorithm combining advanced audio fingerprinting technology and a large songs’ database. When you upload an audio file, it is being analyzed by an audio engine. During the analysis its audio “fingerprint” is extracted and identified by comparing it to the music database. At the completion of this recognition process, information about songs with their matching probabilities are displayed on screen.
All of these services work by taking a “fingerprint” from the sampled audio data on the client side, sending it to a server and comparing it against a fingerprint database.
One of the developers of Shazam has written an extremely informative white paper on how the technology works. This should give you all of the information that you need.
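To make the fingerprint-and-lookup flow a bit more concrete, here is a rough toy sketch in Python with NumPy. It is loosely inspired by the spectrogram-peak-pair idea described in the Shazam paper, but it is not the actual Shazam or SongID implementation. All names (`spectrogram`, `fingerprint`, `build_index`, `identify`) and parameters (`win`, `hop`, `fan_out`) are made up for illustration. It also assumes mono audio already decoded into a float array, and it simplifies heavily by keeping only the single strongest frequency bin per frame.

```python
from collections import Counter, defaultdict

import numpy as np


def spectrogram(samples, win=1024, hop=512):
    """Magnitude spectrogram: one row per overlapping, windowed frame."""
    if len(samples) < win:
        return np.empty((0, win // 2 + 1))
    window = np.hanning(win)
    frames = [samples[i:i + win] * window
              for i in range(0, len(samples) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))


def fingerprint(samples, fan_out=3):
    """Return (hash, frame_time) pairs built from pairs of spectral peaks.

    Simplification (an assumption of this sketch): one peak per frame,
    namely the strongest frequency bin.
    """
    peaks = spectrogram(samples).argmax(axis=1)
    hashes = []
    for t in range(len(peaks)):
        for dt in range(1, fan_out + 1):
            if t + dt < len(peaks):
                # The hash is (anchor bin, later bin, frames between them).
                key = (int(peaks[t]), int(peaks[t + dt]), dt)
                hashes.append((key, t))
    return hashes


def build_index(songs):
    """Server side: map each hash to every (song name, time) where it occurs."""
    index = defaultdict(list)
    for name, samples in songs.items():
        for key, t in fingerprint(samples):
            index[key].append((name, t))
    return index


def identify(index, clip):
    """Vote on (song, time offset); a true match piles up on one offset."""
    votes = Counter()
    for key, t_clip in fingerprint(clip):
        for name, t_song in index.get(key, ()):
            votes[(name, t_song - t_clip)] += 1
    if not votes:
        return None
    (name, _offset), _count = votes.most_common(1)[0]
    return name
```

You would call `build_index` once with a dict of song name to sample array, then call `identify(index, clip)` with the recorded snippet. The point of voting on the time offset rather than just counting matching hashes is that a genuine match produces many hashes that all agree on where in the song the clip starts, while accidental matches scatter across random offsets. A real system would pick many peaks per frame, pack the hashes into integers, and deal with noise and far larger databases.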