I've written an application that does audio fingerprinting using the method described here. It basically converts an MP3 to a WAV and then stores a bunch of hash codes in a database. I then make a recording with my iPhone, which has some noise in it, compare the hash codes, and get matches as documented in the link. Wow, it's cool!
I'm now recording radio samples using a USB radio receiver. I get the sound data in a byte[] array and then do exactly the same thing: I store the hash codes and then try to match them. This time it doesn't work.
My feeling is that the MP3 has been normalized (had compression applied to it) and this might be the difference. I couldn't think of any other differences, as both the MP3 and the radio sample are converted to 16-bit WAV format.
I guess my question is twofold:
- If I compress the radio sample, do you think it'll work?
- To do this I need to apply a compression function, which means making the soft sounds louder and the loud sounds softer.
I've started writing a function that takes a byte array (the WAV data in 16-bit format) and cycles through it, adjusting the sample values to do the compression, but I'm struggling with this:
List<short> samples = new List<short>();
// 16-bit PCM: every 2 bytes in the array form one sample.
// The bound j + 1 guards against a trailing odd byte.
for (int j = 0; j + 1 < byteArray.Length; j += 2)
{
    // BitConverter reads in machine byte order; WAV data is little-endian,
    // which matches on x86/x64 and most ARM systems.
    short sample16 = BitConverter.ToInt16(byteArray, j);
    // at this point change the sample according to the compression needed
    samples.Add(sample16);
    // back again to test it
    byte[] buffer11 = BitConverter.GetBytes(sample16);
}
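For the "change the sample" step, here is a minimal sketch of one possible approach: a static (memoryless) compressor that scales down whatever exceeds a threshold and then applies a make-up gain so that quiet parts end up relatively louder. The class name `SimpleCompressor`, the method `Compress`, and the parameters `threshold`, `ratio`, and `makeupGain` are all hypothetical names chosen for illustration. It assumes the byte array holds raw little-endian 16-bit PCM with no WAV header, and it has no attack/release smoothing, so it will sound harsher than a real compressor.

```csharp
using System;

public static class SimpleCompressor
{
    // Hypothetical helper: applies a static compression curve to raw 16-bit PCM.
    // threshold:  level in (0, 1] above which the signal is reduced
    // ratio:      e.g. 4.0 means 4:1 compression above the threshold
    // makeupGain: linear gain applied afterwards (values > 1 raise quiet parts)
    public static byte[] Compress(byte[] pcm, double threshold, double ratio, double makeupGain)
    {
        byte[] output = new byte[pcm.Length];

        // A trailing odd byte, if any, is left as zero in the output.
        for (int j = 0; j + 1 < pcm.Length; j += 2)
        {
            // Assemble the little-endian sample explicitly so the result
            // does not depend on the machine's byte order.
            short sample = unchecked((short)(pcm[j] | (pcm[j + 1] << 8)));

            // Work in the normalized range [-1, 1).
            double x = sample / 32768.0;
            double magnitude = Math.Abs(x);

            // Above the threshold, only 1/ratio of the excess is kept.
            if (magnitude > threshold)
            {
                magnitude = threshold + (magnitude - threshold) / ratio;
            }

            double y = Math.Sign(x) * magnitude * makeupGain;

            // Clamp so the make-up gain cannot overflow 16 bits.
            y = Math.Max(-1.0, Math.Min(1.0, y));
            short result = (short)Math.Round(y * 32767.0);

            // Write back as little-endian bytes.
            output[j] = (byte)(result & 0xFF);
            output[j + 1] = (byte)((result >> 8) & 0xFF);
        }

        return output;
    }
}
```

A call such as `SimpleCompressor.Compress(byteArray, 0.25, 4.0, 2.0)` would squash everything above a quarter of full scale at 4:1 and then double the overall level. Those values are only a starting point to experiment with, not tuned settings.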
As sblom already stated in his comments, frequency-domain hashing is not affected by dynamic range. From the information you've given, I suspect that some frequencies are present in one of your inputs and missing from the other. Note that MP3 uses a psychoacoustic model based on human perception, which deliberately discards or masks some frequencies. So your radio source may contain, or lack, frequencies that matter for recognizing your inputs correctly.
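One rough way to test this hypothesis is to measure how much energy each recording has at a handful of frequencies and compare the two profiles. The sketch below uses the Goertzel algorithm for that; the class name `BandEnergy`, the method `GoertzelPower`, and its parameters are hypothetical names for illustration, and it assumes you have already decoded both recordings to mono 16-bit samples at a known sample rate.

```csharp
using System;

public static class BandEnergy
{
    // Goertzel algorithm: estimates the power of a single frequency
    // in a block of 16-bit samples without computing a full FFT.
    public static double GoertzelPower(short[] samples, int sampleRate, double frequencyHz)
    {
        double omega = 2.0 * Math.PI * frequencyHz / sampleRate;
        double coeff = 2.0 * Math.Cos(omega);

        double s1 = 0.0; // state from one sample ago
        double s2 = 0.0; // state from two samples ago

        foreach (short sample in samples)
        {
            // Normalize to [-1, 1) and run the second-order recurrence.
            double s0 = sample / 32768.0 + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }

        // Squared magnitude of the frequency component.
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }
}
```

If you run this over the same stretch of audio from the MP3-derived WAV and from the radio capture at, say, a dozen frequencies, a band that is strong in one and nearly empty in the other would point to a frequency-content mismatch rather than a loudness one. Scaling the whole signal up or down changes every band by the same factor, so the relative profile stays the same, which fits the point that dynamic range is not the likely culprit.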