I have made an app that paints the FFT of the microphone input to the screen in real time. Time is on the x-axis, frequency is on the y-axis, and the color of each pixel represents the amplitude (pretty much a vanilla FFT spectrogram).
My problem is that even though I can see a pattern from the music, there is also a lot of noise. Googling it, I see people applying a logarithmic calculation to the amplitude. Should I be doing this? And if so, what would the formula look like? (I’m using C#, but I can translate the math into code, so any sample is OK.)
I can bypass this problem by applying a color scheme showing lower values as darker colors. I’m just not sure if the audio is correctly represented without a logarithmic calculation on it.
Representing the amplitude on a logarithmic scale approximates the sensitivity of the human auditory system, and therefore gives you a better picture of what you hear than a linear scale does. Mathematically, all you have to do is:
$$A_{\log} = 20 \log_{10}(A)$$
where $A$ is the amplitude of the FFT data and $A_{\log}$ is the output. The factor of 20 is just a convention and has no effect on the image, which you probably scale to a color scheme anyway.
EDIT
Explanation regarding the factor of 20: the dB (decibel) is a logarithmic unit that measures ratios. It represents a scale on which the distance between 100 and 10 is the same as the distance between 1000 and 100, since they have the same ratio (1000/100 = 100/10). If you measure these ratios in dB you get:
$$10 \log_{10}\left(\frac{1000}{100}\right) = 10 \log_{10}\left(\frac{100}{10}\right) = 10\ \text{dB}$$
The factor of 10 is there because deci means tenth, so 1 bel is 10 decibels (like 1 kilogram is 1000 grams).
Since the human auditory system also (approximately) measures ratios, it makes sense to measure sound level on a logarithmic scale, i.e. to measure the ratio of the sound level to some reference value. Since the level of a sound is associated with the power (in watts) of the sound wave, you actually measure the ratio of powers $P/P_{\text{ref}}$. The power is also proportional to the amplitude squared, so all in all you get:
$$10 \log_{10}\left(\frac{P}{P_{\text{ref}}}\right) = 10 \log_{10}\left(\frac{A^2}{A_{\text{ref}}^2}\right) = 20 \log_{10}\left(\frac{A}{A_{\text{ref}}}\right)$$
by the log rules. That is the origin of the factor of 20: remember that in the computer the audio is represented by the instantaneous amplitude of the sound wave.
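Since any sample is OK, here is a minimal Python sketch of the conversion, which should translate to C# almost line for line. It assumes you already have the magnitude of each FFT bin as a non-negative number. The function names, the -120 dB floor (needed because the log of 0 is undefined), and the -80 to 0 dB display range are all illustrative assumptions, not standard values, so tune them to your data.

```python
import math

def amplitude_to_db(amplitude, reference=1.0, floor_db=-120.0):
    """Convert a linear FFT magnitude to dB relative to `reference`.

    `floor_db` is an assumed lower clamp: log10(0) is undefined,
    so silent bins are pinned to this value instead.
    """
    if amplitude <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(amplitude / reference))

def db_to_pixel(level_db, min_db=-80.0, max_db=0.0):
    """Map a dB value linearly onto a 0..255 color index.

    Values outside the assumed [min_db, max_db] display range are clamped.
    """
    t = (level_db - min_db) / (max_db - min_db)
    t = min(1.0, max(0.0, t))
    return round(t * 255)

# Hypothetical magnitudes for one FFT frame, normalized so full scale is 1.0.
magnitudes = [1.0, 0.1, 0.001, 0.0]
levels = [amplitude_to_db(m) for m in magnitudes]
pixels = [db_to_pixel(d) for d in levels]
```

Raising `min_db` hides more of the low-level noise (everything below it becomes the darkest color), which is roughly what your darker color scheme is already doing, just with a threshold expressed in dB.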