I have found for several times the following guidelines for getting the power spectrum of an audio signal:
- collect N samples, where N is a power of 2
- apply a suitable window function to the samples, e.g. Hanning
- pass the windowed samples to an FFT routine – ideally you want a real-to-complex FFT but if all you have a is complex-to-complex FFT then pass 0 for all the imaginary input parts
- calculate the squared magnitude of your FFT output bins (re * re + im * im)
- (optional) calculate 10 * log10 of each magnitude squared output bin to get a magnitude value in dB
- Now that you have your power spectrum you just need to identify the peak(s), which should be pretty straightforward if you have a reasonable S/N ratio. Note that frequency resolution improves with larger N. For the above example of 44.1 kHz sample rate and N = 32768 the frequency resolution of each bin is 44100 / 32768 = 1.35 Hz.
But… why do I need to apply a window function to the samples? What does that really means?
What about the power spectrum, is it the power of each frequency in the range of sample rate? (example: windows media player visualizer of sound?)
As @cyco130 says, your samples are already windowed by a rectangular function. Since a Fourier Transform assumes periodicity, any discontinuity between the last sample and the repeated first sample will cause artefacts in the spectrum (e.g. “smearing” of the peaks). This is known as spectral leakage. To reduce the effect of this we apply a tapered window function such as a Hann window which smooths out any such discontinuity and thereby reduces artefacts in the spectrum.