I’m writing an iPhone app which should record the users voice, and feed the

Question

0

Asked: May 17, 20262026-05-17T17:41:33+00:00 2026-05-17T17:41:33+00:00

I’m writing an iPhone app which should record the users voice, and feed the

0

I’m writing an iPhone app which should record the users voice, and feed the audio data into a library for modifications such as changing tempo and pitch. I started off with the SpeakHere example code from Apple:

http://developer.apple.com/library/ios/#samplecode/SpeakHere/Introduction/Intro.html

That project lays the groundwork for recording the user’s voice and playing it back. It works well.

Now I’m diving into the code and I need to figure out how to feed the audio data into the SoundTouch library (http://www.surina.net/soundtouch/) to change the pitch. I became familiar with the Audio Queue framework while going through the code, and I found the place where I receive the audio data from the recording.

Essentially, you call AudioQueueNewInput to create a new input queue. You pass a callback function which is called every time a chunk of audio data is available. It is within this callback that I need to pass the chunks of data into SoundTouch.

I have it all setup, but the noise I play back from the SoundTouch library is very staticky (it barely resembles the original). If I don’t pass it through SoundTouch and play the original audio it works fine.

Basically, I’m missing something about what the actual data I’m getting represents. I was assuming that I am getting a stream of shorts which are samples, 1 sample for each channel. That’s how SoundTouch is expecting it, so it must not be right somehow.

Here is the code which sets up the audio queue so you can see how it is configured.

void AQRecorder::SetupAudioFormat(UInt32 inFormatID)
{
memset(&mRecordFormat, 0, sizeof(mRecordFormat));

UInt32 size = sizeof(mRecordFormat.mSampleRate);
XThrowIfError(AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareSampleRate,
                                          &size, 
                                          &mRecordFormat.mSampleRate), "couldn't get hardware sample rate");

size = sizeof(mRecordFormat.mChannelsPerFrame);
XThrowIfError(AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareInputNumberChannels, 
                                          &size, 
                                          &mRecordFormat.mChannelsPerFrame), "couldn't get input channel count");

mRecordFormat.mFormatID = inFormatID;
if (inFormatID == kAudioFormatLinearPCM)
{
    // if we want pcm, default to signed 16-bit little-endian
    mRecordFormat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
    mRecordFormat.mBitsPerChannel = 16;
    mRecordFormat.mBytesPerPacket = mRecordFormat.mBytesPerFrame = (mRecordFormat.mBitsPerChannel / 8) * mRecordFormat.mChannelsPerFrame;
    mRecordFormat.mFramesPerPacket = 1;
}
}

And here’s part of the code which actually sets it up:

    SetupAudioFormat(kAudioFormatLinearPCM);

    // create the queue
    XThrowIfError(AudioQueueNewInput(
                                  &mRecordFormat,
                                  MyInputBufferHandler,
                                  this /* userData */,
                                  NULL /* run loop */, NULL /* run loop mode */,
                                  0 /* flags */, &mQueue), "AudioQueueNewInput failed");

And finally, here is the callback which handles new audio data:

void AQRecorder::MyInputBufferHandler(void *inUserData,
                                  AudioQueueRef inAQ,
                                  AudioQueueBufferRef inBuffer,
                                  const AudioTimeStamp *inStartTime,
                                  UInt32 inNumPackets,
                                  const AudioStreamPacketDescription *inPacketDesc) {
AQRecorder *aqr = (AQRecorder *)inUserData;
try {
        if (inNumPackets > 0) {
            CAStreamBasicDescription queueFormat = aqr->DataFormat();
            SoundTouch *soundTouch = aqr->getSoundTouch();

            soundTouch->putSamples((const SAMPLETYPE *)inBuffer->mAudioData,
                                   inBuffer->mAudioDataByteSize / 2 / queueFormat.NumberChannels());

            SAMPLETYPE *samples = (SAMPLETYPE *)malloc(sizeof(SAMPLETYPE) * 10000 * queueFormat.NumberChannels());
            UInt32 numSamples;
            while((numSamples = soundTouch->receiveSamples((SAMPLETYPE *)samples, 10000))) {
                // write packets to file
                XThrowIfError(AudioFileWritePackets(aqr->mRecordFile,
                                                    FALSE,
                                                    numSamples * 2 * queueFormat.NumberChannels(),
                                                    NULL,
                                                    aqr->mRecordPacket,
                                                    &numSamples,
                                                    samples),
                              "AudioFileWritePackets failed");
                aqr->mRecordPacket += numSamples;
            }
            free(samples);
        }

        // if we're not stopping, re-enqueue the buffe so that it gets filled again
        if (aqr->IsRunning())
            XThrowIfError(AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL), "AudioQueueEnqueueBuffer failed");
} catch (CAXException e) {
    char buf[256];
    fprintf(stderr, "Error: %s (%s)\n", e.mOperation, e.FormatError(buf));
}
}

You can see that I’m passing the data in inBuffer->mAudioData to SoundTouch. In my callback, what exactly are the bytes representing, i.e. how do I extract samples from mAudioData?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-17T17:41:33+00:00

Editorial Team

2026-05-17T17:41:33+00:00Added an answer on May 17, 2026 at 5:41 pm

You have to check that the endianess, the signedness, etc. of what you are getting match what the library expects. Use mFormatFlags of AudioStreamBasicDescription to determine the source format. Then you might have to convert the samples (e.g. newSample = sample + 0x8000)

0

Reply
Share
Share

- Report

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m writing an iPhone app which should record the users voice, and feed the

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply