I am integrating CMU’s PocketSphinx with Unity by compiling my own C++ project in

Asked: June 13, 2026

I am integrating CMU's PocketSphinx with Unity by compiling my own C++ project in Visual Studio 2010 as a DLL, which I call from a C# script inside Unity Pro. I know that the DLL code works because I built another project as an EXE from the very same code, and it runs perfectly as a standalone program. I'm using the pocketsphinx_continuous example project, which reads microphone input and prints the recognized text to the console. I have customized this code so that it can be called from inside Unity and returns the text to my C# code as a string rather than printing it to the console. I feel that I almost have this working, but the const char * is just not making it back as a string. I get access violation errors if I use this declaration:

private static extern string recognize_from_microphone();

So I tried this one instead:

private static extern IntPtr recognize_from_microphone();

and then, I use this line of code to try to print the output of that function:

print("you just said " + Marshal.PtrToStringAnsi(recognize_from_microphone()));

But then I only get "you just said" in return. I can get a memory address back if I do this: print("you just said " + recognize_from_microphone()); so I know that something is being returned.

Here is my C++ code (much of this was written originally in C as the example code from pocketsphinx):

char* MakeStringCopy (const char* str) 
{
  if (str == NULL) return NULL;
  char* res = (char*)malloc(strlen(str) + 1);
  strcpy(res, str);
  return res;
}


extern __declspec(dllexport) const char * recognize_from_microphone()
{
//this is a near complete duplication of the code from main()
char const *cfg;
config = cmd_ln_init(NULL, ps_args(), TRUE,
"-hmm", MODELDIR "\\hmm\\en_US\\hub4wsj_sc_8k",
"-lm", MODELDIR "\\lm\\en\\turtle.DMP",
"-dict", MODELDIR "\\lm\\en\\turtle.dic",
NULL);

if (config == NULL)
{
   return "config is null";
}

ps = ps_init(config);
if (ps == NULL)
{
   return "ps is null";
}

ad_rec_t *ad;
int16 adbuf[4096];
int32 k, ts, rem;
char const *hyp;
char const *uttid;
cont_ad_t *cont;
char word[256];
char words[1024] = "";
//char temp[] = "hypothesis";
//hyp = temp;

if ((ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"),
                      (int)cmd_ln_float32_r(config, "-samprate"))) == NULL)
    E_FATAL("Failed to open audio device\n");

/* Initialize continuous listening module */
if ((cont = cont_ad_init(ad, ad_read)) == NULL)
    E_FATAL("Failed to initialize voice activity detection\n");
if (ad_start_rec(ad) < 0)
    E_FATAL("Failed to start recording\n");
if (cont_ad_calib(cont) < 0)
    E_FATAL("Failed to calibrate voice activity detection\n");

for (;;) {
    /* Indicate listening for next utterance */
    //printf("READY....\n");
    fflush(stdout);
    fflush(stderr);

    /* Wait data for next utterance */
    while ((k = cont_ad_read(cont, adbuf, 4096)) == 0)
        sleep_msec(100);

    if (k < 0)
        E_FATAL("Failed to read audio\n");

    /*
     * Non-zero amount of data received; start recognition of new utterance.
     * NULL argument to uttproc_begin_utt => automatic generation of utterance-id.
     */
    if (ps_start_utt(ps, NULL) < 0)
        E_FATAL("Failed to start utterance\n");

    ps_process_raw(ps, adbuf, k, FALSE, FALSE);
    //printf("Listening...\n");
    fflush(stdout);

    /* Note timestamp for this first block of data */
    ts = cont->read_ts;

    /* Decode utterance until end (marked by a "long" silence, >1sec) */
    for (;;) {

        /* Read non-silence audio data, if any, from continuous listening module */
        if ((k = cont_ad_read(cont, adbuf, 4096)) < 0)
            E_FATAL("Failed to read audio\n");
        if (k == 0) {
            /*
             * No speech data available; check current timestamp with most recent
             * speech to see if more than 1 sec elapsed.  If so, end of utterance.
             */
            if ((cont->read_ts - ts) > DEFAULT_SAMPLES_PER_SEC)
                break;
        }
        else {
            /* New speech data received; note current timestamp */
            ts = cont->read_ts;
        }

        /*
         * Decode whatever data was read above.
         */
        rem = ps_process_raw(ps, adbuf, k, FALSE, FALSE);

        /* If no work to be done, sleep a bit */
        if ((rem == 0) && (k == 0))
            sleep_msec(20);
    }

    /*
     * Utterance ended; flush any accumulated, unprocessed A/D data and stop
     * listening until current utterance completely decoded
     */
    ad_stop_rec(ad);
    while (ad_read(ad, adbuf, 4096) >= 0);
    cont_ad_reset(cont);
    fflush(stdout);
    /* Finish decoding, obtain and print result */
    ps_end_utt(ps);

    hyp = ps_get_hyp(ps, NULL, &uttid);
    fflush(stdout);

    /* Exit if the first word spoken was GOODBYE */
   //actually, for unity, exit if any word was spoken at all! this will avoid an infinite loop of doom!
    if (hyp) {
        /*sscanf(hyp, "%s", words);
        if (strcmp(word, "goodbye") == 0)*/
            break;
    }
   else
     return "nothing returned";
    /* Resume A/D recording for next utterance */
    if (ad_start_rec(ad) < 0)
        E_FATAL("Failed to start recording\n");
}
cont_ad_close(cont);
ad_close(ad);
ps_free(ps);
const char *temp = new char[1024];
temp = MakeStringCopy(hyp);
return temp;
}

If I change return temp; to return "some string here"; then I see the text appear inside Unity. That's not helpful, though, because I don't need hardcoded text. I need the output of the speech recognition code, which ends up stored in the hyp variable.

Can anyone help me figure out what I’m doing wrong? Thanks!


1 Answer

Editorial Team · Answer 1 · 2026-06-13T07:39:08+00:00

Finally got this working! I ended up passing a StringBuilder object into the C++ function and reading the string back from that object in C#, as described in this post:
http://www.pcreview.co.uk/forums/passing-and-retrieving-string-calling-c-function-c-t1367069.html
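A possible explanation for why this approach works while the earlier version came back empty (an inference from the two listings, not something confirmed in the original post): ps_get_hyp() appears to return a pointer into memory owned by the decoder, and the question's code calls ps_free(ps) before MakeStringCopy(hyp), so the copy is made from memory that has already been released. The final code copies hyp while the decoder is still alive. The sketch below illustrates that ordering with a hypothetical FakeDecoder stand-in rather than the real PocketSphinx API; fake_init, fake_get_hyp, fake_free and recognize_copy_first are names invented for this example.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for a decoder that owns its result string,
// similar in spirit to a decoder handing back a pointer into its own memory.
struct FakeDecoder {
    std::string hyp;
};

FakeDecoder* fake_init(const char* text) {
    FakeDecoder* d = new FakeDecoder;
    d->hyp = text;
    return d;
}

const char* fake_get_hyp(const FakeDecoder* d) { return d->hyp.c_str(); }

void fake_free(FakeDecoder* d) { delete d; }

// Same idea as the helper in the question: duplicate a C string on the heap.
char* MakeStringCopy(const char* str) {
    if (str == NULL) return NULL;
    char* res = (char*)std::malloc(std::strlen(str) + 1);
    if (res == NULL) return NULL;
    std::strcpy(res, str);
    return res;
}

// Copy the hypothesis while the decoder is still alive, then free the decoder.
char* recognize_copy_first(const char* spoken) {
    FakeDecoder* d = fake_init(spoken);
    const char* hyp = fake_get_hyp(d);
    char* copy = MakeStringCopy(hyp);  // copy BEFORE the owner is destroyed
    fake_free(d);                      // hyp must not be used after this line
    return copy;                       // caller releases the copy with free()
}
```

Swapping the MakeStringCopy and fake_free lines would reproduce the ordering used in the question, where the pointer is read after its owner has been freed.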

The code is slower than I’d like, but at least it works now. Here was my final code:

C#:

[DllImport ("pocketsphinx_unity",CallingConvention=CallingConvention.Cdecl,CharSet = CharSet.Ansi)]
private static extern void recognize_from_microphone(StringBuilder str);

StringBuilder mytext = new StringBuilder(1000);
recognize_from_microphone(mytext);
print("you just said " + mytext.ToString());
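The native function has no way of seeing the capacity of 1000 chosen on the C# side, so a plain strcpy into the buffer relies on the recognized text always being shorter than that. A minimal sketch of a bounded copy follows, assuming the exported function were changed to also receive the buffer capacity as an extra argument (that signature change is an assumption, not part of the original code); copy_bounded is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy src into a caller-owned buffer of `capacity`
// bytes, truncating if needed and always NUL-terminating the result.
// Returns the number of characters written, excluding the terminator.
std::size_t copy_bounded(char* dest, std::size_t capacity, const char* src) {
    if (dest == NULL || capacity == 0) return 0;
    if (src == NULL) {
        dest[0] = '\0';  // no text available: hand back an empty string
        return 0;
    }
    std::size_t n = std::strlen(src);
    if (n >= capacity) n = capacity - 1;  // leave room for the terminator
    std::memcpy(dest, src, n);
    dest[n] = '\0';
    return n;
}
```

Inside the recognition function, a call such as copy_bounded(fromUnity, capacity, hyp) would then take the place of the unbounded strcpy.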

C++:

extern __declspec(dllexport) void recognize_from_microphone(char * fromUnity){
static ps_decoder_t *ps;
static cmd_ln_t *config;

config = cmd_ln_init(NULL, ps_args(), TRUE,
"-hmm", MODELDIR "\\hmm\\en_US\\hub4wsj_sc_8k",
"-lm", MODELDIR "\\lm\\en\\turtle.DMP",
"-dict", MODELDIR "\\lm\\en\\turtle.dic",
NULL);

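if (config == NULL)
{
    /* Report the failure through the output buffer and stop here */
    strcpy(fromUnity, "config is null");
    return;
}

ps = ps_init(config);
if (ps == NULL)
{
    /* Report the failure through the output buffer and stop here */
    strcpy(fromUnity, "ps is null");
    return;
}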

ad_rec_t *ad;
int16 adbuf[4096];
int32 k, ts, rem;
char const *hyp;
char const *uttid;
cont_ad_t *cont;
//char word[256];
char * temp;

if ((ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"),
                      (int)cmd_ln_float32_r(config, "-samprate"))) == NULL)
    printf("Failed to open audio device\n");

/* Initialize continuous listening module */
if ((cont = cont_ad_init(ad, ad_read)) == NULL)
    printf("Failed to initialize voice activity detection\n");
if (ad_start_rec(ad) < 0)
    printf("Failed to start recording\n");
if (cont_ad_calib(cont) < 0)
    printf("Failed to calibrate voice activity detection\n");

for (;;) {
    /* Indicate listening for next utterance */
    //printf("READY....\n");
    fflush(stdout);
    fflush(stderr);

    /* Wait data for next utterance */
    while ((k = cont_ad_read(cont, adbuf, 4096)) == 0)
        sleep_msec(100);

    if (k < 0)
        printf("Failed to read audio\n");

    /*
     * Non-zero amount of data received; start recognition of new utterance.
     * NULL argument to uttproc_begin_utt => automatic generation of utterance-id.
     */
    if (ps_start_utt(ps, NULL) < 0)
        printf("Failed to start utterance\n");

    ps_process_raw(ps, adbuf, k, FALSE, FALSE);
    //printf("Listening...\n");
    fflush(stdout);

    /* Note timestamp for this first block of data */
    ts = cont->read_ts;

    /* Decode utterance until end (marked by a "long" silence, >1sec) */
    for (;;) {

        /* Read non-silence audio data, if any, from continuous listening module */
        if ((k = cont_ad_read(cont, adbuf, 4096)) < 0)
            printf("Failed to read audio 2nd\n");
        if (k == 0) {
            /*
             * No speech data available; check current timestamp with most recent
             * speech to see if more than 1 sec elapsed.  If so, end of utterance.
             */
            if ((cont->read_ts - ts) > DEFAULT_SAMPLES_PER_SEC)
                break;
        }
        else {
            /* New speech data received; note current timestamp */
            ts = cont->read_ts;
        }

        /*
         * Decode whatever data was read above.
         */
        rem = ps_process_raw(ps, adbuf, k, FALSE, FALSE);

        /* If no work to be done, sleep a bit */
        if ((rem == 0) && (k == 0))
            sleep_msec(20);
    }

    /*
     * Utterance ended; flush any accumulated, unprocessed A/D data and stop
     * listening until current utterance completely decoded
     */
    ad_stop_rec(ad);
    while (ad_read(ad, adbuf, 4096) >= 0);
    cont_ad_reset(cont);
    fflush(stdout);
    /* Finish decoding, obtain and print result */
    ps_end_utt(ps);

    hyp = ps_get_hyp(ps, NULL, &uttid);
    fflush(stdout);

    /* Exit if the first word spoken was GOODBYE */
    //actually, for unity, exit if any word was spoken at all! this will avoid an infinite loop of doom!
    if (hyp) {
        strcpy(fromUnity,hyp);
        break;               
    }
    else {
        /* No hypothesis yet: resume A/D recording for the next utterance */
        if (ad_start_rec(ad) < 0)
            printf("Failed to start recording\n");
    }
}

cont_ad_close(cont);
ad_close(ad);
ps_free(ps);
}
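On the speed complaint: the function above runs cmd_ln_init, ps_init, the audio device setup and the calibration step on every call, and frees everything again before returning. That repeated setup is one plausible contributor to the slowness, although nothing in the post measures it. A minimal sketch of creating an expensive object once and reusing it across calls is shown below; CachedDecoder, get_decoder, shutdown_decoder and init_count are hypothetical names, and the struct is a stand-in rather than a real PocketSphinx decoder.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for an expensive resource such as a loaded decoder.
struct CachedDecoder {
    std::string model;
};

static CachedDecoder* g_decoder = NULL;  // survives between calls
static int g_init_count = 0;             // counts how often setup really ran

// Create the decoder on first use and hand back the same one afterwards.
CachedDecoder* get_decoder() {
    if (g_decoder == NULL) {
        g_decoder = new CachedDecoder;
        g_decoder->model = "loaded";  // the costly setup would happen here
        ++g_init_count;
    }
    return g_decoder;
}

// Explicit teardown, meant to be called once when the host application exits.
void shutdown_decoder() {
    delete g_decoder;
    g_decoder = NULL;
}

int init_count() { return g_init_count; }
```

With this shape, the per-call function would only start and stop recording, while a separate exported shutdown function releases the decoder when Unity is done with it.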
