For various reasons I won’t discuss here, I am writing a simple tokenizer in C

Asked: May 16, 2026

For various reasons I won’t discuss here, I am writing a simple tokenizer in C. Below is an example I hacked out which resizes the token buffer in predetermined increments as necessary when reading characters from the input stream. The buffer will ultimately reach the size of the largest token, which can obviously accommodate smaller tokens. Is this an acceptable approach? If not, what is the best way to determine the correct amount of memory to allocate for each token?

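#include <stdio.h>
#include <stdlib.h>   /* calloc, realloc, free */
#include <ctype.h>

#define BUFF_CHUNK_SIZE 4

int main(void) {
    int c, i, n;
    char *buffer;

    i = 0;
    n = BUFF_CHUNK_SIZE;
    buffer = calloc(n, sizeof(char));

    while ((c = getchar()) != EOF) {
        if (isalnum(c)) {
            buffer[i] = (char)c;
            i++;
            /* Grow as soon as the buffer is full, so buffer[i] is always
               available for the terminating '\0'. */
            if (i == n) {
                n += BUFF_CHUNK_SIZE;
                buffer = realloc(buffer, n * sizeof(char));
            }
        }
        else {
            if (i == 0) {
                continue;
            }
            /* Bytes added by realloc() are not zeroed, so terminate explicitly. */
            buffer[i] = '\0';
            printf("%s\n", buffer);
            i = 0;
        }
    }
    /* Print a last token that runs right up to EOF with no delimiter after it. */
    if (i > 0) {
        buffer[i] = '\0';
        printf("%s\n", buffer);
    }
    free(buffer);
    return 0;
}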

1 Answer

Editorial Team · Answer 1 · 2026-05-16T07:04:51+00:00

This is pretty much the right approach – with two tweaks.

Firstly, instead of adding a constant BUFF_CHUNK_SIZE, it’s usually better to multiply the buffer size by a fixed factor. This means that the number of reallocs for a long token of length N becomes proportional to log N rather than N. Because the successive sizes then form a geometric series, the total time spent copying in realloc() is proportional to N rather than N². It doesn’t really matter what the factor is, as long as it is greater than 1; 1.5 might be a good choice (n += n / 2;), provided n starts at 2 or more so that n / 2 is never 0.

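Secondly, in a longer program you should really check for realloc() failing. It returns NULL on failure and leaves the original block untouched, so assign its result to a temporary pointer first; otherwise buffer = realloc(buffer, ...) loses the only pointer to the old block.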
