I wrote a function which extends isalnum to recognize UTF-8 coded umlauts


Editorial Team

Asked: May 26, 2026


I wrote a function which extends isalnum to recognize UTF-8 coded umlauts.

Is there a more elegant way to solve this?

The code is as follows:

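#include <ctype.h>
#include <stdbool.h>

bool isalnumlaut(const char character) {
    int cr = (int) (unsigned char) character;
    if (isalnum(cr)          // isalnum() requires an unsigned char value (or EOF), so pass cr
            || cr == 195     // 0xC3: UTF-8 lead byte shared by all the letters below
            || cr == 132     // 0x84: second byte of Ä
            || cr == 164     // 0xA4: second byte of ä
            || cr == 150     // 0x96: second byte of Ö
            || cr == 182     // 0xB6: second byte of ö
            || cr == 156     // 0x9C: second byte of Ü
            || cr == 188     // 0xBC: second byte of ü
            || cr == 159     // 0x9F: second byte of ß
    ) {
        return true;
    } else {
        return false;
    }
}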

EDIT:

I have now tested my solution several times, and it seems to do the job for my purposes. Any strong objections?


1 Answer

Editorial Team · Answer 1 · 2026-05-26T02:15:26+00:00

Your code doesn’t do what you’re claiming.

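The UTF-8 representation of Ä is two bytes, 0xC3 0x84. A lone byte with a value above 0x7F is not a character in UTF-8; it only has meaning as part of a multi-byte sequence. Because your function looks at one byte at a time, it also accepts bytes that belong to other characters: Ą (U+0104) is encoded as 0xC4 0x84, and its second byte passes your 0x84 test.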
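To make the two-byte point concrete, here is a minimal sketch (not the asker's code) that inspects the lead byte and the trailing byte together, so that a stray 0x84 or 0xC3 on its own is rejected. It assumes NUL-terminated UTF-8 input and, like the original, only accepts ASCII letters and digits plus Ä Ö Ü ä ö ü ß; the names `alnum_umlaut_len` and `all_alnum_umlaut` are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns the length in bytes (1 or 2) of the
   character at the start of s if it is an ASCII letter or digit, or one
   of Ä Ö Ü ä ö ü ß encoded in UTF-8; returns 0 for anything else.
   Assumes s points into a NUL-terminated UTF-8 string. */
size_t alnum_umlaut_len(const char *s)
{
    const unsigned char *u = (const unsigned char *)s;

    /* ASCII range: u[0] is an unsigned char value, which keeps
       isalnum() well-defined. */
    if (u[0] < 0x80)
        return isalnum(u[0]) ? 1 : 0;

    /* All seven letters start with the lead byte 0xC3; the trailing
       byte tells them apart. */
    if (u[0] == 0xC3) {
        switch (u[1]) {
        case 0x84: /* Ä */
        case 0x96: /* Ö */
        case 0x9C: /* Ü */
        case 0x9F: /* ß */
        case 0xA4: /* ä */
        case 0xB6: /* ö */
        case 0xBC: /* ü */
            return 2;
        default:
            break;
        }
    }

    /* Other multi-byte characters, truncated sequences and stray
       continuation bytes are all rejected. */
    return 0;
}

/* Hypothetical helper: true if every character of the NUL-terminated
   UTF-8 string s is accepted by alnum_umlaut_len(). */
bool all_alnum_umlaut(const char *s)
{
    while (*s != '\0') {
        size_t n = alnum_umlaut_len(s);
        if (n == 0)
            return false;
        s += n; /* advance by a whole character, not a single byte */
    }
    return true;
}
```

For example, `all_alnum_umlaut("Gr\xC3\xB6\xC3\x9F" "e")` (the bytes of "Größe"; the literal is split because `e` would otherwise extend the `\x9F` escape) returns true, while the sequence 0xC4 0x84 (Ą) is rejected even though its second byte matches Ä's.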

Some general suggestions:

- Unicode is large. Consider using a library that has already handled the issues you’re seeing, such as ICU.
- It rarely makes sense for a function to operate on a single code unit (one byte in UTF-8) or even a single code point (such as U+00C4). It makes much more sense to have functions that operate either on ranges of code points or on single user-perceived characters (grapheme clusters); Ä, for instance, can also be written as A followed by U+0308 COMBINING DIAERESIS.
- Your concept of alphanumeric is likely to be underspecified for a character set as large as the Universal Character Set: do you want to treat the letters of the Cyrillic alphabet as alphanumeric? Unicode’s concept of what is alphabetic may not match yours, especially if you haven’t thought about it.
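Following the suggestion to work on code points rather than code units, here is a sketch that first decodes UTF-8 into code points and only then classifies them. It hand-rolls the decoder to stay self-contained; with ICU, the `U8_NEXT` macro and `u_isalnum()` would fill those roles. The function names and the restriction to ASCII plus the seven German letters are assumptions for illustration, not Unicode’s general notion of alphanumeric.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: reads one UTF-8 sequence from s (at most len
   bytes), stores the code point in *cp and returns the number of bytes
   consumed, or 0 if the sequence is malformed. Overlong forms,
   surrogates and values above U+10FFFF are rejected. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                  /* 1 byte: 0xxxxxxx */
        *cp = s[0];
        return 1;
    }
    size_t n;
    uint32_t c, min;
    if ((s[0] & 0xE0) == 0xC0)      { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                      /* stray continuation byte or invalid lead */
    if (len < n)
        return 0;                       /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* expected a 10xxxxxx continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return n;
}

/* Hypothetical predicate on code points: ASCII letters and digits plus
   U+00C4 Ä, U+00D6 Ö, U+00DC Ü, U+00DF ß, U+00E4 ä, U+00F6 ö, U+00FC ü. */
bool is_alnum_umlaut_cp(uint32_t cp)
{
    if ((cp >= '0' && cp <= '9') || (cp >= 'A' && cp <= 'Z') ||
        (cp >= 'a' && cp <= 'z'))
        return true;
    switch (cp) {
    case 0xC4: case 0xD6: case 0xDC: case 0xDF:
    case 0xE4: case 0xF6: case 0xFC:
        return true;
    default:
        return false;
    }
}

/* True if str[0..len) is well-formed UTF-8 and every code point passes. */
bool utf8_all_alnum_umlaut(const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0;
    while (i < len) {
        uint32_t cp;
        size_t n = utf8_decode(s + i, len - i, &cp);
        if (n == 0 || !is_alnum_umlaut_cp(cp))
            return false;
        i += n;
    }
    return true;
}
```

Note that a decomposed Ä (A followed by U+0308, bytes 0x41 0xCC 0x88) is rejected, because U+0308 is not in the list. Handling that case needs normalization or grapheme-cluster segmentation, which is exactly where a library such as ICU pays off.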
