I am developing a heuristic for automatic language detection and would like to find

Asked: May 30, 2026 (2026-05-30T01:36:10+00:00)

I am developing a heuristic for automatic language detection and would like to find out whether a given letter has diacritics (as in “Ðàäèî Êóëüòóðà”, where all letters have diacritics). Ideally, I would also like to get the type of diacritic.

I browsed through the UnicodeCategory enum but didn’t find anything that could help me here.

1 Answer

Editorial Team · Answer 1 · 2026-05-30T01:36:11+00:00

One approach is to normalize the character to a decomposed form, in which a letter and its diacritics are written as separate code points, and then check whether the result is a base letter followed only by combining marks.

Adapting from "How do I remove diacritics (accents) from a string in .NET?", you can decompose with Normalize(NormalizationForm.FormD) and identify the diacritics by checking for UnicodeCategory.NonSpacingMark.

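using System.Globalization; // CharUnicodeInfo, UnicodeCategory
using System.Linq;          // Skip, All
using System.Text;          // NormalizationForm

bool IsLetterWithDiacritics(char c)
{
    // FormD splits a precomposed letter into its base letter plus combining marks.
    var s = c.ToString().Normalize(NormalizationForm.FormD);
    return (s.Length > 1) &&
           char.IsLetter(s[0]) &&
           s.Skip(1).All(c2 => CharUnicodeInfo.GetUnicodeCategory(c2) == UnicodeCategory.NonSpacingMark);
}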
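
The question also asks for the type of diacritic. As a minimal sketch, assuming the same FormD decomposition is acceptable, the combining marks themselves can be returned instead of a yes/no answer; each mark's code point then identifies the diacritic. The names DiacriticInfo and GetCombiningMarks are hypothetical and chosen only for illustration.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

static class DiacriticInfo
{
    // Hypothetical helper: returns the non-spacing marks that follow the
    // base character once c is canonically decomposed (FormD).
    public static IReadOnlyList<char> GetCombiningMarks(char c)
    {
        string decomposed = c.ToString().Normalize(NormalizationForm.FormD);
        return decomposed
            .Skip(1) // skip the base character
            .Where(m => CharUnicodeInfo.GetUnicodeCategory(m) == UnicodeCategory.NonSpacingMark)
            .ToList();
    }
}
```

For 'é' (U+00E9) this yields a single mark, U+0301 (combining acute accent); for 'ü' (U+00FC) it yields U+0308 (combining diaeresis). Letters that have no canonical decomposition, such as 'Ð' (U+00D0) or 'ø' (U+00F8), come back with an empty list, so neither this helper nor the check above will flag them.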
