I am developing a heuristic for automatic language detection and would like to find out whether a given letter has a diacritic (as in “Ðàäèî Êóëüòóðà”, where all letters have diacritics). It would be best if I could also get the type of diacritic.
I browsed through the UnicodeCategory enum but didn’t find anything that could help me here.
One possible way is to normalize the string to a decomposed form, in which a letter and its diacritics are written as separate code points, and then check whether a letter is followed by combining marks.
Adapting from How do I remove diacritics (accents) from a string in .NET?, you can normalize with Normalize(NormalizationForm.FormD) and check for the diacritics with UnicodeCategory.NonSpacingMark.
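A minimal sketch of that approach might look like the following. The class and method names (DiacriticInspector, GetCombiningMarks, HasDiacritic) are made up for illustration and are not part of .NET; only String.Normalize, CharUnicodeInfo.GetUnicodeCategory and the two enums come from the framework. It assumes the input is a single UTF-16 char that is not a lone surrogate.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class DiacriticInspector
{
    // Decomposes one letter with FormD and returns the non-spacing marks
    // found in the result. Each returned char identifies the type of
    // diacritic, e.g. U+0300 is COMBINING GRAVE ACCENT.
    // Assumption: 'letter' is not a lone surrogate (Normalize would throw).
    public static List<char> GetCombiningMarks(char letter)
    {
        string decomposed = letter.ToString().Normalize(NormalizationForm.FormD);
        var marks = new List<char>();
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
            {
                marks.Add(c);
            }
        }
        return marks;
    }

    // True when the decomposed form contains at least one non-spacing mark.
    public static bool HasDiacritic(char letter)
    {
        return GetCombiningMarks(letter).Count > 0;
    }
}
```

With this sketch, HasDiacritic('\u00E0') (à) is true and GetCombiningMarks returns U+0300, while a plain 'a' yields no marks. Note that letters without a canonical decomposition, such as Ð (U+00D0) or ø (U+00F8), are not reported as having a diacritic by this method, so the heuristic may need a separate check for them.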