I'm wondering if there is a way in .NET to compare strings when they contain letters such as é.
Example: I'm searching a string that says José. I want to return true when I check whether the string José contains “e” (without the acute accent).
Is there a way to do this without comparing all the variations of different characters manually?
Any ideas?
You will first have to define diacritics somehow. Do not list all characters; instead, use Unicode categories. There are just three kinds of combining marks to think about: non-spacing marks, spacing combining marks, and enclosing marks.
For example, you might only want to detect combining marks that do not affect the width of the base character (“non-spacing marks”). Or you may be more liberal and include even marks that cannot stand alone, but still take up some space on the line when present; like vowel marks in Indic scripts. All three kinds of combining marks would be detected as follows:
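Here is a minimal sketch of that check. The class and method names (`DiacriticDetection`, `ContainsCombiningMarks`) are made up for illustration; only `String.Normalize`, `CharUnicodeInfo.GetUnicodeCategory` and the `UnicodeCategory` values come from the framework. It looks at UTF-16 code units one at a time, so marks outside the Basic Multilingual Plane would not be caught.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DiacriticDetection
{
    // Hypothetical helper: true if the text contains any combining mark
    // (non-spacing, spacing combining, or enclosing) after decomposition.
    public static bool ContainsCombiningMarks(string text)
    {
        // Form D splits precomposed characters into base + combining marks.
        string decomposed = text.Normalize(NormalizationForm.FormD);

        return decomposed.Any(c =>
        {
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
            return category == UnicodeCategory.NonSpacingMark
                || category == UnicodeCategory.SpacingCombiningMark
                || category == UnicodeCategory.EnclosingMark;
        });
    }
}
```

To detect only non-spacing marks, drop the other two comparisons.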
Note the conversion to normal form D. This forces decomposition of all composed characters, such as é into e and a combining acute accent, prior to looking at the string character by character.

Now wait, you asked about the opposite: you wanted to detect whether the string contains a particular base character. That is even simpler.
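Something along these lines should do, again only as a sketch: `BaseCharacterSearch` and `ContainsBaseCharacter` are hypothetical names, and the approach assumes the character you search for is itself an undecorated base character.

```csharp
using System;
using System.Text;

public static class BaseCharacterSearch
{
    // Hypothetical helper: true if the decomposed text contains baseChar,
    // regardless of any combining marks attached to it.
    public static bool ContainsBaseCharacter(string text, char baseChar)
    {
        // After decomposition, é becomes 'e' followed by U+0301,
        // so an ordinal search for 'e' finds it.
        return text.Normalize(NormalizationForm.FormD).IndexOf(baseChar) >= 0;
    }
}
```

With this, `ContainsBaseCharacter("José", 'e')` finds the e hidden inside é. Keep in mind it matches an e carrying any mark, not just the acute.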
In a similar vein, you might strip away particular categories of characters from each string separately, and only compare what remains.
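A rough sketch of the stripping approach, assuming you only want to ignore non-spacing marks. `MarkStripping`, `StripNonSpacingMarks` and `EqualsIgnoringMarks` are illustrative names, not framework members.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class MarkStripping
{
    // Hypothetical helper: removes non-spacing marks from the decomposed text.
    public static string StripNonSpacingMarks(string text)
    {
        var builder = new StringBuilder(text.Length);

        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            // Keep everything except non-spacing combining marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        // Recompose whatever is left so the result is in a predictable form.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }

    // Hypothetical helper: compares two strings after stripping each one separately.
    public static bool EqualsIgnoringMarks(string left, string right)
    {
        return string.Equals(
            StripNonSpacingMarks(left),
            StripNonSpacingMarks(right),
            StringComparison.Ordinal);
    }
}
```

Add the other mark categories to the filter if you want to be more liberal about what counts as a diacritic.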