I have this mapping in my C# application:
string[,] unicode2Ascii = {
    { "ஹ", "\x86" }
};
“ஹ” is the Unicode value of the Tamil letter “ஹ”, as saved by MS Word: the raw byte sequence, with each byte shown as a separate character. I am trying to map these Unicode “strings” to a single-byte value under 255 (to accommodate systems that do not support Unicode).
I am trying to use string.Replace like this:
S = S.Replace(unicode2Ascii[0, 0], unicode2Ascii[0, 1]);
However, the resulting output has a ? instead of the actual byte 0x86. Any pointers on how I could set the encoding for the second element of that array to something like windows-1252?
Or is there a better way to do this conversion?
Thanks in advance.
Not sure if this helps, but the Tamil code page “57004 – ISCII Tamil” is supported by Windows.
It does not give the same translation for the example character above, though: for ‘ஹ’ it gives 216 (0xD8). Perhaps a different code page needs to be used?
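To see what a given code page produces for a character, something along these lines can be used. This is only a minimal sketch: the class name IsciiCheck and the helper EncodeWithCodePage are made up for illustration, and it assumes .NET Core 3.0 or later, where legacy code pages have to be registered through CodePagesEncodingProvider before Encoding.GetEncoding can find them.

```csharp
using System;
using System.Text;

static class IsciiCheck
{
    // Hypothetical helper: returns the bytes that the given Windows
    // code page produces for the supplied text.
    public static byte[] EncodeWithCodePage(string text, int codePage)
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where legacy
        // code pages such as 57004 are not registered by default.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetBytes(text);
    }

    static void Main()
    {
        // U+0BB9 is the Tamil letter HA (ஹ).
        byte[] bytes = EncodeWithCodePage("\u0BB9", 57004);

        // Print the encoded bytes in hex so they can be compared with
        // the value expected by the target system.
        Console.WriteLine(BitConverter.ToString(bytes));
    }
}
```

Passing a different code page number to the same helper makes it easy to compare candidates against the byte the target system expects (0x86 in the question).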
Update
If you wish to take a Unicode file as input and transliterate its characters to get a single-byte representation, the following should do the trick. The resulting array will hold your single-byte representation, provided your dictionary has an entry for each character: