I’m sending signed XML via WebClient to a gateway. Now I have to ensure that the node values contain only German letters. I have two test words. The first one is converted just fine using:
string foreignString = "Łůj꣥ü";
Encoding utf8 = Encoding.UTF8;
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
byte[] utfBytes = Encoding.Convert(iso, utf8, iso.GetBytes(foreignString));
string result = utf8.GetString(utfBytes);
But the second string contains a character that is also included in the UTF-8 encoding. It's the
ç (Latin small letter c with cedilla)
After testing a bit with other encodings I always got the same result: the character was still there. That makes sense, because it is part of the UTF-8 table 🙂
So my question is: is there a way to mask out all the French, Portuguese and Spanish characters without dropping the German umlauts?
Thanks in advance!
You can create your own Encoding class based on the ISO-8859-1 encoding with your additional special rules:
This encoding is based on the fact that you want to use the ISO-8859-1 encoding with some additional restrictions, where you want to map “non-German” characters to their ASCII equivalent. The built-in ISO-8859-1 encoding knows how to map Ł to L, and because ISO-8859-1 is a single-byte character set, each byte corresponds to one character, so you can do additional mapping on the bytes. This is done in the GetBytes method.
You can “clean” a string using this code:
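A minimal sketch of how such a class and the cleaning call might look. The class name GermanEncoding, the two entries in charMappingTable and the input string are assumptions made for illustration, so treat this as one possible shape rather than a definitive implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical class name; wraps the built-in ISO-8859-1 encoding.
class GermanEncoding : Encoding
{
    static readonly Encoding iso88591 = Encoding.GetEncoding("ISO-8859-1");

    // Extra byte-level mappings applied after ISO-8859-1 encoding.
    // Only two illustrative entries; extend with whatever you want to replace.
    static readonly Dictionary<byte, byte> charMappingTable = new Dictionary<byte, byte>
    {
        { 0xE7, (byte)'c' }, // ç -> c
        { 0xC7, (byte)'C' }, // Ç -> C
    };

    public override int GetByteCount(char[] chars, int index, int count)
    {
        return iso88591.GetByteCount(chars, index, count);
    }

    public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
    {
        // Let ISO-8859-1 encode first, then remap single bytes in place.
        int written = iso88591.GetBytes(chars, charIndex, charCount, bytes, byteIndex);
        for (int i = byteIndex; i < byteIndex + written; i++)
        {
            byte mapped;
            if (charMappingTable.TryGetValue(bytes[i], out mapped))
                bytes[i] = mapped;
        }
        return written;
    }

    public override int GetCharCount(byte[] bytes, int index, int count)
    {
        return iso88591.GetCharCount(bytes, index, count);
    }

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
    {
        return iso88591.GetChars(bytes, byteIndex, byteCount, chars, charIndex);
    }

    public override int GetMaxByteCount(int charCount)
    {
        return iso88591.GetMaxByteCount(charCount);
    }

    public override int GetMaxCharCount(int byteCount)
    {
        return iso88591.GetMaxCharCount(byteCount);
    }
}

class Program
{
    static void Main()
    {
        var encoding = new GermanEncoding();
        // Assumed test input: encode to restricted bytes, then decode back.
        string cleaned = encoding.GetString(encoding.GetBytes("ŁůjęŁĄüç"));
        Console.WriteLine(cleaned);
    }
}
```

The round trip through GetBytes and GetString is what does the cleaning: encoding drops or replaces everything the restricted character set does not allow, and decoding turns the remaining bytes back into a string.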
The resulting string is LujeLAüc.
Note that the implementation is quite simplistic and uses a dictionary to perform the additional mapping step on the bytes. This might not be efficient, but in that case you can consider alternatives like using a 256-byte mapping array. Also, you need to expand the charMappingTable to contain all the additional mappings you want to perform.
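The 256-byte mapping array could be sketched roughly like this. The ByteMapping helper, its MapInPlace method and the two overridden entries are hypothetical names and values for illustration; the idea is that every byte indexes directly into a table that defaults to the identity mapping:

```csharp
using System;
using System.Text;

// Hypothetical helper: a 256-entry lookup table instead of a dictionary.
static class ByteMapping
{
    static readonly byte[] table = BuildTable();

    static byte[] BuildTable()
    {
        var t = new byte[256];
        // Identity by default: every byte maps to itself.
        for (int i = 0; i < 256; i++)
            t[i] = (byte)i;
        // Illustrative overrides only; add the mappings you need.
        t[0xE7] = (byte)'c'; // ç -> c
        t[0xC7] = (byte)'C'; // Ç -> C
        return t;
    }

    // Remaps count bytes starting at start, in place.
    public static void MapInPlace(byte[] bytes, int start, int count)
    {
        for (int i = start; i < start + count; i++)
            bytes[i] = table[bytes[i]];
    }
}

class MappingDemo
{
    static void Main()
    {
        Encoding iso = Encoding.GetEncoding("ISO-8859-1");
        byte[] bytes = iso.GetBytes("Füße ça");
        ByteMapping.MapInPlace(bytes, 0, bytes.Length);
        Console.WriteLine(iso.GetString(bytes));
    }
}
```

Inside the custom encoding's GetBytes override, a call like MapInPlace on the freshly written byte range would take the place of the dictionary lookup loop.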