UPDATE: May this post be helpful for coders using RichTextBoxes. The Match is correct for a normal string. I did not notice this, AND I did not notice that “ä” turns into “\'e4” in richTextBox.Rtf! So Match.Value is correct; this was human error.
A Regex finds the correct text, but Match.Value is wrong because it replaces the German “ä” with “\'e4”!
Let example_text = “Primär-ABC” and let's use the following code:
String example_text = "<em>Primär-ABC</em>";
Regex em = new Regex(@"<em>[^<]*</em>");
Match emMatch = em.Match(example_text); // Works!
// Match emMatch = em.Match(richTextBox.Rtf); // Fails! (alternative to the line above)
while (emMatch.Success)
{
    string matchValue = emMatch.Value;
    Foo(matchValue); // ...
    emMatch = emMatch.NextMatch(); // advance to the next match so the loop terminates
}
Then emMatch.Value returns “Prim\'e4r-ABC” instead of “Primär-ABC”.
The German ä turns into \'e4!
Because I want to work with the exact string, I would need emMatch.Value to be Primär-ABC. How do I achieve that?
In what context are you doing this?
This outputs
<em>Ich bin ein Bärliner</em>
in my console.
The problem probably isn't that you're getting the wrong value back; it's that you're getting a representation of the value that isn't displayed correctly. This can depend on a lot of things. Try writing the value to a text file using UTF-8 encoding and see whether it is still incorrect.
Edit: Right. The thing is that you are getting the text from a WinForms RichTextBox using the Rtf property. This will not return the text as is, but will return the RTF representation of the text. RTF is not plain text; it's a markup format for displaying rich text. If you open an RTF document in e.g. Notepad, you will see that it has a lot of weird codes in it, including \'e4 for every 'ä' in your RTF document. If you had used some formatting (like bold text, color etc.) in the RTF box, the .Rtf property would return that code as well, looking something like
{\rtlch\fcs1 \af31507 \ltrch\fcs0 \cf6\insrsid15946317\charrsid15946317 test}
So use the .Text property instead. It will return the actual plain text.
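To see the difference without a form, here is a minimal console sketch. The class name RtfVersusText, the helper FirstEm, and the hand-written RTF fragment are assumptions made for illustration only; the string a real RichTextBox returns from .Rtf carries many more control words, but the \'e4 escape should behave the same way.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RtfVersusText
{
    // Same pattern as in the question.
    private static readonly Regex Em = new Regex(@"<em>[^<]*</em>");

    // Returns the first <em>...</em> match, or an empty string when
    // nothing matches (Match.Value is empty for a failed match).
    public static string FirstEm(string input)
    {
        return Em.Match(input).Value;
    }

    public static void Main()
    {
        // Stand-in for richTextBox.Text: the plain text as typed.
        string plainText = "<em>Primär-ABC</em>";

        // Stand-in for richTextBox.Rtf: a hypothetical, simplified RTF
        // fragment in which the ä is written as the escape \'e4.
        string rtfLike = @"{\rtf1\ansi <em>Prim\'e4r-ABC</em>\par}";

        Console.WriteLine(FirstEm(plainText)); // match contains the real ä
        Console.WriteLine(FirstEm(rtfLike));   // match contains the literal \'e4
    }
}
```

The regex does its job in both cases; only the input differs. In the question's code that would mean passing richTextBox.Text to em.Match instead of richTextBox.Rtf.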