Suppose I have the following text in a text file:
First Text
"Some Text"
"124arandom txt that should not be parsed!@
"124 Some Text"
"어떤 글"
this text a"s well should not be parsed
I would like to retrieve Some Text, 124 Some Text and 어떤 글 as matched strings. The text is read line by line. The catch is that it has to match text in foreign languages as well, as long as it is inside quotes.
Update:
I found out something weird. I was trying some random stuff and found out that:
string s = "어떤 글";
Regex regex = new Regex("[^\"]*");
MatchCollection matches = regex.Matches(s);
matches has a Count of 10 and contains some empty items (the parsed text is at index 2). This might be why I kept getting an empty string when I was just using Regex.Replace. Why is this happening?
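If you read the text line by line, then a regex that matches an opening quote, any run of non-quote characters, and a closing quote will find all quoted strings, unless those may contain escaped quotes like "a 2\" by 4\" board". To match those correctly, you need a pattern that also accepts a backslash followed by any character between the quotes.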
If you don’t want the quotes to become part of the match, use lookaround assertions:
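To make the difference concrete, here is a minimal sketch that compares patterns which consume the quotes with lookaround versions that only assert them. The class name, variable names and sample strings are illustrative assumptions, and the patterns are one plausible way to write this rather than the only one.

```csharp
using System;
using System.Text.RegularExpressions;

class LookaroundDemo
{
    static void Main()
    {
        // The quotes are consumed here, so they end up in Match.Value.
        var quoted = new Regex(@"""[^""]*""");
        // Escape-aware variant: a backslash plus any character, or any
        // character that is neither a quote nor a backslash.
        var quotedEscaped = new Regex(@"""(?:\\.|[^""\\])*""");

        // (?<=") and (?=") only assert that a quote is adjacent; they consume
        // nothing, so Match.Value holds just the text between the quotes.
        var inner = new Regex(@"(?<="")[^""]*(?="")");
        var innerEscaped = new Regex(@"(?<="")(?:\\.|[^""\\])*(?="")");

        string plain = "\"124 Some Text\"";
        string withEscapes = "\"a 2\\\" by 4\\\" board\"";

        Console.WriteLine(quoted.Match(plain).Value);              // "124 Some Text"
        Console.WriteLine(inner.Match(plain).Value);               // 124 Some Text
        Console.WriteLine(quotedEscaped.Match(withEscapes).Value); // "a 2\" by 4\" board"
        Console.WriteLine(innerEscaped.Match(withEscapes).Value);  // a 2\" by 4\" board
    }
}
```

One thing to watch with the lookaround versions: if a line holds two quoted strings, the text between them also sits between two quotes, so Regex.Matches can return it as well. With one quoted string per line, as in the question, that does not come up.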
These regexes, as C# regexes, can be created like this:
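A sketch of how that construction might look, assuming verbatim string literals (where a literal quote is written as "" and backslashes are left alone). The field names are hypothetical, and a StringReader stands in for the text file so the example is self-contained.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

class RegexConstruction
{
    // Verbatim literals: "" is one quote, backslashes are passed through as-is.
    static readonly Regex Quoted = new Regex(@"""[^""]*""");
    static readonly Regex QuotedWithEscapes = new Regex(@"""(?:\\.|[^""\\])*""");
    static readonly Regex Inner = new Regex(@"(?<="")[^""]*(?="")");
    static readonly Regex InnerWithEscapes = new Regex(@"(?<="")(?:\\.|[^""\\])*(?="")");

    // The same pattern as QuotedWithEscapes written as a regular literal:
    // every quote becomes \" and every backslash is doubled.
    static readonly Regex QuotedWithEscapesRegular =
        new Regex("\"(?:\\\\.|[^\"\\\\])*\"");

    static void Main()
    {
        // Stand-in for the file from the question; swap in a StreamReader
        // or File.ReadLines for a real file.
        string text = "First Text\n\"Some Text\"\n\"124arandom txt\n\"어떤 글\"\nthis a\"s well";

        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                Match m = Inner.Match(line);
                if (m.Success)
                {
                    // Only lines with both an opening and a closing quote get here.
                    Console.WriteLine(m.Value);
                }
            }
        }
    }
}
```

Nothing here needs special handling for non-Latin text: [^"] matches any character other than a double quote, so Korean text between the quotes is matched like any other. If the console prints question marks instead, that is likely an output-encoding issue rather than a regex one.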