I have a mixed Hebrew/English string to parse. The string is built like this:

Asked: May 26, 2026

I have a mixed Hebrew/English string to parse.
The string is built like this:

[3 hebrew] [2 english] [1 hebrew]

So it reads as 1 2 3, but it is stored as 3 2 1 (that is the exact byte sequence in the file, double-checked in a hex editor; in any case, RTL is only a display attribute). The .NET regex engine has a RightToLeft option, which (when given for plain LTR text) starts processing from the right side of the string.

I am wondering whether this option should be applied when extracting the [3 hebrew] and [2 english] parts from the string, or when checking whether [1 hebrew] matches the end of the string. Are there any hidden specifics, or is there nothing to worry about (as when processing any LTR string containing special Unicode characters)?
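A minimal sketch of both operations, assuming each part is a single space-separated word (the sample string, the group names and the use of .NET's \p{IsHebrew} named block are illustrative assumptions, not part of the original question). For a fully anchored pattern like this one, RightToLeft should not change what gets captured; it only changes the direction in which the engine scans.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample in logical (storage) order: Hebrew word, Latin word, Hebrew word.
string line = "שלום hello עולם";

// \p{IsHebrew} is .NET's named block for U+0590..U+05FF; the pattern is anchored at both ends.
const string pattern = @"^(?<first>\p{IsHebrew}+) (?<middle>[A-Za-z]+) (?<last>\p{IsHebrew}+)$";

foreach (var options in new[] { RegexOptions.None, RegexOptions.RightToLeft })
{
    // Both options capture the same three groups; only the scan direction differs.
    Match m = Regex.Match(line, pattern, options);
    Console.WriteLine($"{options}: {m.Groups["first"].Value} | {m.Groups["middle"].Value} | {m.Groups["last"].Value}");
}

// Checking the end of the string: \z anchors to the last *stored* character,
// whatever the display order is.
bool endsWithOlam = Regex.IsMatch(line, @"עולם\z");
bool endsWithOlamRtl = Regex.IsMatch(line, @"עולם\z", RegexOptions.RightToLeft);
Console.WriteLine($"{endsWithOlam} {endsWithOlamRtl}"); // True True
```

With unanchored patterns or several possible matches the choice does matter, because RightToLeft finds the last occurrence first.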

Also, can anyone recommend a good RTL+LTR text editor? I'm afraid that VS Express sometimes displays the text wrong, and it might even start corrupting the saved strings. I would like to be able to re-check the files without resorting to a hex editor.
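For re-checking files without a hex editor, a small sketch like the following could list what is actually stored (the Describe helper and its output format are hypothetical, not from any library). It prints each UTF-16 code unit with its Unicode category and flags the invisible bidirectional formatting characters (LRM, RLM, U+202A to U+202E and U+2066 to U+2069) that an editor might insert. For Hebrew and Latin text each code unit is a whole code point.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: one line per UTF-16 code unit, in storage order.
static string Describe(string text)
{
    var sb = new StringBuilder();
    foreach (char c in text)
    {
        // Invisible characters that change display order but not storage order.
        bool bidiControl = c == '\u200E' || c == '\u200F'
            || (c >= '\u202A' && c <= '\u202E')
            || (c >= '\u2066' && c <= '\u2069');
        string flag = bidiControl ? "  <-- bidi control" : "";
        sb.AppendLine($"U+{(int)c:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}{flag}");
    }
    return sb.ToString();
}

// Sample input: RLM followed by the three letters of "כתר" and a Latin word.
Console.Write(Describe("\u200F\u05DB\u05EA\u05E8 abc"));
```

To check a real file, something like Console.Write(Describe(File.ReadAllText(path))) should do, with path being your file and File coming from System.IO.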

1 Answer

Editorial Team · Answer 1 · 2026-05-26T11:35:58+00:00

The RightToLeft option refers to the order in which the regular expression engine walks through the character sequence, and it should really be called LastToFirst: in the case of Hebrew and Arabic it actually runs left-to-right on screen, and with mixed RTL and LTR text such as you describe, the expression "right to left" is even less appropriate.
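A minimal sketch of the last-to-first behaviour on plain ASCII input, so that display order cannot confuse anything (the sample string is an illustrative assumption): Match returns the last occurrence instead of the first, and Matches enumerates from the end of the string.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Plain ASCII input: storage order and display order are the same.
string input = "a1 b22 c333";

var forward = new Regex(@"\d+");
var backward = new Regex(@"\d+", RegexOptions.RightToLeft);

// Match() returns the first match in scanning order.
Console.WriteLine(forward.Match(input).Value);   // 1
Console.WriteLine(backward.Match(input).Value);  // 333

// Matches() also enumerates in scanning order: first-to-last vs last-to-first.
string[] fwd = forward.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
string[] bwd = backward.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(",", fwd)); // 1,22,333
Console.WriteLine(string.Join(",", bwd)); // 333,22,1
```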

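This has a minor effect on speed (which will only matter if the searched text is massive). It also changes which match is found first when there is more than one (the last rather than the first), and it affects searches done with a startAt index: with RightToLeft, the engine searches the part of the string before startAt rather than the part after it.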
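A sketch of the startAt difference, using Match(string, int) on a hypothetical ASCII input: by default the scan covers the characters from startAt onward, while with RightToLeft it begins at the character just before startAt and moves towards the start of the string.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "a1 b22 c333";
const int startAt = 6; // index of the space before "c333"

var forward = new Regex(@"\d+");
var backward = new Regex(@"\d+", RegexOptions.RightToLeft);

// Default: only input[startAt..] is scanned, first match there wins.
Match f = forward.Match(input, startAt);
// RightToLeft: only input[..startAt] is scanned, last match there wins.
Match b = backward.Match(input, startAt);

Console.WriteLine($"{f.Value} at {f.Index}"); // 333 at 8
Console.WriteLine($"{b.Value} at {b.Index}"); // 22 at 4
```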

Examples (let's hope browsers don't mess up the display of the Hebrew too much):

using System;
using System.Text.RegularExpressions;

string saying = "מלכות is in כתר"; // Chosen because it amuses me that this is a saying whichever way round the browser puts Malkuth and Kether.
string kether = "כתר";
Console.WriteLine(new Regex(kether, RegexOptions.RightToLeft).IsMatch(saying)); // True
Console.WriteLine(new Regex(kether, RegexOptions.None).IsMatch(saying)); // True, perhaps minutely faster, but by so little that noise would hide it.
Console.WriteLine(new Regex(kether, RegexOptions.RightToLeft).IsMatch(saying, 2)); // False: only the characters before index 2 are searched
Console.WriteLine(new Regex(kether, RegexOptions.None).IsMatch(saying, 2)); // True: the characters from index 2 onward are searched
// And to show that the ordering is codepoint (storage) order rather than physical display order:
Console.WriteLine(new Regex("" + kether[0] + ".*" + kether[2]).IsMatch(saying)); // True
Console.WriteLine(new Regex("" + kether[2] + ".*" + kether[0]).IsMatch(saying)); // False
