I have a mixed Hebrew/English string to parse.
The string is built like this:
[3 hebrew] [2 english 2] [1 hebrew],
So it can be read as 1 2 3, but it is stored as 3 2 1 (that is the exact byte sequence in the file, double-checked in a hex editor; RTL is only a display attribute anyway). The .NET regex parser has an RTL option (RegexOptions.RightToLeft), which, when given for plain LTR text, starts processing from the right side of the string.
I am wondering whether this option should be applied to extract the [3 hebrew] and [2 english] parts from the string, or to check whether [1 hebrew] matches the end of the string. Are there any hidden specifics, or is there nothing to worry about (as when processing any LTR string with special Unicode characters)?
Also, can anyone recommend a good RTL+LTR text editor? (I'm afraid that VS Express sometimes displays the text wrong, and in case it starts messing up the saved strings, I would like to re-check the files without resorting to a hex editor.)
The RightToLeft option refers to the order in which the regular expression moves through the character sequence, and should really be called LastToFirst, since in the case of Hebrew and Arabic it is actually left-to-right, and with mixed RTL and LTR text such as you describe the expression “right to left” is even less appropriate.

This has a minor effect on speed (it will only matter if the searched text is massive) and on regular expressions that are run with a startAt index (searching for matches earlier in the string than startAt rather than later in the string).

Examples; let’s hope the browsers don’t mess this up too much:
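As one minimal sketch of the scan-order point: the sample string, the class name, and the choice of `\p{IsHebrew}+` as the pattern are made up for illustration and are not taken from the question. The Hebrew letters are written as `\u` escapes so that the editor's display order cannot confuse what is actually stored.

```csharp
using System;
using System.Text.RegularExpressions;

class RightToLeftDemo
{
    static void Main()
    {
        // Hypothetical input in stored (logical) order:
        // three Hebrew letters, a space, "abc", a space, three Hebrew letters.
        string input = "\u05D0\u05D1\u05D2 abc \u05D3\u05D4\u05D5";
        string pattern = @"\p{IsHebrew}+";

        // Default scan: starts at index 0, so the first match found
        // is the Hebrew run stored first.
        Match first = Regex.Match(input, pattern);
        Console.WriteLine(first.Index); // 0

        // RightToLeft scan: starts at the end of the string, so the first
        // match found is the Hebrew run stored last.
        Match last = Regex.Match(input, pattern, RegexOptions.RightToLeft);
        Console.WriteLine(last.Index); // 8

        // With a startAt index, the two modes search in opposite directions
        // from the same position (7 is the space after "abc").
        Match after = new Regex(pattern).Match(input, 7);
        Match before = new Regex(pattern, RegexOptions.RightToLeft).Match(input, 7);
        Console.WriteLine(after.Index);  // 8: searched later in the string
        Console.WriteLine(before.Index); // 0: searched earlier in the string
    }
}
```

In both modes `Match.Index` is an offset into the stored character sequence, not a position on screen, so the option only changes which candidate is found first.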