I need to find and mark two-part patterns whose parts are split somewhere along a line. The patterns are kept in a separate file, one comma-separated pair per line. Here is a shortened sample:
CAT,TREE
LION,FOREST
OWL,WATERFALL
A match occurs when the item from column 2 appears on the same line as, and anywhere after, the item from column 1. E.g.:
THEREISACATINTHETREE. (matches)
No match appears if the item from column 2 appears first on the line, e.g.:
THETREEHASACAT. (does not match)
Furthermore, no match appears if the items from columns 1 and 2 touch, e.g.:
THECATTREEHASMANYBIRDS. (does not match)
Once a match is found, I need to mark it with \start{n} (placed right after the column 1 item) and \end{n} (placed right before the column 2 item), where n is a simple counter that increases by one each time a match is found. E.g.:
THEREISACAT\start{1}INTHE\end{1}TREE.
Here is a more complex example:
THECATANDLIONLEFTTHEFORESTANDMETANDOWLINTREENEARTHEWATERFALL.
This becomes:
THECAT\start{1}ANDLION\start{2}LEFTTHE\end{2}FORESTANDMETANDOWL\start{3}IN\end{1}TREENEARTHE\end{3}WATERFALL.
Sometimes there are multiple matches in the same place:
THECATDOESNOTLIKETALLTREES,BUTINSTEADLIKESSHORTTREES.
This becomes:
THECAT\start{1}\start{2}DOESNOTLIKETALL\end{1}TREES,BUTINSTEADLIKESSHORT\end{2}TREES.
- There are no spaces in the file.
- Many non-Latin characters appear in the file.
- Pattern matches need only be found on the same line (e.g. “CAT” on line 1 does not ever match with a “TREE” found on line 2, as those are on different lines).
How can I find these matches and mark them in this way?
Check this out (Ruby):
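A minimal sketch of one way to do it, under a few assumptions that the question does not spell out: matches are numbered by the position of the column 1 item and then by the position of the column 2 item (this reproduces the sample outputs), the counter keeps running across lines, and when a \start and an \end land on the same spot the \start is written first. The names `PAIRS`, `occurrences` and `mark_line` are made up for this illustration.

```ruby
# Pattern pairs; in practice these could be loaded from the pattern file, e.g.
#   File.readlines("patterns.csv", chomp: true).map { |row| row.split(",", 2) }
# ("patterns.csv" is a hypothetical file name).
PAIRS = [
  ["CAT", "TREE"],
  ["LION", "FOREST"],
  ["OWL", "WATERFALL"]
]

# Character index of every occurrence of needle in line (overlaps included).
def occurrences(line, needle)
  found = []
  pos = line.index(needle)
  while pos
    found << pos
    pos = line.index(needle, pos + 1)
  end
  found
end

# Marks every match on one line.
# counter is the last number already used; returns [marked_line, new_counter].
def mark_line(line, pairs, counter = 0)
  spans = []
  pairs.each do |first, second|
    # Position just after each column 1 item, and position of each column 2 item.
    after_first = occurrences(line, first).map { |i| i + first.length }
    before_second = occurrences(line, second)
    after_first.each do |s|
      # e > s means the second item comes later and does not touch the first.
      before_second.each { |e| spans << [s, e] if e > s }
    end
  end

  # Number the matches left to right, then queue both markers for insertion.
  events = []
  spans.sort.each do |s, e|
    counter += 1
    events << [s, 0, counter, "\\start{#{counter}}"]
    events << [e, 1, counter, "\\end{#{counter}}"]
  end

  # Insert from right to left so earlier positions stay valid.
  marked = line.dup
  events.sort.reverse_each { |pos, _kind, _n, text| marked.insert(pos, text) }
  [marked, counter]
end

counter = 0
sample = [
  "THEREISACATINTHETREE.",
  "THECATTREEHASMANYBIRDS.",
  "THECATDOESNOTLIKETALLTREES,BUTINSTEADLIKESSHORTTREES."
]
sample.each do |line|
  marked, counter = mark_line(line, PAIRS, counter)
  puts marked
end
# THEREISACAT\start{1}INTHE\end{1}TREE.
# THECATTREEHASMANYBIRDS.
# THECAT\start{2}\start{3}DOESNOTLIKETALL\end{2}TREES,BUTINSTEADLIKESSHORT\end{3}TREES.
```

`String#index` and `String#insert` count characters rather than bytes, so this should also cope with non-Latin text as long as the file is read with the right encoding. To restart the numbering on every line instead, pass `0` as the counter each time. The pairwise scan is quadratic in the number of occurrences per line, which is probably fine for short lines but may need rethinking for a very large pattern list.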
Result
EDIT
I inserted some comments and clarified some of the variables.