Suppose I have the following string in a text file, with a tab between the left and right parts:
The dreams of REM (Geo) sleep The sleep paralysis
I want to find the lines of another file, such as the one below, that match both the left part and the right part of that string:
The pons also contains the sleep paralysis center of the brain as well as generating the dreams of REM sleep.
If a line cannot be matched with the full string, then try to match it with a substring.
I want to search with the leftmost and rightmost patterns.
e.g. (leftmost cases):
The dreams of REM sleep paralysis
The dreams of REM sleep The sleep
e.g. (rightmost cases):
REM sleep The sleep paralysis
The dreams of The sleep paralysis
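One possible reading of these examples (an assumption, since the question doesn't define the terms): a "leftmost" case keeps the left part whole and shortens the right part to a word-level prefix or suffix, and a "rightmost" case does the reverse. A minimal sketch that enumerates those candidates under that reading; the helper names are hypothetical, and the optional "(Geo)" is assumed to have been dropped already.

```python
def word_prefixes_and_suffixes(phrase):
    """Hypothetical helper: return the word-level prefixes and suffixes of
    `phrase`, longest first, excluding the full phrase itself."""
    words = phrase.split()
    pieces = []
    for n in range(len(words) - 1, 0, -1):
        pieces.append(" ".join(words[:n]))   # prefix of n words
        pieces.append(" ".join(words[-n:]))  # suffix of n words
    return pieces


def shortened_variants(left, right):
    """Hypothetical helper (assumed reading of the examples):
    leftmost cases keep `left` whole and shorten `right`;
    rightmost cases keep `right` whole and shorten `left`."""
    leftmost = [f"{left} {r}" for r in word_prefixes_and_suffixes(right)]
    rightmost = [f"{l} {right}" for l in word_prefixes_and_suffixes(left)]
    return leftmost, rightmost


leftmost, rightmost = shortened_variants("The dreams of REM sleep", "The sleep paralysis")
print(leftmost)
print(rightmost)
```

The candidates come out longest first, so a caller could stop at the first one that is found in a line.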
Thanks a lot again for any kind of help.
(OK, you've clarified most of what you want. Let me restate it; then please clarify the points I list below as still unclear. Also, take the starter code I show you, adapt it, and post the result.)
You want to search, line by line and case-insensitively, for the longest contiguous matches to each of a pair of match patterns. All the patterns seem to be disjoint (it's impossible to get a match on both patternX and patternY, since they use different phrases; e.g. you can't match both ‘frontal lobe’ and ‘prefrontal cortex’).
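A minimal sketch of "longest contiguous match, case-insensitive" for a single pattern, assuming patterns are plain whitespace-separated words: try every contiguous run of the pattern's words, longest first, and return the first run found in the line. `longest_contiguous_match` is a hypothetical name; this brute force tries a quadratic number of word spans, which should be fine for short phrases.

```python
import re


def longest_contiguous_match(pattern, line):
    """Return the longest run of consecutive words from `pattern` that
    occurs in `line` (case-insensitive, on word boundaries), or None."""
    words = pattern.split()
    # Try every contiguous span of words, longest spans first.
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            span = words[start:start + length]
            # re.escape guards against regex metacharacters in the words;
            # \s+ tolerates varying whitespace between words.
            regex = r"\b" + r"\s+".join(map(re.escape, span)) + r"\b"
            m = re.search(regex, line, re.IGNORECASE)
            if m:
                return m.group(0)  # the text as it appears in the line
    return None


line = ("The pons also contains the sleep paralysis center of the brain "
        "as well as generating the dreams of REM sleep.")
print(longest_contiguous_match("The sleep paralysis", line))
```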
Your patterns are supplied as a sequence of pairs ('dom', 'rang'); let's just refer to them by their subscripts [0] and [1] (you can use string.split('\t') to get them).
The important thing is that a matching line must match both the dom and rang patterns (fully or partially).
Order doesn't matter, so we can match rang then dom, or vice versa: use two separate regexes per line, and test that both d and r matched.
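A minimal sketch of that two-regex test, assuming each pattern line contains exactly one tab and no parenthesized optional parts (literal text only). `line_matches_pair` is a hypothetical helper; each half gets its own compiled regex, so their order in the text line doesn't matter.

```python
import re


def line_matches_pair(pair_line, text_line):
    """Return (dom_match, rang_match) if `text_line` contains both halves of
    the tab-separated `pair_line`, in either order; otherwise None."""
    # Assumes exactly one tab separating dom and rang.
    dom, rang = pair_line.rstrip("\n").split("\t")
    dom_re = re.compile(re.escape(dom.strip()), re.IGNORECASE)
    rang_re = re.compile(re.escape(rang.strip()), re.IGNORECASE)
    # Each regex is searched independently over the whole line.
    d = dom_re.search(text_line)
    r = rang_re.search(text_line)
    if d and r:
        return d.group(0), r.group(0)
    return None


line = ("The pons also contains the sleep paralysis center of the brain "
        "as well as generating the dreams of REM sleep.")
print(line_matches_pair("The dreams of REM sleep\tThe sleep paralysis", line))
```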
Patterns have optional parts in parentheses, so just write them in (or convert them to) regex syntax using the (optional text)? form, e.g.:
    re.compile('Frontal lobes of (left side )?the brain', re.IGNORECASE)
The return value should be the string buffer with the longest substring match so far.
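To convert parenthesized optional parts such as (Geo) into regex syntax automatically instead of by hand, one sketch is below. It assumes patterns are plain words, optional groups never nest, and at least one word is not optional; `pattern_to_regex` is a hypothetical name. The whitespace is attached to the optional group rather than placed on both sides of it, so leaving out (Geo) doesn't require a double space in the text.

```python
import re


def pattern_to_regex(pattern):
    """Compile a phrase like 'The dreams of REM (Geo) sleep' into a
    case-insensitive regex in which each (...) group is optional."""
    # Tokens are either a whole parenthesized group or a single word.
    tokens = re.findall(r"\([^)]*\)|\S+", pattern)
    parts = []    # one regex piece per required word
    pending = []  # optional groups waiting to attach to the next required word
    for tok in tokens:
        if tok.startswith("(") and tok.endswith(")"):
            pending.append(r"\s+".join(map(re.escape, tok[1:-1].split())))
        else:
            # Optional groups before a word carry their own trailing whitespace.
            prefix = "".join(r"(?:" + p + r"\s+)?" for p in pending)
            parts.append(prefix + re.escape(tok))
            pending = []
    body = r"\s+".join(parts)
    # Optional groups at the very end carry their own leading whitespace.
    body += "".join(r"(?:\s+" + p + r")?" for p in pending)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


rx = pattern_to_regex("The dreams of REM (Geo) sleep")
print(rx.pattern)
```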
Now, this is where several things remain to be clarified. Please edit your question to explain the following:
Each of the above questions will affect the solution, so you need to answer them for us. There's no point in writing pages of code to solve the most general case when you only need something simple.
In general this is called ‘NLP’ (natural language processing). You might end up using an NLP library.
The general structure of the code so far sounds like this:
and running it on the 7 lines of input you supplied currently gives: