I’ve been tasked with finding documents that contain certain words, but only if certain other words also appear in the same document. It was worded to me like this:
Contains word1 or word2 within the same document as word3 or word4
I’ve been messing around with if/then conditionals in regular expressions, but I can’t quite figure it out. Here is what I have so far:
(?(word3|word4)(word1|word2)|())
This doesn’t seem to work for me, though. Even if the document contains only ‘word2’, it still matches.
Any suggestions?
You probably want to avoid regular expressions here. This condition is awkward to express with a regular expression alone, but it can be done either with lookaheads.
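A minimal sketch of the lookahead approach, assuming Python (the question doesn’t name a language) and using word1 to word4 as placeholders for the real terms. Each lookahead is anchored at the start of the text and independently checks that one word from its group appears somewhere, so the order of the words doesn’t matter.

```python
import re

# Placeholder terms from the question; substitute the real words.
# Each (?=...) lookahead scans from the start of the text for one group,
# so a match requires both groups to appear, in either order.
# \b word boundaries stop "word1" from matching inside e.g. "word10".
pattern = re.compile(
    r"^(?=.*\b(?:word1|word2)\b)(?=.*\b(?:word3|word4)\b)",
    re.DOTALL,  # let "." cross newlines so multi-line documents work
)

def matches(document: str) -> bool:
    return pattern.search(document) is not None

print(matches("only word2 here"))          # False
print(matches("word4 first, then word1"))  # True
```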
Or by listing all permutations of the word groups (not too difficult here, but this quickly gets out of hand for more complex conditions).
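A sketch of the permutation approach, again assuming Python and placeholder words: the pattern spells out both possible orders as alternatives. With n groups of words you would need n! branches, which is why this doesn’t scale.

```python
import re

# Placeholder groups from the question; substitute the real words.
A = r"\b(?:word1|word2)\b"
B = r"\b(?:word3|word4)\b"

# Every order listed explicitly: A ... B, or B ... A.
pattern = re.compile(f"{A}.*{B}|{B}.*{A}", re.DOTALL)

def matches(document: str) -> bool:
    return pattern.search(document) is not None
```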
If your text can contain newlines, enable the “dot all” modifier so that the dot also matches the newline character. The syntax varies from language to language: commonly it is an “s” flag written after the closing delimiter of the regular expression, but Ruby, for example, uses “m” for this. Check the documentation for the language you are using.
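For example, Python spells this modifier `re.DOTALL` (short form `re.S`, or the inline `(?s)`). This small check, using placeholder words, shows the difference it makes on a two-line string.

```python
import re

text = "word1 appears here\nand word3 on the next line"
pat = r"word1.*word3"

# Without the flag, "." stops at "\n", so no match is found.
print(re.search(pat, text) is None)                 # True
# With re.DOTALL, "." also matches "\n" and the search succeeds.
print(re.search(pat, text, re.DOTALL) is not None)  # True
```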
Instead, though, I’d suggest splitting the document into a collection of words (e.g. a list or set) and then searching that collection with ordinary code.
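A minimal sketch of that idea in Python, with placeholder word groups. It assumes that a “word” is a run of `\w` characters; adjust the tokenizing regex if your documents need something different.

```python
import re

# Placeholder terms from the question; substitute the real words.
GROUP_A = {"word1", "word2"}
GROUP_B = {"word3", "word4"}

def matches(document: str) -> bool:
    # Tokenize once into a set of words (assumption: words are \w+ runs).
    words = set(re.findall(r"\w+", document))
    # isdisjoint() is False when the sets share at least one element.
    return not words.isdisjoint(GROUP_A) and not words.isdisjoint(GROUP_B)
```

Because the condition is ordinary code, adding a third group or case-insensitive matching (lower-case both the document and the groups) is a one-line change rather than a rewrite of the pattern.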