I’m looking for a nice pythonic way of filtering one list by another stop-list, but I want to match substrings from the second list in the first.
To be specific: I have list1 of URLs and list2 like:
['microsoft.com', 'ibm.com', 'cnn', '.ru'] etc
The first list of URLs is huge (thousands of items); the second list is smaller, around 500-1000 items. A simple match using “in” or sets is not enough, because the items of the second list have to be searched for as substrings.
All I could think of is two “for” loops, but they don’t seem to be pythonic 🙂
PS Purpose is to remove matched items from first list.
You can build a single, disjunctive regular expression from the strings to be matched, then use the search method of the RE object to do the matching. Be sure to re.escape the strings before pasting them into the RE.
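A minimal sketch of that idea, assuming the URLs and stop-list entries are plain strings. The function name filter_by_stoplist and the sample URLs are made up for illustration; only re.escape, re.compile and the pattern's search method come from the standard library.

```python
import re

def filter_by_stoplist(urls, stop_substrings):
    """Return the URLs that contain none of the stop substrings.

    Hypothetical helper, not part of any library.
    """
    # Drop empty entries: an empty alternative would match every URL.
    parts = [re.escape(s) for s in stop_substrings if s]
    if not parts:
        return list(urls)
    # re.escape makes characters such as "." match literally;
    # "|" joins the entries into one disjunctive pattern.
    pattern = re.compile("|".join(parts))
    return [url for url in urls if not pattern.search(url)]

# Sample data (invented for this example)
list1 = [
    "http://www.microsoft.com/windows",
    "http://example.org/news",
    "http://edition.cnn.com/world",
    "http://yandex.ru/",
    "http://python.org/",
]
list2 = ["microsoft.com", "ibm.com", "cnn", ".ru"]

filtered = filter_by_stoplist(list1, list2)
print(filtered)
```

This builds a new list rather than deleting from list1 in place; if the original list has to be modified, list1[:] = filter_by_stoplist(list1, list2) should do it. The pattern is compiled once, so each URL is scanned with a single search call instead of one test per stop-list entry.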