How can I compute the intersection between two text files in terms of raw text? It doesn’t matter whether the solution uses a shell command or is expressed in Python, Elisp, or other common scripting languages.
I know about comm and grep -Fxv -f file1 file2. Both operate on whole lines, whereas I am interested in the intersection of character runs (with a minimum number of characters required to count as a match).
Bonus points for efficiency.
Example
If file 1 contains
foo bar baz-fee
and file 2 contains
fee foo bar-faa
then I would like to see
foo barfee
assuming a minimum match length of 3.
You’re looking for Python’s difflib module (in the standard library), and in particular difflib.SequenceMatcher.
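One caveat: SequenceMatcher.get_matching_blocks() only reports matches that occur in the same order in both inputs, so on the example above it would return "foo bar" but not "fee". Below is a minimal sketch of one way around that, assuming it is acceptable to report matches longest-first and to count each character at most once. It calls find_longest_match() repeatedly and masks out what it has already found. The function name common_substrings and the integer sentinels are my own choices for illustration, not part of difflib.

```python
from difflib import SequenceMatcher


def common_substrings(a, b, min_len=3):
    """Return common substrings of a and b with at least min_len characters.

    Matches are reported longest first; each character is used at most once.
    """
    # Work on lists so matched regions can be overwritten with sentinels.
    seq_a, seq_b = list(a), list(b)
    found = []
    while True:
        # autojunk=False turns off the heuristic that ignores very frequent
        # elements in long sequences, which would hide real matches.
        matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
        m = matcher.find_longest_match(0, len(seq_a), 0, len(seq_b))
        if m.size < min_len:
            break
        found.append("".join(seq_a[m.a:m.a + m.size]))
        # Mask the match: 0 (in a) never equals 1 (in b), and neither
        # equals any one-character string, so masked cells cannot match again.
        seq_a[m.a:m.a + m.size] = [0] * m.size
        seq_b[m.b:m.b + m.size] = [1] * m.size
    return found


if __name__ == "__main__":
    parts = common_substrings("foo bar baz-fee", "fee foo bar-faa", min_len=3)
    print("".join(parts))  # foo barfee
```

To use it on files, read each one into a string first (for example with pathlib.Path(name).read_text()). This rebuilds the matcher on every iteration, so it is fine for small files but is not the efficient solution the question asks for; for large inputs a suffix-automaton or suffix-array approach would scale better.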