I have a problem writing a Perl program that matches the words in two documents. Let’s say there are documents A and B.
I want to delete the words in document A that are not in document B.
Example 1:
A: I eat pizza
B: She go to the market and eat pizza
result: eat pizza
Example 2:
A: eat pizza
B: pizza eat
result: pizza
(The word order is relevant, so “eat” is deleted.)
I use Perl for the system, and the number of sentences in each document isn’t large, so I don’t think I need SQL.
The program is a subprogram of an automatic essay grader for the Indonesian language (Bahasa).
Thanks,
Sorry if my question is a bit confusing. I’m really new to ‘this world’ 🙂
OK, I’m without access at the moment, so this is not guaranteed to be 100% correct or even to compile, but it should provide enough guidance:
Solution 1: (word order does not matter)
This should create a new file “A_new” that contains only those words from A that also appear in B.
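A minimal sketch of this approach, under a few assumptions of mine: words are whatever is separated by whitespace, comparison is exact (case-sensitive, punctuation not stripped), and the sub names `words_in_file`, `filter_line` and `filter_file` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup hash of every whitespace-separated word in a file.
sub words_in_file {
    my ($path) = @_;
    my %seen;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        $seen{$_} = 1 for split ' ', $line;
    }
    close $fh;
    return \%seen;
}

# Keep only the words of one line that appear in the lookup hash.
# Re-joining with a single space is what collapses runs of whitespace.
sub filter_line {
    my ($line, $seen) = @_;
    return join ' ', grep { $seen->{$_} } split ' ', $line;
}

# Copy file A to the output file, dropping every word that is not in file B.
sub filter_file {
    my ($file_a, $file_b, $file_out) = @_;
    my $seen = words_in_file($file_b);
    open my $in,  '<', $file_a   or die "Cannot open $file_a: $!";
    open my $out, '>', $file_out or die "Cannot open $file_out: $!";
    while (my $line = <$in>) {
        print {$out} filter_line($line, $seen), "\n";
    }
    close $in;
    close $out or die "Cannot close $file_out: $!";
}

# Runs only when three file names are given on the command line.
filter_file(@ARGV) if @ARGV == 3;
```

Saved under a hypothetical name such as filter.pl, it would be run as `perl filter.pl A B A_new`. Since order is ignored here, Example 2 from the question would come out as "eat pizza" rather than "pizza".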
This has a slight bug: it will replace any run of whitespace in file A with a single space, so a line whose words are separated by several spaces or tabs will come out with single spaces between them.
It can be fixed, but doing so would be really annoying, so I didn’t bother; say so if you absolutely require that whitespace be preserved 100% correctly.
Solution 2: (word order matters BUT you can print words from file A out with no regard for preserving whitespace at all)
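One way to read "word order matters" is a greedy in-order match: walk through B's words and keep each one that can still be found in A after the previous match. That reading is my assumption; I picked it because it reproduces both examples in the question. A longest-common-subsequence match would be another reasonable choice. The sub name `ordered_common_words` is hypothetical.

```perl
use strict;
use warnings;

# Greedy in-order match: for each word of B, look for it in A starting just
# past the previous match; keep it if found, otherwise skip it.
sub ordered_common_words {
    my ($a_words, $b_words) = @_;
    my @kept;
    my $next = 0;    # first index in A that is still available
    for my $word (@$b_words) {
        for my $i ($next .. $#{$a_words}) {
            if ($a_words->[$i] eq $word) {
                push @kept, $word;
                $next = $i + 1;
                last;
            }
        }
    }
    return @kept;
}

# Example 2 from the question.
my @words_a = split ' ', 'eat pizza';
my @words_b = split ' ', 'pizza eat';
print join(' ', ordered_common_words(\@words_a, \@words_b)), "\n";   # prints "pizza"
```

For whole files you would read each file into one array of words first (for example with `split ' '` on every line) and print the result joined by single spaces.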
Solution 3 (why do we need Perl again? 🙂 )
You can do this trivially in the shell without Perl (or via a system() call or backticks in a parent Perl script).
To call this from Perl:
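As a rough sketch only: the pipeline below is one I am assuming for illustration. It relies on POSIX `tr` and `grep` being on the PATH and on file names without quotes or other shell-special characters, and it ignores word order. The sub name `common_words_via_shell` and the ".words" scratch file are hypothetical.

```perl
use strict;
use warnings;

# Assumed pipeline: put every word on its own line with tr, then let
# grep -F -x keep the words of A that exactly match some word of B.
sub common_words_via_shell {
    my ($file_a, $file_b) = @_;
    my $words_b = "$file_b.words";    # hypothetical scratch file

    # One word of B per line.
    system("tr -s '[:space:]' '\\n' < '$file_b' > '$words_b'") == 0
        or die "tr failed with status $?";

    # Backticks capture the pipeline's output, one word per list element.
    my @words = `tr -s '[:space:]' '\\n' < '$file_a' | grep -Fx -f '$words_b'`;
    chomp @words;

    unlink $words_b;
    return @words;
}
```

A call such as `print join(' ', common_words_via_shell('A', 'B')), "\n";` would then print the surviving words on one line. Every call starts several external processes, which is where the performance concern comes from.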
But see my last comment for why this may be considered “bad Perl”, at least if you do this in a loop over very many files and care about performance.