I need some help finding the number of matched tokens between two strings. I have a list of strings stored in an ArrayList (example below):
Line 0 : WRB VBD NN VB IN CC RB VBP NNP
Line 1 : WDT NNS VBD DT NN NNP NNP
Line 2 : WRB MD PRP VB DT NN IN NNS POS JJ NNS
Line 3 : WDT NN VBZ DT NN IN DT JJ NN IN DT NNP
Line 4 : WP VBZ DT JJ NN IN NN
As you can see, each string consists of tokens separated by spaces. There are three things I need to do:
- Compare the first token (WRB) in Line 0 with each token in Line 1 to see if there is a match, then move on to the next token in Line 0 and repeat. When a match is found, mark the matched token in Line 1 so that it cannot be matched again.
- Return the number of matched tokens between Line 0 and Line 1.
- Return the distance between the matched tokens. Example: token NN is at position 3 in Line 0 and position 5 in Line 1 (counting from 1), so the distance is |3 - 5| = 2.
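The three steps above can be sketched in Java as follows. This is only a minimal sketch under some assumptions: tokens are split on whitespace, each token in line a is matched against the first not-yet-used equal token in line b (a simple greedy strategy), and positions are counted from 1 as in the NN example. The names `TokenMatcher`, `match` and `Match` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: greedily matches the tokens of line a against line b.
class TokenMatcher {

    // One matched pair; positions are 1-based, as in the question's example.
    static final class Match {
        final String token;
        final int posA; // position in line a
        final int posB; // position in line b

        Match(String token, int posA, int posB) {
            this.token = token;
            this.posA = posA;
            this.posB = posB;
        }

        int distance() {
            return Math.abs(posA - posB);
        }
    }

    static List<Match> match(String a, String b) {
        String[] tokensA = a.trim().split("\\s+");
        String[] tokensB = b.trim().split("\\s+");
        boolean[] used = new boolean[tokensB.length]; // marks tokens in b that are already matched
        List<Match> matches = new ArrayList<>();

        for (int i = 0; i < tokensA.length; i++) {
            for (int j = 0; j < tokensB.length; j++) {
                if (!used[j] && tokensA[i].equals(tokensB[j])) {
                    used[j] = true; // this token in b can no longer be matched
                    matches.add(new Match(tokensA[i], i + 1, j + 1));
                    break; // move on to the next token in a
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String line0 = "WRB VBD NN VB IN CC RB VBP NNP";
        String line1 = "WDT NNS VBD DT NN NNP NNP";
        List<Match> matches = match(line0, line1);
        System.out.println("Matched tokens: " + matches.size());
        for (Match m : matches) {
            System.out.println(m.token + ": |" + m.posA + " - " + m.posB + "| = " + m.distance());
        }
    }
}
```

For Line 0 and Line 1 this prints `Matched tokens: 3`, followed by `VBD: |2 - 3| = 1`, `NN: |3 - 5| = 2` and `NNP: |9 - 6| = 3`. Marking matches with a `boolean[]` instead of deleting them means the array never has to shrink, and the original positions stay valid for the distance calculation.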
I've tried splitting each string and storing the result in a String[], but a String[] has a fixed size and doesn't allow removing or adding elements. I tried Pattern/Matcher, with disastrous results. I've tried a few other methods, but there are some problems with my nested for loops (I can post part of my code if it helps).
Any advice or pointers on how to solve this would be much appreciated. Thank you very much.
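If a resizable container is really needed, one common approach is to copy the result of `split` into an `ArrayList`. A small sketch (the class name `SplitToList` is hypothetical); note that `Arrays.asList` on its own returns a fixed-size list, which is why it is wrapped in `new ArrayList<>(...)`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitToList {
    // Splits a line on whitespace into a resizable ArrayList (unlike a String[]).
    static List<String> tokens(String line) {
        return new ArrayList<>(Arrays.asList(line.trim().split("\\s+")));
    }

    public static void main(String[] args) {
        List<String> tokens = tokens("WDT NNS VBD DT NN NNP NNP");
        tokens.remove("NN"); // removes the first "NN"; the list shrinks by one
        tokens.add("JJ");    // the list can also grow
        System.out.println(tokens);
    }
}
```

This prints `[WDT, NNS, VBD, DT, NNP, NNP, JJ]`. Keep in mind that removing an element shifts every later element one position to the left, so if the original positions are needed for the distance, it is safer to mark tokens as used than to remove them.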
Have you tried using Scanner?
If not, definitely give it a try: by default a Scanner splits its input on whitespace, so each call to next() returns one token.
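A minimal sketch of reading the tokens of one line with `java.util.Scanner` and its default whitespace delimiter (the class `ScannerTokens` and method `tokens` are hypothetical names, not part of any library):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokens {
    // Collects the whitespace-separated tokens of one line.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("WP VBZ DT JJ NN IN NN"));
    }
}
```

This prints `[WP, VBZ, DT, JJ, NN, IN, NN]`, and because the result is an `ArrayList`, it can grow and shrink as needed.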
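EDIT: Regarding your loops for selecting different lines in the ArrayList: what you need is to compare every element of the list with every other element (that phrase is probably the best thing to google if this explanation is lacking).
In Java, that is a pair of nested loops: the outer loop picks line i, and the inner loop picks every later line j, starting from i + 1.
The reason the second loop starts from i + 1 is to eliminate unnecessary comparisons: comparing a line with itself (j == i), and comparing a pair a second time in the opposite order (for example, Line 1 vs. Line 0 after Line 0 vs. Line 1 has already been done).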
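A minimal sketch of that loop, assuming the lines are held in a `List<String>`. The class `PairwiseLoop` and the `pairs` helper are hypothetical, and the actual token comparison is left as a placeholder comment:

```java
import java.util.ArrayList;
import java.util.List;

class PairwiseLoop {
    // Returns every pair of indices (i, j) with i < j, i.e. each pair of lines once.
    static List<int[]> pairs(int n) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) { // j starts after i: no self-pairs, no repeats
                result.add(new int[] {i, j});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "WRB VBD NN VB IN CC RB VBP NNP",
                "WDT NNS VBD DT NN NNP NNP",
                "WRB MD PRP VB DT NN IN NNS POS JJ NNS");
        for (int[] p : pairs(lines.size())) {
            // Here you would compare lines.get(p[0]) with lines.get(p[1]).
            System.out.println("Line " + p[0] + " vs Line " + p[1]);
        }
    }
}
```

With three lines this prints `Line 0 vs Line 1`, `Line 0 vs Line 2` and `Line 1 vs Line 2`. In general n lines give n(n - 1)/2 comparisons, so the five example lines would need 10.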
If you find this confusing, I would highly recommend working it out on paper by listing the values of i and j as you move through the loop.