// Assumes filename (the number of files), word2, array2 (the query terms), totalCount,
// wordCount, numDoc and numofDoc are declared earlier in the enclosing method.
int queryVector = 1; // weight of every query term in the query vector
int wordPower;
String[][] arrays = new String[filename][2];
int row;
int col;
for (int a = 0; a < filename; a++) {
    int totalwordPower = 0;
    int totalWords = 0;
    double similarity = 0.0; // reset per file so a missing file does not reuse the previous score
    try {
        System.out
                .println(" _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ");
        System.out.println("\n");
        System.out.println("The word inputted : " + word2);
        File file = new File(
                "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" + a
                        + ".txt");
        System.out.println(" _________________");
        System.out.print("| File = abc" + a + ".txt | \t\t \n");
        for (int i = 0; i < array2.length; i++) {
            totalCount = 0;
            wordCount = 0;
            // try-with-resources closes the Scanner after each pass over the file
            try (Scanner s = new Scanner(file)) {
                while (s.hasNext()) {
                    totalCount++;
                    if (s.next().equals(array2[i]))
                        wordCount++;
                }
            }
            System.out.print(array2[i] + " --> Word count = "
                    + "\t " + "|" + wordCount + "|");
            System.out.print(" Total count = " + "\t " + "|"
                    + totalCount + "|");
            System.out.printf(" Term Frequency = | %8.4f |",
                    (double) wordCount / totalCount);
            System.out.println("\t ");
            double inverseTF = Math.log10((float) numDoc
                    / (numofDoc[i]));
            System.out.println(" --> IDF = " + inverseTF);
            double TFIDF = (((double) wordCount / totalCount) * inverseTF);
            System.out.println(" --> TF/IDF = " + TFIDF + "\n");
            totalWords += wordCount;
            wordPower = (int) Math.pow(wordCount, 2);
            totalwordPower += wordPower;
            System.out.println("Document Vector : " + wordPower);
        }
        // Cosine similarity between the query vector (queryVector for each of the
        // array2.length terms) and the document vector (the word counts), computed once
        // after all terms are counted. array2.length replaces the hard-coded 3.
        // Note: this is 0/0 = NaN when none of the query words occur in the file.
        similarity = (totalWords * queryVector)
                / (Math.sqrt(totalwordPower)
                        * Math.sqrt(queryVector * queryVector * array2.length));
    } catch (FileNotFoundException e) {
        System.out.println("File is not found");
    }
    System.out.println("The total query frequency for this file is "
            + totalWords);
    System.out.println("The total document vector : " + totalwordPower);
    System.out.println("The similarity is " + similarity);
}
} // closes the enclosing method (not shown)
} // closes the enclosing class (not shown)
Hi, I want to sort the SIMILARITY scores calculated by the code above. Below is example output for 2 of my text files; I have 10 text files in total.
The word inputted : how are you
| File = abc0.txt |
how --> Word count = |0| Total count = |1289| Term Frequency = | 0.0000 |
--> IDF = 1.0413926851582251
--> TF/IDF = 0.0
Document Vector : 0
are --> Word count = |0| Total count = |1289| Term Frequency = | 0.0000 |
--> IDF = 0.43933269383026263
--> TF/IDF = 0.0
Document Vector : 0
you --> Word count = |0| Total count = |1289| Term Frequency = | 0.0000 |
--> IDF = 0.1962946357308887
--> TF/IDF = 0.0
Document Vector : 0
The total query frequency for this file is 0
The total document vector : 0
The SIMILARITY is NaN
The word inputted : how are you
| File = abc1.txt |
how --> Word count = |0| Total count = |426| Term Frequency = | 0.0000 |
--> IDF = 1.0413926851582251
--> TF/IDF = 0.0
Document Vector : 0
are --> Word count = |0| Total count = |426| Term Frequency = | 0.0000 |
--> IDF = 0.43933269383026263
--> TF/IDF = 0.0
Document Vector : 0
you --> Word count = |3| Total count = |426| Term Frequency = | 0.0070 |
--> IDF = 0.1962946357308887
--> TF/IDF = 0.0013823565896541458
Document Vector : 9
The total query frequency for this file is 3
The total document vector : 9
The SIMILARITY is 0.5773502691896257
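As a worked check, the SIMILARITY value appears to be the cosine similarity between a query vector q, with weight queryVector = 1 for each of the three query words, and a document vector d holding each word's count in the file: totalWords is the dot product q·d and totalwordPower is the squared length of d. Plugging in the two sample files reproduces both printed values, including the NaN for abc0.txt:

$$
\operatorname{sim}(\mathbf{q},\mathbf{d})
= \frac{\sum_i q_i d_i}{\sqrt{\sum_i q_i^2}\,\sqrt{\sum_i d_i^2}}
= \frac{\texttt{totalWords}\cdot 1}{\sqrt{3}\,\sqrt{\texttt{totalwordPower}}}
$$

$$
\text{abc1.txt: } \mathbf{d} = (0, 0, 3):\quad \frac{3}{\sqrt{3}\,\sqrt{9}} = \frac{1}{\sqrt{3}} \approx 0.5773503
$$

$$
\text{abc0.txt: } \mathbf{d} = (0, 0, 0):\quad \frac{0}{\sqrt{3}\cdot 0} = \frac{0}{0} \;\Rightarrow\; \text{NaN in floating point}
$$

So any file that contains none of the query words gets NaN rather than 0, which matters once the scores are sorted.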
How can I sort the SIMILARITY scores from highest to lowest? Any advice?
Add the SIMILARITY scores to a list and sort it with a library method such as Collections.sort. That sorts in ascending order, so read the list from the end to go from highest to lowest.
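A minimal sketch of the list approach, assuming one Double score per file. The class and method names (SortScores, highestFirst) are hypothetical; in the original program the scores.add(similarity) call would go at the end of each pass of the outer file loop. Collections.sort orders Double values ascending, so the result is built by reading the sorted list from the end. One caveat: Double's natural ordering treats NaN as larger than every other value, so a NaN score like abc0.txt's would land at the top; the sketch maps NaN to 0.0 first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortScores {

    // Returns the scores from highest to lowest.
    // NaN (no query word in the file) is mapped to 0.0, because Double's
    // natural ordering would otherwise place NaN above every real score.
    static List<Double> highestFirst(List<Double> scores) {
        List<Double> sorted = new ArrayList<>();
        for (double score : scores) {
            sorted.add(Double.isNaN(score) ? 0.0 : score);
        }
        Collections.sort(sorted); // ascending order
        List<Double> result = new ArrayList<>();
        for (int i = sorted.size() - 1; i >= 0; i--) { // read from the end
            result.add(sorted.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> scores = new ArrayList<>();
        // In the original program: scores.add(similarity); once per file
        scores.add(Double.NaN);         // abc0.txt in the sample output
        scores.add(0.5773502691896257); // abc1.txt in the sample output
        System.out.println(highestFirst(scores));
    }
}
```

With the two sample scores this prints [0.5773502691896257, 0.0]. Note that a plain list of numbers no longer says which file each score came from.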
Or you can declare a Comparator that orders the scores from highest to lowest and pass it to the sort call.
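A sketch of the Comparator route that also keeps each score paired with its file name, which a plain list of Double values loses. FileScore, RankFiles, rank and HIGHEST_FIRST are hypothetical names. Comparator.comparingDouble(...).reversed() builds a comparator that orders by score from highest to lowest, and List.sort applies it to a copy of the list. NaN is again stored as 0.0, because Double.compare (which comparingDouble uses) ranks NaN above all other values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RankFiles {

    // Hypothetical holder pairing a file name with its similarity score
    static class FileScore {
        final String fileName;
        final double similarity;

        FileScore(String fileName, double similarity) {
            this.fileName = fileName;
            // NaN (no query word in the file) is stored as 0.0 so it sorts last
            this.similarity = Double.isNaN(similarity) ? 0.0 : similarity;
        }

        @Override
        public String toString() {
            return fileName + " = " + similarity;
        }
    }

    // Orders files by similarity, highest first
    static final Comparator<FileScore> HIGHEST_FIRST =
            Comparator.comparingDouble((FileScore f) -> f.similarity).reversed();

    // Returns a sorted copy; the input list is left unchanged
    static List<FileScore> rank(List<FileScore> scores) {
        List<FileScore> sorted = new ArrayList<>(scores);
        sorted.sort(HIGHEST_FIRST);
        return sorted;
    }

    public static void main(String[] args) {
        List<FileScore> scores = new ArrayList<>();
        // In the original program: scores.add(new FileScore("abc" + a + ".txt", similarity));
        scores.add(new FileScore("abc0.txt", Double.NaN));
        scores.add(new FileScore("abc1.txt", 0.5773502691896257));
        for (FileScore f : rank(scores)) {
            System.out.println(f);
        }
    }
}
```

With the sample data this prints abc1.txt first and abc0.txt (score 0.0) last. Keeping the name and score together is the main reason to prefer this over sorting bare numbers.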
HTH