// Calculating term frequency
System.out.println("Please enter the required word :");
Scanner scan = new Scanner(System.in);
String word = scan.nextLine();
String[] array = word.split(" ");
int filename = 11;
String[] fileName = new String[filename];
int a = 0;
int totalCount = 0;
int wordCount = 0;
for (a = 0; a < filename; a++) {
    try {
        System.out.println("The word inputted is " + word);
        File file = new File(
                "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" + a
                        + ".txt");
        System.out.println(" _________________");
        System.out.print("| File = abc" + a + ".txt | \t\t \n");
        for (int i = 0; i < array.length; i++) {
            totalCount = 0;
            wordCount = 0;
            Scanner s = new Scanner(file);
            {
                while (s.hasNext()) {
                    totalCount++;
                    if (s.next().equals(array[i]))
                        wordCount++;
                }
                System.out.print(array[i] + " ---> Word count = "
                        + "\t\t " + "|" + wordCount + "|");
                System.out.print(" Total count = " + "\t\t " + "|"
                        + totalCount + "|");
                System.out.printf(" Term Frequency = | %8.4f |",
                        (double) wordCount / totalCount);
                System.out.println("\t ");
            }
        }
    } catch (FileNotFoundException e) {
        System.out.println("File is not found");
    }
}
System.out.println("Please enter the required word :");
Scanner scan2 = new Scanner(System.in);
String word2 = scan2.nextLine();
String[] array2 = word2.split(" ");
int numofDoc;
for (int b = 0; b < array2.length; b++) {
    numofDoc = 0;
    for (int i = 0; i < filename; i++) {
        try {
            BufferedReader in = new BufferedReader(new FileReader(
                    "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc"
                            + i + ".txt"));
            int matchedWord = 0;
            Scanner s2 = new Scanner(in);
            {
                while (s2.hasNext()) {
                    if (s2.next().equals(array2[b]))
                        matchedWord++;
                }
            }
            if (matchedWord > 0)
                numofDoc++;
        } catch (IOException e) {
            System.out.println("File not found.");
        }
    }
    System.out.println(array2[b]
            + " --> This number of files that contain the term "
            + numofDoc);
    double inverseTF = Math.log10((float) numDoc / numofDoc);
    System.out.println(array2[b] + " --> IDF " + inverseTF );
    double TFIDF = (((double) wordCount / totalCount) * inverseTF );
    System.out.println(array2[b] + " --> TFIDF " + TFIDF);
}
}
Hi, this is my code for calculating term frequency and TF-IDF. The first part calculates the term frequency of each word in a given string, for each file. The second part is supposed to calculate the TF-IDF for each file, using the term frequency from the first part. However, I only get a single value, when it should give one TF-IDF value per document.
Example output for term frequency:
The word inputted is ‘is’
| File = abc0.txt |
is ---> Word count = |2| Total count = |150| Term Frequency = | 0.0133 |
The word inputted is ‘is’
| File = abc1.txt |
is ---> Word count = |0| Total count = |9| Term Frequency = | 0.0000 |
Example output for TF-IDF:
is --> This number of files that contain the term 7
is --> IDF 0.1962946357308887
is --> TFIDF 0.0028607962606519654
I expected one value per file: I have 10 files, so I should get 10 TF-IDF values, one for each file. Instead, it prints only a single result. Can someone point out my mistake?
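The part that computes and prints the TF-IDF needs to be moved inside a for loop that iterates over all the files. As written, it runs once per term, after the file loop has finished, and it reuses wordCount and totalCount, which still hold whatever values were left over from the last file and term processed by the term-frequency code. Note that the IDF depends on numofDoc, which is only complete once every file has been checked. So count the matching files first, then loop over the files a second time, recomputing wordCount and totalCount for each file and printing the term frequency multiplied by inverseTF inside that loop.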
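As a rough illustration of that restructuring, here is a minimal sketch rather than a drop-in replacement for the code in the question. The class name TfIdfSketch and its method names are hypothetical, and the sketch assumes each file has already been split into whitespace-separated tokens (readTokens does this with a Scanner, as in the question). It also assumes that an empty file, or a term that appears in no file, should score 0 instead of producing NaN or Infinity.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; the names are illustrative and not from the question.
class TfIdfSketch {

    // Reads one file into whitespace-separated tokens, like Scanner.next() in the question.
    static List<String> readTokens(File file) throws FileNotFoundException {
        List<String> tokens = new ArrayList<>();
        try (Scanner s = new Scanner(file)) {
            while (s.hasNext()) {
                tokens.add(s.next());
            }
        }
        return tokens;
    }

    // Term frequency: occurrences of the term divided by the total token count of one document.
    static double termFrequency(List<String> tokens, String term) {
        if (tokens.isEmpty()) {
            return 0.0; // assumption: an empty file gets TF 0 rather than NaN
        }
        int wordCount = 0;
        for (String token : tokens) {
            if (token.equals(term)) {
                wordCount++;
            }
        }
        return (double) wordCount / tokens.size();
    }

    // Inverse document frequency: log10(total documents / documents containing the term).
    static double inverseDocumentFrequency(List<List<String>> docs, String term) {
        int numOfDoc = 0;
        for (List<String> tokens : docs) {
            if (tokens.contains(term)) {
                numOfDoc++;
            }
        }
        if (numOfDoc == 0) {
            return 0.0; // assumption: avoid dividing by zero when no file contains the term
        }
        return Math.log10((double) docs.size() / numOfDoc);
    }

    // Returns one TF-IDF value per document instead of a single value per term.
    static double[] tfIdfPerDocument(List<List<String>> docs, String term) {
        // The IDF needs a full pass over all documents before any score can be computed.
        double idf = inverseDocumentFrequency(docs, term);
        double[] scores = new double[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = termFrequency(docs.get(i), term) * idf;
        }
        return scores;
    }
}
```

With this shape, you would build the list of documents once by calling readTokens on each abc file, then call tfIdfPerDocument for every term in array2 and print scores[i] next to the matching file name.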