I am trying to implement vectorization of a text file… I have created a dictionary (the unique words in all the documents)… What is the best way to implement this in Java?
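A dictionary used for vectorization is often stored as a map from each unique word to a fixed column index. The index then says which position in a document's vector that word's count occupies. Below is a minimal sketch of building such a map. It assumes the documents are already loaded as strings, and that a "word" is a lowercase run of letters or digits. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DictionaryBuilder {

    // Maps each unique word to a column index, in order of first appearance.
    // Assumption: words are case-insensitive runs of Unicode letters/digits.
    static Map<String, Integer> buildDictionary(List<String> documents) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String doc : documents) {
            for (String token : doc.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    // index.size() is read before insertion, so new words get 0, 1, 2, ...
                    index.putIfAbsent(token, index.size());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("w1 w2 w2 w2 w3", "w4 w4 w3");
        System.out.println(buildDictionary(docs)); // {w1=0, w2=1, w3=2, w4=3}
    }
}
```

A `LinkedHashMap` keeps the column order stable and predictable. Word lookups during counting are still constant time.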
For example:
My dictionary contains the following words: {w1, w2, w3, w4}
I also have 2 documents, each containing a subset of the words in the vocabulary. I need to write the matrix to a text file in the following form:
1,3,4,0
0,0,2,1
Here, each row represents a document, and each value is the number of times the corresponding dictionary word occurs in that document.
Can you suggest the most efficient way to implement this in Java?
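Below is a minimal sketch of producing that matrix. Each document gets one `int` array, filled in a single pass with constant-time map lookups, and the rows are written one line per document. It assumes the documents are available as strings and uses the same simple lowercase letter/digit tokenization as above. Words not in the dictionary are ignored. The class name, method names, and the output path `matrix.txt` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class TermCountMatrix {

    // counts[i] = number of occurrences of dictionary word i in the document.
    static int[] vectorize(String document, Map<String, Integer> wordIndex) {
        int[] counts = new int[wordIndex.size()];
        for (String token : document.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            Integer column = wordIndex.get(token);
            if (column != null) { // skips empty tokens and out-of-dictionary words
                counts[column]++;
            }
        }
        return counts;
    }

    // Formats one row as comma-separated counts, e.g. "1,3,4,0".
    static String toRow(int[] counts) {
        StringJoiner row = new StringJoiner(",");
        for (int c : counts) {
            row.add(Integer.toString(c));
        }
        return row.toString();
    }

    static void writeMatrix(List<String> dictionary, List<String> documents, Path out)
            throws IOException {
        // Give each dictionary word its column position once, for O(1) lookups.
        Map<String, Integer> wordIndex = new HashMap<>();
        for (int i = 0; i < dictionary.size(); i++) {
            wordIndex.put(dictionary.get(i).toLowerCase(), i);
        }
        List<String> rows = new ArrayList<>();
        for (String doc : documents) {
            rows.add(toRow(vectorize(doc, wordIndex)));
        }
        Files.write(out, rows, StandardCharsets.UTF_8); // one line per document
    }

    public static void main(String[] args) throws IOException {
        List<String> dictionary = List.of("w1", "w2", "w3", "w4");
        List<String> documents = List.of(
                "w1 w2 w2 w2 w3 w3 w3 w3",
                "w3 w4 w3");
        writeMatrix(dictionary, documents, Paths.get("matrix.txt")); // hypothetical path
        // matrix.txt now contains:
        // 1,3,4,0
        // 0,0,2,1
    }
}
```

The total work is linear in the number of tokens across all documents. With a very large vocabulary most entries in each row are zero. In that case a sparse representation (for example, a map from column to count per document) may use far less memory than dense `int` arrays.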
Because of the homework tag, I am giving you the steps rather than actual code (if you don't know how to do any of this, a quick Google search will show you).