I am working with a TreeMap of Strings TreeMap<String, String>, and using it to implement a Dictionay of words.
I then have a collection of files, and would like to create a representation of each file in the vector space (space of words) defined by the dictionary.
Each file should have a vector representing it with following properties:
- vector should have same size as dictionary
- for each word contained in the file the vector should have a 1 in the position corresponding to the word position in dictionary
- for each word not contained in the file the vector should have a -1 in the position corresponding to the word position in dictionary
So my idea is to use a Vector<Boolean> to implement these vectors. (This way of representing documents in a collection is called Boolean Model – http://www.site.uottawa.ca/~diana/csi4107/L3.pdf)
The problem I am facing in the procedure to create this vector is that I need a way to find position of a word in the dictionary, something like this:
String key;
int i = get_position_of_key_in_Treemap(key); <--- purely invented method...
1) Is there any method like this I can use on a TreeMap?If not could you provide some code to help me implement it by myself?
2) Is there an iterator on TreeMap (it’s alphabetically ordered on keys) of which I can get position?
3)Eventually should I use another class to implement dictionary?(If you think that with TreeMaps I can’t do what I need) If yes, which?
Thanks in advance.
ADDED PART:
Solution proposed by dasblinkenlight looks fine but has the problem of complexity (linear with dimension of dictionary due to copying keys into an array), and the idea of doing it for each file is not acceptable.
Any other ideas for my questions?
I’d like to thank all of you for the effort you put in answering my question, they all were very useful and taking the best from each of them made me come up to the solution I actually implemented in my project.
What I beleive to be best answers to my single questions are:
2) There is not an Iterator defined on TreeMaps as @Isoliveira sais:
and as I found in this SO answer How to iterate over a TreeMap?, the only way to iterate on elements in a
Mapis to usemap.entrySet()and use Iterators defined onSet(or some other class with Iterators).3) It’s possible to use a
TreeMapto implement Dictionary, but this will garantuee a complexity of O(logN) in finding index of a contained word (cost of a lookup in a Tree Data Structure).Using a
HashMapwith same procedure will instead have complexity O(1).1) There exists no such method. Only solution is to implement it entirely.
As @Paul stated
assumption of solution is that once that Dictionary is created it will not be changed afterwards: in this way position of a word will always be the same.
Giving this assumption I found a solution that allows to build Dictionary with complexity O(N) and after garantuees the possibility to get index of a word contained with constat time O(1) in lookup.
I defined Dictionary as a
HashMaplike this:Stringrepresenting the word contained in DictionaryObjectof a created classWordStructwhere
WordStructclass is defined like this:and allows me to keep memory of any kind of attribute I like to couple with the word entry of the Dictionary.
Now I fill dictionary iterating over all words contained in all files of my collection:
Once HashMap is filled in whatever order I use procedure indicated by @dasblinkenlight to order it once and for all with complexity O(N)
And from now on to have index position in alphatebetic order of word in dictionary only thing needed is to acces it’s variable
DictionaryPosition:since word is know you just need to access it and this has constant cost in a
HashMap.Thanks again and Iwish you all a Merry Christmas!!