I'm getting started with Python and am trying to implement a positional index using a nested dictionary. However, I am not sure if that's the way to go. The index should contain term, term frequency, doc id, and term positions.
Example:
dict = {term: {termfreq: {docid: {[pos1,pos2,...]}}}}
My question is: am I on the right track here, or is there a better solution to my problem? If a nested dictionary is the way to go, I have one additional question: how do I get single items out of the dictionary, for example the term frequency for a term (without all the additional information about the term)?
Help on this is greatly appreciated.
Each term seems to have a term frequency, a doc id, and a list of positions. Is that right? If so, you could use a dict of dicts, with the term as the outer key. Then, given a term like 'wassup', you could look up the term frequency by indexing first with the term and then with the term-frequency key.
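As a rough sketch of that dict-of-dicts idea (not necessarily the exact layout intended above), the following builds a small positional index and pulls out just the term frequency for one term. The function name `build_index`, the key names `termfreq` and `postings`, and the sample documents are all assumptions made up for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to its total frequency and its positions per document.

    `docs` is assumed to be a dict of {doc_id: text}; tokenization here is a
    plain whitespace split, which is a simplification.
    """
    index = defaultdict(lambda: {"termfreq": 0, "postings": defaultdict(list)})
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            entry = index[term]
            entry["termfreq"] += 1                 # total count across all docs
            entry["postings"][doc_id].append(pos)  # positions within this doc
    return index


# Hypothetical sample data.
docs = {1: "wassup my friend wassup", 2: "hello friend"}
index = build_index(docs)

# Look up a single item without touching the rest of the entry.
print(index["wassup"]["termfreq"])     # term frequency only
print(index["wassup"]["postings"][1])  # positions of the term in doc 1
```

Keeping the term frequency next to the postings, rather than using it as a dictionary key, means each piece of information can be read with one or two key lookups.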
Think of a dict as being like a telephone book. It is great at looking up values (phone numbers) given keys (names). It is not so hot at looking up keys given values. Use a dict when you know you need to look things up in a one-way direction. You may need some other data structure (a database perhaps?) if your lookup patterns are more complex.
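To make the telephone-book point concrete, here is a small sketch with made-up names and numbers. It contrasts the cheap key-to-value lookup with the scan needed to go the other way, and shows one possible workaround (a second, inverted dict), which only works cleanly when the values are unique.

```python
# Hypothetical phone book: names are keys, numbers are values.
phone_book = {"alice": "555-0100", "bob": "555-0199"}

# Forward lookup (key -> value) is a single hash lookup.
number = phone_book["alice"]

# Reverse lookup (value -> key) has to scan every entry.
owners = [name for name, num in phone_book.items() if num == "555-0199"]

# If reverse lookups are frequent, one option is to keep a second dict.
# This assumes every number belongs to exactly one name.
reverse_book = {num: name for name, num in phone_book.items()}

print(number)
print(owners)
print(reverse_book["555-0100"])
```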
You might also want to check out the Natural Language Toolkit (nltk). It has a method for calculating tf_idf built in.