I have a full inverted index in form of nested python dictionary. Its structure

Question


Editorial Team

Asked: May 17, 2026


I have a full inverted index in the form of a nested Python dictionary. Its structure is:

{word : { doc_name : [location_list] } }

For example, let the dictionary be called index; then for the word "spam", the entry would look like:

{ 'spam' : { 'doc1.txt' : [102, 300, 399], 'doc5.txt' : [200, 587] } }

so that the documents containing any word are given by index[word].keys(), and the frequency of the word in a given document by len(index[word][document]).
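As a small illustration of those two lookups, here is a sketch on a toy index of the same shape. The words, document names and positions are made up for the example.

```python
# Toy inverted index; the contents are illustrative assumptions only.
index = {
    "spam": {"doc1.txt": [102, 300, 399], "doc5.txt": [200, 587]},
    "eggs": {"doc5.txt": [12]},
}

docs_with_spam = sorted(index["spam"].keys())   # documents containing "spam"
freq_in_doc1 = len(index["spam"]["doc1.txt"])   # occurrences of "spam" in doc1.txt

print(docs_with_spam)
print(freq_in_doc1)
```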

Now my question is: how do I implement a normal query search on this index? That is, given a query containing, say, 4 words, find the documents containing all four words (ranked by total frequency of occurrence), then the documents containing 3 of them, and so on.

Added this code, using S. Lott's answer.
This is the code I have written. It's working exactly as I want (just some formatting of the output is needed), but I know it could be improved.

from collections import defaultdict
from operator import itemgetter

# Take input

query = input(" Enter the query : ")

# Some preprocessing

query = query.lower()
query = query.strip()

# now real work

wordlist = query.split()
search_words = [ x for x in wordlist if x in index ]    # list of words that are present in index.

print("\nsearching for words ... : ", search_words, "\n")

doc_has_word = [ (index[word].keys(),word) for word in search_words ]
doc_words = defaultdict(list)
for d, w in doc_has_word:
    for p in d:
        doc_words[p].append(w)

# create a dictionary identifying matches for each document

result_set = {}

for i in doc_words.keys():
    count = 0
    matches = len(doc_words[i])     # number of matches
    for w in doc_words[i]:
        count += len(index[w][i])   # count total occurrences
    result_set[i] = (matches,count)

# Now print in sorted order (most words matched first, then highest total frequency)

print("   Document \t\t Words matched \t\t Total Frequency ")
print('-'*40)
for doc, (matches, count) in sorted(result_set.items(), key = itemgetter(1), reverse = True):
    print(doc, "\t", doc_words[doc], "\t", count)

Please comment.
Thanks.


1 Answer

Editorial Team · Answer 1 · 2026-05-17T17:28:04+00:00


Here’s a start:

doc_has_word = [ (index[word].keys(),word) for word in wordlist ]

This will build a list of (documents, word) pairs, where the first element is the collection of documents containing that word. You can't easily make a dictionary out of that directly, since each document can occur under many words.

But

from collections import defaultdict
doc_words = defaultdict(list)
for docs, w in doc_has_word:
    for d in docs:                  # docs is the set of document names for word w
        doc_words[d].append(w)

Might be helpful.
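Putting the pieces together, one possible way to finish the ranking is sketched below. The function name search and the toy index are hypothetical, chosen only for illustration, and the sketch assumes that repeated query words should count once (hence the set). Documents are ordered by the number of distinct query words matched, then by total frequency.

```python
from collections import defaultdict

def search(index, query):
    """Rank documents by (distinct query words matched, total occurrences)."""
    # Assumption: duplicate query words are counted once.
    words = [w for w in set(query.lower().split()) if w in index]

    doc_words = defaultdict(list)          # document -> matched query words
    for word in words:
        for doc in index[word]:
            doc_words[doc].append(word)

    results = []
    for doc, matched in doc_words.items():
        total = sum(len(index[w][doc]) for w in matched)
        results.append((doc, sorted(matched), total))

    # More matched words first, then higher total frequency.
    results.sort(key=lambda r: (len(r[1]), r[2]), reverse=True)
    return results

# Toy index; the contents are illustrative assumptions only.
index = {
    "spam": {"doc1.txt": [102, 300, 399], "doc5.txt": [200, 587]},
    "eggs": {"doc5.txt": [12], "doc2.txt": [7]},
}
for doc, matched, total in search(index, "Spam eggs ham"):
    print(doc, matched, total)
```

Returning a list of tuples instead of printing inside the function keeps the output formatting separate from the ranking, which may make the formatting the question mentions easier to adjust.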
