The Archive Base

Editorial Team
Asked: June 2, 2026 (2026-06-02T23:45:34+00:00)

I read the paper by Doug Cutting, “Space optimizations for total ranking”.

Since it was written a long time ago, I wonder which algorithms Lucene uses today for postings list traversal, score calculation, and ranking.

In particular, the total ranking algorithm described there traverses the entire postings list for each query term. For a query made of very common terms, like “yellow dog”, either of the two terms may have a very long postings list in web search. Are these lists really traversed in full in current Lucene/Solr? Or are any heuristics employed to truncate them?

When only the top k results are returned, I can understand that distributing the postings list across multiple machines and then combining the top k from each would work. But if we are required to return “the 100th result page”, i.e. results ranked from 990th to 1000th, then each partition would still have to find its own top 1000, so partitioning would not help much.

Overall, is there any up-to-date detailed documentation on the internal algorithms used by Lucene?

1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 11:45 pm

    I am not aware of such documentation, but since Lucene is open source, I encourage you to read the source code. In particular, the current trunk version includes flexible indexing, meaning that storage and posting list traversal have been decoupled from the rest of the code, making it possible to write custom codecs.

    Your assumptions about posting list traversal are correct. By default (it depends on your Scorer implementation), Lucene traverses the entire posting list for every term present in the query and puts matching documents in a heap of size k to compute the top-k docs (see TopDocsCollector). So returning results from 990 to 1000 makes Lucene instantiate a heap of size 1000. And if you partition your index by document (another approach could be to split by term), every shard will need to send its top 1000 results to the server responsible for merging results. See Solr's QueryComponent for example: it translates a request for results N to P (with P > N) into several shard requests for results 0 to P, by calling sreq.params.set(CommonParams.START, "0"); on each shard request. This is why Solr might be slower in distributed mode than in standalone mode in case of extreme paging.
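To make the cost of deep paging concrete, here is a minimal Python sketch of the idea described above. It is not Lucene or Solr code: the names `score_all`, `top_k`, `page` and `sharded_page`, the dict-based postings layout, and the additive scoring are all assumptions made for illustration. It only shows that a page starting at offset `start` needs a heap of size `start + rows`, and that every document-partitioned shard has to return that many hits from offset 0.

```python
import heapq
from collections import defaultdict


def score_all(postings, query_terms):
    """Walk every posting of every query term and accumulate a score per doc.

    `postings` maps term -> list of (doc_id, weight); this layout is hypothetical.
    """
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, weight in postings.get(term, []):
            scores[doc_id] += weight
    return scores


def top_k(scores, k):
    """Keep the k best docs in a min-heap of size k, then return them best first."""
    heap = []
    for doc_id, score in scores.items():
        entry = (score, -doc_id)  # negated id: ties go to the lower doc id
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)
    return [(-neg_id, score) for score, neg_id in sorted(heap, reverse=True)]


def page(scores, start, rows):
    """A page [start, start + rows) still needs a heap of size start + rows."""
    return top_k(scores, start + rows)[start:start + rows]


def sharded_page(shards, query_terms, start, rows):
    """Each shard answers from offset 0 with start + rows hits; the merger slices."""
    candidates = []
    for shard_postings in shards:
        shard_scores = score_all(shard_postings, query_terms)
        candidates.extend(top_k(shard_scores, start + rows))
    merged = sorted(candidates, key=lambda hit: (-hit[1], hit[0]))
    return merged[start:start + rows]
```

With S shards, the merging node receives up to S * (start + rows) candidates for a single page of `rows` results, which is the overhead mentioned above for extreme paging.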

    I don’t know how Google manages to score results efficiently, but Twitter published a paper on their retrieval engine Earlybird where they explain how they patched Lucene in order to do efficient reverse chronological order traversal of the posting lists, which allows them to return the most recent tweets matching a query without traversing the entire posting list for every term.
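As a rough illustration of why the ordering of the posting lists matters, here is a small Python sketch. It is not Earlybird's or Lucene's code. It assumes that doc ids grow with time, that each posting list is stored newest first (descending doc id), and that the query is a conjunction of terms; the function name `newest_matches` is made up for this example. Under those assumptions the traversal can stop as soon as k matches are found.

```python
def newest_matches(lists, k):
    """Intersect posting lists sorted by descending doc id, stopping after k hits.

    Returns (hits, positions), where positions[i] is how far list i was read.
    """
    positions = [0] * len(lists)
    hits = []
    if not lists or k <= 0:
        return hits, positions
    while len(hits) < k and all(p < len(l) for p, l in zip(positions, lists)):
        heads = [l[p] for p, l in zip(positions, lists)]
        oldest = min(heads)
        if all(h == oldest for h in heads):
            # Every term contains this doc: it is the newest remaining match.
            hits.append(oldest)
            positions = [p + 1 for p in positions]
        else:
            # Heads newer than `oldest` cannot occur in the list whose head is `oldest`.
            positions = [p + 1 if h > oldest else p
                         for p, h in zip(positions, heads)]
    return hits, positions
```

For example, with one term matching every doc from 1000 down to 1 and another matching only even ids, asking for the 3 newest matches reads just a handful of postings from each list instead of all of them.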

    Update:
    I found a presentation by Google's Jeff Dean that explains how Google built its large-scale information retrieval system. In particular, it covers sharding strategies and posting list encoding.
