The Archive Base Latest Questions

Editorial Team
Asked: May 15, 2026

I am trying to index a database table using Lucene. I use Lucene only for indexing; the Fields are not stored. The table has five columns: userid (PK), description, reportnumber, reporttype, report.

I intend to use the combination of userid, reportnumber and reporttype to fetch the data back from the database when Lucene finds a hit.

One record in the table can span multiple rows, for example:

JQ123, SOMEDESCRIPTION, 1, FIN, content of fin report
JQ123, AnotherDescription, 2, MATH, content of math report
JQ123, YetAnotherDesc, 3, MATH, content of another math report
JD456, MoreDesc, 1, STAT, content of stat report
... and so on

Some report types (e.g. MATH) have highly structured contents (XML, stored as a string in the last column), and in the future I may want to extract some of that content as a Field of the Document.

My strategy so far has been to create a Lucene Document for every row and index it. My reasoning is:
1. It is easy and seems logical (to me).
2. If I end up extracting content out of certain report types and making it into Fields, all that would be needed is an if statement that checks the report type and creates these new Fields.

Here is the relevant code:

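public void createDocument() throws IOException {
    Document luceneDocument = new Document();
    // Key columns: indexed as single, unanalyzed terms so they match exactly.
    luceneDocument.add(new Field("userid", userID, Field.Store.NO, Field.Index.NOT_ANALYZED));
    luceneDocument.add(new Field("reportnumber", reportNum, Field.Store.NO, Field.Index.NOT_ANALYZED));
    luceneDocument.add(new Field("reporttype", reportType, Field.Store.NO, Field.Index.NOT_ANALYZED));
    // Free-text columns: analyzed (tokenized) for full-text search.
    luceneDocument.add(new Field("description", description, Field.Store.NO, Field.Index.ANALYZED));
    luceneDocument.add(new Field("report", report, Field.Store.NO, Field.Index.ANALYZED));

    // Placeholder for extra Fields extracted from structured report types;
    // "more fields" and fieldContent stand in for names not decided yet.
    if (reportType.equalsIgnoreCase("MATH")) {
        luceneDocument.add(new Field("more fields", fieldContent, Field.Store.NO, Field.Index.ANALYZED));
    }
    indexwriter.addDocument(luceneDocument);
    // Closing commits the Document; when indexing many rows, keep one writer
    // open for the whole run and close it once at the end.
    indexwriter.close();
}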

1. Does having several Documents for the same record affect Lucene's search efficiency in any way?
2. Would this approach have any significant disk-space overhead compared to having one Document per record in Lucene (I do not store any Fields)?

Thanks in advance for your response,

1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 8:17 pm

    First, note how an inverted index is laid out. Each term's entry looks like:

    [term][docid][docid]…

    where each [docid] is the ID of a document that contains that term. So to answer your questions:

    1. If, e.g., the MATH and STAT reports of the same record (same userid) contained the same term, that record would be listed twice in the term's entry, once per Document. The search would then have to look at two Documents when in theory it should only need to look at one. But this is a very minimal penalty.
    2. I assume you have to store at least an ID for each Document, so you will see a minor storage increase. It will be roughly (length of ID) * (number of Documents per record). Again, this is trivial.
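
To make the two points above concrete, here is a minimal sketch of a term-to-docid map in plain Java. It is a toy model and not Lucene's actual on-disk format; the class name ToyInvertedIndex and its methods are made up for illustration, and tokenization is a simple whitespace split. It indexes the three JQ123 reports once as one Document per row and once as a single Document per record, then compares the postings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: term -> list of document IDs (illustration only, not Lucene). */
public class ToyInvertedIndex {
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Adds one document and returns its ID; each distinct term records the ID once. */
    public int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // IDs are appended in increasing order, so checking the last entry
            // is enough to avoid listing the same document twice for a term.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    /** Returns the document IDs listed for a term (empty if the term is unknown). */
    public List<Integer> postingsFor(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptyList());
    }

    /** Total number of [docid] entries across all terms. */
    public int totalPostings() {
        int total = 0;
        for (List<Integer> docs : postings.values()) {
            total += docs.size();
        }
        return total;
    }

    public static void main(String[] args) {
        String[] reports = {
            "content of fin report",
            "content of math report",
            "content of another math report"
        };

        ToyInvertedIndex perRow = new ToyInvertedIndex();
        for (String report : reports) {
            perRow.addDocument(report);
        }

        ToyInvertedIndex perRecord = new ToyInvertedIndex();
        perRecord.addDocument(String.join(" ", reports));

        System.out.println(perRow.postingsFor("report"));    // [0, 1, 2]
        System.out.println(perRecord.postingsFor("report")); // [0]
        System.out.println(perRow.totalPostings());          // 13
        System.out.println(perRecord.totalPostings());       // 6
    }
}
```

The per-row index lists the same record three times under "report", which is the extra work and extra space described above. On a real index the difference stays small relative to the rest of the index data.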

    A more important problem is that scores can't be normalized appropriately across Documents that belong to the same record. For example, a search finds record #1, which matches in both its MATH and STAT reports, and record #2, which matches only in its MATH report. You will need to rank record #1 higher manually, because Lucene won't know that the two matching Documents actually belong to the same record.
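
One way to do that manual ranking is to merge the per-Document hits by record after the search. The sketch below is plain Java with no Lucene calls; RecordScoreMerger and Hit are hypothetical names, and it assumes you can obtain the userid and score for every hit (in Lucene that would mean making the key retrievable, for example by storing it). Summing the scores is only one possible merge rule; taking the maximum is another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Merges per-row search hits into a per-record ranking (illustration only). */
public class RecordScoreMerger {

    /** One search hit: the record key (userid) and the score of that per-row Document. */
    public static final class Hit {
        final String userId;
        final float score;

        public Hit(String userId, float score) {
            this.userId = userId;
            this.score = score;
        }
    }

    /** Sums scores per userId and returns the userIds ordered by descending combined score. */
    public static List<String> rankByRecord(List<Hit> hits) {
        Map<String, Float> combined = new LinkedHashMap<>();
        for (Hit hit : hits) {
            combined.merge(hit.userId, hit.score, Float::sum);
        }
        List<String> ranked = new ArrayList<>(combined.keySet());
        ranked.sort((a, b) -> Float.compare(combined.get(b), combined.get(a)));
        return ranked;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("JD456", 0.9f),  // matches in one report only
            new Hit("JQ123", 0.6f),  // matches in a MATH report
            new Hit("JQ123", 0.5f)   // same record, matches in another report
        );
        System.out.println(rankByRecord(hits)); // [JQ123, JD456]
    }
}
```

Here JD456 has the single best hit, but JQ123 ranks first once its two hits are combined, which is the behaviour Lucene will not give you on its own with one Document per row.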

    In short: unless your index is absolutely massive, I wouldn't worry much about storage or performance. But I would worry about how you're going to score results for that kind of query.
