
The Archive Base


Editorial Team
Asked: June 16, 2026


I recently started playing around with Apache Solr and am currently trying to figure out the best way to benchmark the indexing of a corpus of XML documents. I am mainly interested in the throughput (documents indexed per second) and the index size on disk.

I am doing all this on Ubuntu.

Benchmarking Technique

* Run the following 5 times and average the total time taken *

  • Index documents [curl http://localhost:8983/solr/core/dataimport?command=full-import]
    • Get the ‘Time taken’ value from the XML response once the status is ‘idle’ [curl http://localhost:8983/solr/core/dataimport]
    • Get the size of the ‘data/index’ directory
  • Delete the index [curl http://localhost:8983/solr/core/update --data '<delete><query>*:*</query></delete>' -H 'Content-type:text/xml; charset=utf-8']
  • Commit [curl http://localhost:8983/solr/core/update --data '<commit/>' -H 'Content-type:text/xml; charset=utf-8']
  • Re-index documents
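A minimal Python sketch of the loop above, as an alternative to running the curl commands by hand. It assumes the core is called `core` and listens on localhost:8983, as in the commands, and that polling once per second is acceptable. The helper names (`parse_time_taken`, `str_field`, `run_once`, `reset_index`, `benchmark`) are hypothetical. DataImportHandler runs the import asynchronously, so the sketch polls the status until it reports `idle` and then reads `Time taken`. The index size is left to a separate `du` or directory walk.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: host, port and core name taken from the curl commands above.
SOLR_CORE_URL = "http://localhost:8983/solr/core"
XML_HEADERS = {"Content-type": "text/xml; charset=utf-8"}


def parse_time_taken(value: str) -> float:
    """Convert a 'Time taken' string such as '0:0:10.233' (h:m:s.ms) to seconds."""
    hours, minutes, seconds = value.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def str_field(root: ET.Element, name: str):
    """Return the text of the first <str name="..."> element anywhere in the response."""
    for element in root.iter("str"):
        if (element.get("name") or "").strip() == name:
            return element.text
    return None


def get_xml(url: str) -> ET.Element:
    """GET a Solr URL and parse the XML body."""
    with urllib.request.urlopen(url) as response:
        return ET.fromstring(response.read())


def post_xml(url: str, body: str) -> None:
    """POST an XML update message (delete, commit) to Solr."""
    request = urllib.request.Request(url, data=body.encode("utf-8"), headers=XML_HEADERS)
    with urllib.request.urlopen(request) as response:
        response.read()


def run_once(base_url: str, poll_seconds: float = 1.0) -> float:
    """Start a full import, wait until DIH is idle again, return 'Time taken' in seconds."""
    get_xml(base_url + "/dataimport?command=full-import")
    while True:
        # Sleep first so the asynchronous import has a chance to switch to 'busy'.
        time.sleep(poll_seconds)
        status = get_xml(base_url + "/dataimport")
        if str_field(status, "status") == "idle":
            taken = str_field(status, "Time taken")
            if taken is None:
                raise RuntimeError("import finished without a 'Time taken' entry")
            return parse_time_taken(taken)


def reset_index(base_url: str) -> None:
    """Delete all documents and commit, mirroring the curl commands above."""
    post_xml(base_url + "/update", "<delete><query>*:*</query></delete>")
    post_xml(base_url + "/update", "<commit/>")


def benchmark(base_url: str = SOLR_CORE_URL, runs: int = 5) -> float:
    """Average 'Time taken' (seconds) over several index/delete/commit cycles."""
    times = []
    for _ in range(runs):
        times.append(run_once(base_url))
        reset_index(base_url)
    return sum(times) / len(times)
```

Calling `benchmark()` against a running Solr instance returns the average import time in seconds. Dividing the number of documents by that value gives the throughput asked about in question 1.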

Questions

  1. I intend to calculate my throughput by dividing the number of documents indexed by the average total time taken; is this fine?
  2. Are there tools (like SolrMeter for query benchmarking) or standard scripts already available that I could use to achieve my objectives? I do not want to reinvent the wheel…
  3. Is my approach fine?
  4. Is there an easier way of getting the index size as opposed to performing a ‘du’ on the data/index/ directory?
  5. Where can I find information on how to interpret the XML response attributes (see the sample output below)? For instance, I would like to know the difference between the QTime and Time taken values.
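The calculation proposed in question 1 can be written as follows. Here N is the number of documents indexed in every run, t_i is the ‘Time taken’ of run i, and R = 5 is the number of runs. These symbols are introduced here for illustration.

```latex
\bar{t} = \frac{1}{R}\sum_{i=1}^{R} t_i,
\qquad
\text{throughput} = \frac{N}{\bar{t}} = \frac{R\,N}{\sum_{i=1}^{R} t_i}\ \text{documents/second}
```

Because N is the same in every run, dividing by the average time is the same as dividing the total number of documents indexed by the total time. Averaging the per-run rates N/t_i would instead give a slightly higher figure whenever the run times vary.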

* XML Response Used to Get Throughput *

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">w5-data-config.xml</str>
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">3200</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2012-12-11 14:06:19</str>
    <str name="">Indexing completed. Added/Updated: 1600 documents. Deleted 0 documents.</str>
    <str name="Total Documents Processed">1600</str>
    <str name="Time taken">0:0:10.233</str>
  </lst>
  <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>
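A minimal sketch of extracting the throughput figures from a response like the one above, using Python's standard `xml.etree.ElementTree`. The function name `summarize_dih_status` is hypothetical, and reading `Time taken` as hours:minutes:seconds.milliseconds is an interpretation of the sample value. The sketch also reads `QTime` from the `responseHeader`. QTime is the time, in milliseconds, that Solr spent processing the request that produced this response (here, the status check itself), which is why it is 0. By contrast, `Time taken` reports the duration of the last import.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response above (initArgs and some messages omitted).
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <str name="status">idle</str>
  <lst name="statusMessages">
    <str name="Total Rows Fetched">3200</str>
    <str name="Total Documents Processed">1600</str>
    <str name="Time taken">0:0:10.233</str>
  </lst>
</response>"""


def summarize_dih_status(xml_text: str) -> dict:
    """Pull status, QTime, document count and import duration out of a DIH status response."""
    root = ET.fromstring(xml_text)
    qtime_ms = int(root.find("lst[@name='responseHeader']/int[@name='QTime']").text)
    status = root.find("str[@name='status']").text
    # Map each status message name to its text; strip() guards against stray whitespace in names.
    messages = {
        (element.get("name") or "").strip(): element.text
        for element in root.find("lst[@name='statusMessages']")
    }
    documents = int(messages["Total Documents Processed"])
    hours, minutes, seconds = messages["Time taken"].split(":")
    import_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return {
        "status": status,
        "qtime_ms": qtime_ms,
        "documents": documents,
        "import_seconds": import_seconds,
        "docs_per_second": documents / import_seconds,
    }
```

For the sample, this gives 1600 documents in 10.233 seconds, which is about 156.4 documents per second, and a QTime of 0 ms.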

1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 3:12 am

    To question 1:

    I would suggest indexing more than one XML file (each with a different dataset) and comparing the results. That way you will know whether it's OK to simply divide your number of documents by the time taken.
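    A minimal sketch of that comparison follows. It computes documents/second separately for each dataset and checks how far apart the rates are. If they are close, a single documents/time ratio is a fair summary. If throughput changes with the dataset, that single ratio hides the difference. The function names, the 10% tolerance, and the numbers in the usage example are hypothetical, not measurements.

```python
def throughput_by_dataset(runs: dict) -> dict:
    """Map each dataset name to documents/second, given (documents, seconds) per dataset."""
    return {name: documents / seconds for name, (documents, seconds) in runs.items()}


def relative_spread(rates: dict) -> float:
    """(max - min) / mean of the rates; 0.0 means every dataset indexed at the same speed."""
    values = list(rates.values())
    return (max(values) - min(values)) / (sum(values) / len(values))


def division_is_representative(runs: dict, tolerance: float = 0.1) -> bool:
    """True if the spread of per-dataset rates is at most `tolerance` (hypothetical 10% default)."""
    return relative_spread(throughput_by_dataset(runs)) <= tolerance
```

    Take hypothetical runs of 1600 documents in 10 s and 16000 documents in 125 s. They give rates of 160 and 128 documents/second, a spread of about 22%. That would suggest one ratio does not describe both datasets well.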

    To question 2:

    I didn't find any such tools; I did it myself by writing a short Java application.

    To question 3:

    Which approach do you mean? I would refer you to my answer to question 1.

    To question 4:

    The size of the index folder gives you the correct size of the whole index; why don't you want to use it?
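    If you want the number from a script rather than from `du`, you can sum the file sizes under the index folder to get the total index size. A minimal sketch follows. The function name is hypothetical, and the path you pass (for example the core's `data/index` directory) depends on your installation. Plain `du` reports disk blocks used, whereas this sums apparent file sizes, which corresponds to GNU `du -sb` on Ubuntu.

```python
import os


def index_size_bytes(index_dir: str) -> int:
    """Sum the apparent sizes (in bytes) of all files below index_dir."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(index_dir):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Skip symbolic links; only real files in the index directory are counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```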

    To question 5:

    The results in the posted XML are transformed through an XSL file, which you can find in the /bin/solr/conf/xslt folder. There you can look up exactly what the terms mean, and you can also write your own XSL to display the results and information.
    Note: if you create a new XSL file, you have to change the settings in your solrconfig.xml. If you don't want to make any changes there, edit the existing file instead.

    Edit: I think the difference is that QTime is the rounded value of the Time taken value; QTime only contains whole numbers.

    Best regards


Related Questions

I have started playing around with Apache camel recently. So being the experimental type
I recently started learning backbone and was wondering what would be the best way
I recently started playing around with writing Perl (v5.8.8) extensions using XS. One of
I've recently started playing around with ASP.NET and I was just about to connect
I've recently started playing around with Specflow, and generating test cases based on the
I've recently started playing around with ncurses. As far as I understand, C doesn't
I have recently started playing around with iOS development and have got most of
I've recently started playing around with the Microsoft Visual Studio Team System Profiler. All
I recently just started learning .NET, and more specifically I've been playing around with
I've recently started playing around with the ASP.NET MVC NerdDinner sample, and as part


© 2021 The Archive Base. All Rights Reserved