Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 5994117
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 22, 20262026-05-22T23:46:22+00:00 2026-05-22T23:46:22+00:00

I am trying to import a large dataset (41million records) into a new Solr

  • 0

I am trying to import a large dataset (41million records) into a new Solr index. I have setup the core, it works, I inserted some test docs, they work. I have setup the data-config.xml as below and then I start the full-import. After about 12 hours! the import fails.

The document size can get quite large, could the error be because of a large document (or field) or due to the volume of data going into the DataImportHandler?

How can I get this frustrating import task working!?!

I have included the tomcat error log below.

Let me know if there is any info i have missed!

logs:

Jun 1, 2011 5:47:55 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity results with URL: jdbc:sqlserver://myserver;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor
Jun 1, 2011 5:47:56 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1185
Jun 1, 2011 5:48:02 PM org.apache.solr.core.SolrCore execute
INFO: [results] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
...
Jun 2, 2011 5:16:32 AM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:664)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding$StringDecoder.decode(Unknown Source)
    at java.lang.StringCoding.decode(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at com.microsoft.sqlserver.jdbc.DDC.convertStreamToObject(DDC.java:419)
    at com.microsoft.sqlserver.jdbc.ServerDTVImpl.getValue(dtv.java:1974)
    at com.microsoft.sqlserver.jdbc.DTV.getValue(dtv.java:175)
    at com.microsoft.sqlserver.jdbc.Column.getValue(Column.java:113)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getValue(SQLServerResultSet.java:1982)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getValue(SQLServerResultSet.java:1967)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject(SQLServerResultSet.java:2256)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject(SQLServerResultSet.java:2265)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:286)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:228)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:266)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:260)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:78)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    ... 5 more

Jun 2, 2011 5:16:32 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jun 2, 2011 5:16:44 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback

data-config.xml:

<dataConfig> 
  <dataSource type="JdbcDataSource" 
        driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" 
        url="jdbc:sqlserver://myserver;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor"   
        user="sa" 
        password="password"/> 
  <document> 
    <entity name="results" query="SELECT fielda, fieldb, fieldc FROM mydb.[dbo].mytable WITH (NOLOCK)"> 
      <field column="fielda" name="fielda"/><field column="fieldb" name="fieldb"/><field column="fieldc" name="fieldc"/> 
    </entity> 
  </document> 
</dataConfig> 

solrconfig.xml snippet:

<indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>25</mergeFactor>
    <ramBufferSizeMB>128</ramBufferSizeMB>
    <maxFieldLength>100000</maxFieldLength>
    <writeLockTimeout>10000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>
  </indexDefaults>
  <mainIndex>
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>128</ramBufferSizeMB>
    <mergeFactor>25</mergeFactor>
     <infoStream file="INFOSTREAM.txt">true</infoStream>
  </mainIndex>

Java config settings: init mem 128mb, max 512mb

Environment:
solr 3.1
tomcat 7.0.12
windows server 2008
java: v6 update 25 (build 1.6.0_25-b06)
(data coming from:sql 2008 r2)

/admin/stats.jsp - DataImportHandler
    Status : IDLE
    Documents Processed : 2503083
    Requests made to DataSource : 1
    Rows Fetched : 2503083
    Documents Deleted : 0
    Documents Skipped : 0
    Total Documents Processed : 0
    Total Requests made to DataSource : 0
    Total Rows Fetched : 0
    Total Documents Deleted : 0
    Total Documents Skipped : 0
    handlerStart : 1306759913518
    requests : 9
    errors : 0 

EDIT: I am currently running a sql query to find out the largest single record’s field length, as I think this is probably cause of exception. Also, running import again with jconsole to monitor heap usage.

EDIT: Read solr performance factors page. changing maxFieldLength to 1000000 and changing ramBufferSizeMB = 256. Now for another import run (yay…)

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-22T23:46:23+00:00Added an answer on May 22, 2026 at 11:46 pm
    Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.lang.StringCoding$StringDecoder.decode(Unknown Source)
        at java.lang.StringCoding.decode(Unknown Source)
        at java.lang.String.<init>(Unknown Source)
        at java.lang.String.<init>(Unknown Source)
        at com.microsoft.sqlserver.jdbc.DDC.convertStreamToObject(DDC.java:419)
    

    makes it pretty obvious that The MS JDBC driver is running out of ram. Many JDBC drivers can default to fetching all their results at once in memory. So see if this can be tuned or consider using the opensource JTDS driver which is generally better behaved anyway

    I don’t believe maxfieldlength is gonna help you – that will affect how much Lucene truncates, but not how much is initially transferred. Another option is to only transfer a selection at a time, say a 1 million, using TOP and ROWNUMBER and such for paging.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm trying to import a very large MySQL dump file into a new MySQL
I am currently trying to load a slightly large xml file into a dataset.
I am trying to import a large text file into a MySQL database. The
I have a txt file that I’m trying to import as flat file into
I am trying to import a large database but because there's an index, it
I'm trying to import a large subversion repository into git using git-svn (so that
I'm trying to import a large number of products from a DB into Magento.
I am trying to import a large test file into BigQuery and the import
I'm trying to import a very large .csv file (~4gb) into mysql. I was
I'm trying to import a large data set (~6,000) in to my core data

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.