I have a huge amount of data needs to be indexed and it took

Asked: June 8, 2026

I have a huge amount of data that needs to be indexed, and it took more than 10 hours to get the job done. Is there a way I can do this on Hadoop? Has anyone done this before? Thanks a lot!

1 Answer

Editorial Team · Answer 1 · 2026-06-08T11:06:44+00:00

You haven't explained where the 10 hours go. Is the time spent extracting the data, or just indexing it?

If the extraction is what takes a long time, then Hadoop may help. Solr supports bulk inserts, so in your map function you could accumulate thousands of records and send them to Solr for indexing in one shot instead of one record at a time. With a large number of records, that should improve your performance a lot.
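To make the batching idea concrete, here is a minimal sketch in Python. It only shows the buffering logic: `BatchingIndexer` and the `send` callback are hypothetical names, and `send` stands in for whatever call your Solr client (SolrJ, SolrNet, or another) uses to add a list of documents. The batch size of 1000 is just the figure mentioned above, not a tuned value.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]


class BatchingIndexer:
    """Buffers documents and hands them to `send` in fixed-size batches."""

    def __init__(self, send: Callable[[List[Doc]], None], batch_size: int = 1000):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        # Assumption: `send` wraps the real "add many documents" client call.
        self._send = send
        self._batch_size = batch_size
        self._buffer: List[Doc] = []

    def add(self, doc: Doc) -> None:
        """Queue one document; flush automatically when the batch is full."""
        self._buffer.append(doc)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as a single batch."""
        if self._buffer:
            self._send(self._buffer)
            self._buffer = []  # start a fresh list; the sent batch is untouched

    def close(self) -> None:
        """Send the final, possibly partial, batch."""
        self.flush()


if __name__ == "__main__":
    batches: List[List[Doc]] = []
    indexer = BatchingIndexer(batches.append, batch_size=3)
    for i in range(7):
        indexer.add({"id": str(i)})
    indexer.close()
    print([len(b) for b in batches])
```

The important detail is the `close()` call at the end: without it, the last partial batch would never be sent.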

Also, how large is your data?

You could also collect a large number of records in the reduce function of a map/reduce job. You have to generate suitable keys in your map function so that a large number of records go to a single reduce call. In your custom reducer class, initialize the Solr client object in the setup or configure method (which one depends on your Hadoop version), and close it in the cleanup method. You will have to build a document collection object (in SolrNet or SolrJ) and commit all of the documents in one single shot.
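Here is a rough sketch of that reducer pattern, simulated in plain Python so it runs without a cluster. Everything named here is an assumption for illustration: `bucket_key` uses a CRC32 hash as one possible way to produce keys that group many records together, `InMemoryClient` is a stand-in for a real Solr client, and `setup`/`reduce`/`cleanup` only mirror the lifecycle described above rather than any actual Hadoop API.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, str]


def bucket_key(doc_id: str, num_buckets: int) -> int:
    """Map-side key: a stable bucket number, so many records share one key."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_buckets


def shuffle(docs: Iterable[Doc], num_buckets: int) -> Dict[int, List[Doc]]:
    """Simulates map + shuffle: group documents by their bucket key."""
    groups: Dict[int, List[Doc]] = defaultdict(list)
    for doc in docs:
        groups[bucket_key(doc["id"], num_buckets)].append(doc)
    return dict(groups)


class InMemoryClient:
    """Hypothetical stand-in for a Solr client; it only records calls."""

    def __init__(self) -> None:
        self.indexed: List[Doc] = []
        self.commits = 0
        self.closed = False

    def add_all(self, docs: List[Doc]) -> None:
        self.indexed.extend(docs)

    def commit(self) -> None:
        self.commits += 1

    def close(self) -> None:
        self.closed = True


class IndexingReducer:
    """Opens the client once, adds each group in bulk, commits once at the end."""

    def __init__(self, connect: Callable[[], InMemoryClient]) -> None:
        self._connect = connect
        self._client = None

    def setup(self) -> None:
        self._client = self._connect()  # one connection per reducer task

    def reduce(self, key: int, docs: Iterable[Doc]) -> None:
        self._client.add_all(list(docs))  # one bulk add per key

    def cleanup(self) -> None:
        self._client.commit()  # single commit for everything added
        self._client.close()


def run_job(docs: Iterable[Doc], num_buckets: int) -> InMemoryClient:
    """Drives the simulated job: shuffle, then one reducer over all groups."""
    client = InMemoryClient()
    reducer = IndexingReducer(connect=lambda: client)
    reducer.setup()
    for key, group in sorted(shuffle(docs, num_buckets).items()):
        reducer.reduce(key, group)
    reducer.cleanup()
    return client


if __name__ == "__main__":
    result = run_job([{"id": f"doc-{i}"} for i in range(50)], num_buckets=4)
    print(len(result.indexed), result.commits, result.closed)
```

Fewer buckets means bigger groups per reduce call but less parallelism, so the bucket count is something you would tune against your data size and the number of reducers.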

If you are using Hadoop, there is another option called Katta. You may want to look into it as well.
