Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9149801
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 17, 20262026-06-17T11:31:35+00:00 2026-06-17T11:31:35+00:00

I am trying to optimize the performance of a search engine by preprocessing all

  • 0

I am trying to optimize the performance of a search engine by preprocessing all the results. We have around 50k search terms. I am planning to search these 50k terms before hand and save it in memory (memcached/redis). Searching for all 50k terms takes more than a day in my case since we do deep semantic search. So I am planning to distribute the searching (preprocessing) over several nodes. I was considering to use hadoop. My input size is very less. Probably less than 1MB even though total search term is over 50k. But searching each term takes over a min i.e more computation oriented than data oriented. So I am wondering if I should use Hadoop or build my own distributed system. I remember reading that hadoop is used mainly if input is very huge. Please suggest me on how to go about this.

And I read hadoop reads data in block size. i.e 64mb for each jvm/mapper. Is it possible to make it number of lines instead of block size. Example: Every mapper gets 1000 lines instead of 64mb. Is it possible to achieve this.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-17T11:31:35+00:00Added an answer on June 17, 2026 at 11:31 am

    Hadoop can definitely handle this task. Yes, much of Hadoop was designed to handle jobs with very large input or output data, but that’s not it’s sole purpose. It can work well for close to any type of distributed batch processsing. You’ll want to take a look at NLineInputFormat; it allows you to split your input up based on exactly what you wanted, number of lines.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Hi I have a function in R that I'm trying to optimize for performance.
In our product we have a generic search engine, and trying to optimze the
I'm trying to optimize query performance and have had to resort to using optimizer
I'm trying to optimize the performance of my code, but I'm not familiar with
I'm trying to optimize my code using Neon intrinsics. I have a 24-bit rotation
I'm currently trying to optimize my program. I have a large database which consists
I'm trying to optimize a set of stored procedures. These stored procedures are on
i am trying to optimize performance for my database. My question is - what
I have a browse category query that im trying to optimize. Im ending up
We are currently trying to optimize the performance of an ASP.NET based website which

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.