PS: Please correct me if I am wrong on any point.
I am building a search engine with Nutch and Solr.
I know that by using Solr I can make searching more efficient, and leave the crawling of the web to Nutch alone.
I also know that Hadoop is used to handle petabytes of data by spreading storage and MapReduce jobs across a cluster of machines.
Now, what I want to know is:
1) Since I'll be running this open source software on only one machine, i.e. my laptop on localhost, how would Hadoop be beneficial in my case, given that it works by forming clusters? How would a cluster be formed on only one machine?
2) What would be the importance of MapReduce in my case?
3) How would Mahout, Cassandra and HBase affect my engine?
Any help on this is very much appreciated. Apologies if this is a noob question!
Thanks
Regards
1) Since I'll be running this open source software on only one machine, i.e. my laptop on localhost, how would Hadoop be beneficial in my case, given that it works by forming clusters?
How would a cluster be formed on only one machine?
What would be the importance of MapReduce in my case?
How would Mahout, Cassandra and HBase affect my engine?
I would suggest that you go back to your problem statement and design with as few tools as it requires. When you run into the limits of that design, you will understand where some of these tools could be useful.
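To make the MapReduce part of the question more concrete, here is a minimal sketch of the map/group/reduce pattern in plain Python, running in a single process on one machine. It is not Hadoop, Nutch or Solr code. The function names (`map_phase`, `reduce_phase`, `build_index`) and the two sample documents are made up for illustration. The idea is only to show what the pattern computes; a cluster matters once the data no longer fits comfortably on one machine.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (word, doc_id) pair per word in the document."""
    for word in text.lower().split():
        yield word, doc_id


def reduce_phase(word, doc_ids):
    """Reduce step: collapse all doc ids seen for a word into a sorted, unique list."""
    return word, sorted(set(doc_ids))


def build_index(docs):
    """Run map, group-by-key (the 'shuffle'), and reduce in one local process."""
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for word, emitted_id in map_phase(doc_id, text):
            grouped[word].append(emitted_id)
    return dict(reduce_phase(word, ids) for word, ids in grouped.items())


# Hypothetical sample data standing in for crawled pages.
docs = {
    "page1": "nutch crawls the web",
    "page2": "solr searches the index",
}
index = build_index(docs)
```

On a laptop-sized crawl, a loop like this (or simply letting Solr build its own index) is usually enough. The same map and reduce functions only need a distributed runtime when the input grows beyond what one machine can process.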