I am developing a web crawler. Which is better for storing its data: Cassandra, Hadoop Hive, or MySQL, and why? I have 1 TB of data from the past 6 months in my MySQL database. I need to index it and get results in my search as quickly as possible. I expect the data to grow much larger, to something like 10 petabytes, because my crawlers are working fast. I need fast read/write operations, and I need to integrate the store into my PHP app.
That depends on the details of your requirements, but I think that in your case HBase would be the best option.
Using HBase as a web-crawler database is well documented, and storing crawled web pages is the use case described in Google's Bigtable paper, the design that HBase is modeled on.
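To make the Bigtable connection concrete: the paper's example crawl table keys each page by its URL with the hostname components reversed, so pages from the same domain sort next to each other. Here is a minimal sketch of that row-key scheme in Python. The function name `webtable_row_key` is made up for illustration, and keeping the query string in the key is my own assumption, not something the paper prescribes.

```python
from urllib.parse import urlsplit


def webtable_row_key(url: str) -> str:
    """Build a row key with the hostname reversed (hypothetical helper).

    Example scheme: maps.google.com/index.html -> com.google.maps/index.html
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased by urlsplit
    reversed_host = ".".join(reversed(host.split(".")))
    path = parts.path or "/"  # treat a bare domain as its root page
    key = reversed_host + path
    if parts.query:
        # Assumption: keep the query string so distinct pages get distinct keys.
        key += "?" + parts.query
    return key


if __name__ == "__main__":
    urls = [
        "http://www.google.com/",
        "http://www.cnn.com",
        "http://maps.google.com/index.html",
    ]
    # Sorting the keys shows pages of the same domain ending up adjacent,
    # which is how a sorted store such as HBase would lay them out.
    for key in sorted(webtable_row_key(u) for u in urls):
        print(key)
```

Because HBase stores rows sorted by key, a scheme like this should let you scan all pages of one domain as a contiguous range. You would still need an HBase client on the PHP side (for example through its Thrift or REST gateway) to actually write the rows.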