I’m looking at building some data warehousing/querying infrastructure, right now on top of Map/Reduce solutions like Hadoop.
However, it strikes me that all the M/R work is just repeating what the RDBMS guys have been solving for the last 20 years with parallel SQL databases. Parallel SQL implementations scale reads and writes across nodes, just like M/R, but they also already contain the niceties of regular databases (SQL, existing integration libraries, etc.).
The problem is: you don’t seem to find the customers of those companies posting much online. So, does anyone here have experience with those kinds of solutions, and can give me some insight and/or links?
I have used Netezza and Hadoop, and have second-hand knowledge of Infobright, a column database.
Netezza is a true database and implements ACID properties, which has both a cost and a benefit. Netezza is moving toward allowing more M/R code to run on its table data with the new TwinFin architecture. The previous version of the appliance supported user-defined functions and aggregations. In the new version, which runs Linux on the SPUs and uses Intel processors, the door is opening to running more custom code close to the data. My experience with Netezza has been very positive, with both the technology and the company.
Hadoop is pure map-reduce computing. It doesn’t incur the cost of ACID database properties, so it’s really a different beast than Netezza. Depending on the usage pattern, it may be a better fit, and it is certainly cheaper than Netezza. The Hadoop ecosystem includes HBase and Hive, which may give you the query convenience you need at a lower cost.
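To make the comparison with SQL concrete, here is a minimal sketch of what a query like SELECT region, SUM(amount) ... GROUP BY region looks like when expressed as map, shuffle, and reduce steps. This is plain single-process Python for illustration only, not Hadoop's API; the function names and the sample (region, amount) records are made up for the example.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (key, value) pairs; here key = region, value = sale amount."""
    for region, amount in records:
        yield region, amount

def shuffle(pairs):
    """Group values by key, the step a M/R framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's values into one result; here a sum."""
    return {key: sum(values) for key, values in groups.items()}

def total_by_region(records):
    """Rough equivalent of: SELECT region, SUM(amount) FROM sales GROUP BY region."""
    return reduce_phase(shuffle(map_phase(records)))

# Hypothetical sample data.
sales = [("east", 10), ("west", 5), ("east", 7)]
print(total_by_region(sales))  # {'east': 17, 'west': 5}
```

In a real cluster the map and reduce steps run on many nodes and the shuffle moves data over the network. A layer like Hive lets you write the SQL form and have jobs of this shape generated for you.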
Another developer on our team evaluated Infobright (so this is second-hand) and found the load performance to be poor and some of the aggregations to be slow. It has some parallels with Netezza (e.g. Netezza uses zone maps to help narrow scan scope). Infobright is open source, with both a community edition and a supported enterprise edition.
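For anyone unfamiliar with zone maps, the general idea is to keep the minimum and maximum value of a column for each block of rows, so a scan can skip blocks that cannot contain a match. The sketch below is a simplified illustration in plain Python, not Netezza's or Infobright's actual implementation; the block layout and function names are assumptions made for the example.

```python
def build_zone_map(blocks):
    """Record (min, max) of the column for each block. Assumes non-empty blocks."""
    return [(min(block), max(block)) for block in blocks]

def scan_equal(blocks, zone_map, target):
    """Find rows equal to target, reading only blocks whose range could contain it.

    Returns (matching values, number of blocks actually read).
    """
    hits = []
    blocks_read = 0
    for block, (lo, hi) in zip(blocks, zone_map):
        if target < lo or target > hi:
            continue  # the zone map proves this block has no match, so skip it
        blocks_read += 1
        hits.extend(value for value in block if value == target)
    return hits, blocks_read

# Hypothetical column data split into three storage blocks.
blocks = [[1, 3, 5], [10, 12, 14], [20, 25, 30]]
zone_map = build_zone_map(blocks)
print(scan_equal(blocks, zone_map, 12))  # ([12], 1)
```

This works best when the data is loaded in roughly sorted order on the filtered column (dates, for example), because the min/max ranges of the blocks then overlap very little.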
There is much more that could be said in the context of your particular problem, though that is probably beyond the scope of this forum. Hope this helps.