Hadoop is mainly used to process unstructured or semi-structured data. I want to use Hadoop to process a large amount of structured data.
Though Hadoop is capable of reading from a database (via DBInputFormat), this is not considered a scalable approach, since every map task opens its own connection and the number of database connections is limited.
Has anybody used Hadoop to read data from an RDBMS? What was the performance? How many nodes could it support?
Thanks
You can use Sqoop to import data from an RDBMS into Hadoop.
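
To give a rough idea of why this copes better with the connection limit: Sqoop runs a parallel import with a small, fixed number of map tasks (`--num-mappers`), and each task reads only its own slice of a split column (`--split-by`, usually the primary key). Below is a minimal Python sketch of that range-splitting idea, not Sqoop's actual code. The function names `split_ranges` and `parallel_import_plan` are made up for illustration, and it assumes an integer key and uses SQLite as a stand-in for the real database.

```python
import sqlite3


def split_ranges(lo, hi, num_splits):
    """Divide the inclusive integer range [lo, hi] into contiguous sub-ranges.

    Returns at most num_splits (start, end) pairs that cover the range
    without gaps or overlap.
    """
    total = hi - lo + 1
    num_splits = max(1, min(num_splits, total))
    base, extra = divmod(total, num_splits)
    ranges = []
    start = lo
    for i in range(num_splits):
        # The first `extra` ranges take one additional key each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def parallel_import_plan(conn, table, key, num_splits):
    """Build one bounded query per worker (hypothetical helper).

    Assumes `table` and `key` are trusted identifiers and that `key`
    is an integer column.
    """
    lo, hi = conn.execute(
        f"SELECT MIN({key}), MAX({key}) FROM {table}"
    ).fetchone()
    if lo is None:  # empty table, nothing to import
        return []
    query = f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ?"
    return [(query, bounds) for bounds in split_ranges(lo, hi, num_splits)]
```

With four splits, only four connections hit the database at once, however many nodes the cluster has. The slices are even only when the key values are spread fairly uniformly; a skewed key leaves some workers with far more rows than others.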
Hadoop shines at processing unstructured data because you defer the constraints (imposing structure on the data) until the end. This also leaves room for creativity in choosing which structure to impose, which in turn determines the kind of information you can extract.
Nobody says that you cannot process structured data with Hadoop, but the mileage you get is low: an RDBMS can process structured data just as efficiently.