When using DBOutputFormat with Hadoop, say the final result is to go to a MySQL database. Will Hadoop create a separate connection each time a result has to be written? (Would the DB be burdened with too many open connections?) I have not used the format, so any suggestions are welcome. Would it have a performance advantage over Sqoop? Sqoop can also be used to export an output file to a DB. Please share your views.
Here’s an explanation I found in a blog post from Cloudera:
So it appears that each individual reducer opens only one connection, so the database probably won’t be burdened with too many open connections, although writing directly from the reducers could still cause performance issues. I don’t know for sure, but Sqoop is probably slightly more efficient and robust.
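To make the "one connection per reducer" point concrete, here is a toy model in plain Java. It does not use Hadoop or JDBC at all: `FakeConnection`, `ReducerWriter`, and `runJob` are made-up names for illustration, and the batching behaviour is my assumption about how such a writer would typically work, not something taken from the DBOutputFormat source. It only shows that if each reducer opens its connection once and reuses it, the connection count follows the number of reducers rather than the number of records.

```java
import java.util.ArrayList;
import java.util.List;

public class ReducerConnectionModel {

    /** Hypothetical stand-in for a database connection; counts how many are opened. */
    static final class FakeConnection {
        static int opened = 0;
        final List<String> pending = new ArrayList<>();
        int committedRows = 0;

        FakeConnection() {
            opened++;
        }

        // Queue a row instead of sending it immediately (assumed batching).
        void addBatch(String row) {
            pending.add(row);
        }

        // Flush all queued rows in one go.
        void commit() {
            committedRows += pending.size();
            pending.clear();
        }
    }

    /** One writer per reduce task: opens its connection once and reuses it for every record. */
    static final class ReducerWriter {
        private final FakeConnection connection = new FakeConnection();

        void write(String row) {
            connection.addBatch(row);
        }

        // Commits the queued rows and returns how many were written.
        int close() {
            connection.commit();
            return connection.committedRows;
        }
    }

    /** Simulates a job and returns {connections opened, rows written}. */
    static int[] runJob(int reducers, int recordsPerReducer) {
        FakeConnection.opened = 0;
        int rows = 0;
        for (int r = 0; r < reducers; r++) {
            ReducerWriter writer = new ReducerWriter();
            for (int i = 0; i < recordsPerReducer; i++) {
                writer.write("reducer" + r + "-row" + i);
            }
            rows += writer.close();
        }
        return new int[] { FakeConnection.opened, rows };
    }

    public static void main(String[] args) {
        int[] result = runJob(4, 1000);
        // prints "connections=4, rows=4000"
        System.out.println("connections=" + result[0] + ", rows=" + result[1]);
    }
}
```

If that model holds for your job, the number of reduce tasks is the knob that bounds how many connections MySQL sees at once, so keeping it modest (for example via `job.setNumReduceTasks(...)`) should keep the load on the database predictable.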