I have been using Hadoop for quite a while now. After some time I realized I need to chain Hadoop jobs and have some kind of workflow. I decided to use Oozie, but couldn't find much information about best practices. I would like to hear from more experienced folks.
Best Regards
The best way to learn Oozie is to download the examples tar file that comes with the distribution and run each example. It has examples for MapReduce, Pig, and streaming workflows, as well as sample coordinator XMLs.
First run the plain workflows, and once you have debugged those, move on to running the same workflows with a coordinator, so that you can take it step by step. Lastly, one best practice is to make most of the variables in your workflow and coordinator configurable and supply them through a component.properties file, so that you don't have to touch the XML often.
http://yahoo.github.com/oozie/releases/3.1.0/DG_Examples.html
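To illustrate the properties-file advice, here is a minimal sketch modeled on the map-reduce example that ships with Oozie. The host names, ports, paths, and the names my-workflow, inputDir, and outputDir are placeholders I made up for illustration; adjust them for your cluster.

```properties
# component.properties (all values below are assumed placeholders)
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
inputDir=/user/${user.name}/input-data
outputDir=/user/${user.name}/output-data
# HDFS directory that contains workflow.xml
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/my-workflow
```

```xml
<!-- workflow.xml: every environment-specific value comes from the properties file -->
<workflow-app xmlns="uri:oozie:workflow:0.2" name="my-workflow">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map-reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Assuming an Oozie server at the default local URL, you would submit it with something like `oozie job -oozie http://localhost:11000/oozie -config component.properties -run`. Moving to another cluster or queue then means editing only the properties file, and the XML stays untouched.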