I recently found many options, and I am interested in how they compare, primarily in maturity and stability.
- Crunch – https://github.com/cloudera/crunch
- Scrunch – https://github.com/cloudera/crunch/tree/master/scrunch
- Cascading – http://www.cascading.org/
- Scalding – https://github.com/twitter/scalding
- FlumeJava
- Scoobi – https://github.com/NICTA/scoobi/
Scalding also has the advantage of significant open source projects built atop it, such as the Matrix API and Algebird.
Here are some examples:
http://sujitpal.blogspot.com/2012/08/scalding-for-impatient.html
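To give a flavor of the style, here is a minimal word count sketch along the lines of the one in the Scalding documentation. It assumes Scalding's fields-based API is on the classpath; the job name `WordCountJob` and the `input`/`output` argument names are illustrative choices, not anything from the linked post.

```scala
import com.twitter.scalding._

// Hypothetical job name; run via Scalding's job runner with
// --input <path> --output <path> (argument names are assumptions).
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))                      // read each line into the 'line field
    .flatMap('line -> 'word) { line: String => // emit one 'word per whitespace-separated token
      line.split("""\s+""")
    }
    .groupBy('word) { _.size }                 // count occurrences of each word
    .write(Tsv(args("output")))                // write (word, count) pairs as tab-separated values
}
```

The whole pipeline reads like an ordinary Scala collection transformation, which is much of the appeal compared with writing raw MapReduce jobs.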
Cascalog was released almost two years before Scalding, and arguably has more advanced features for building robust workflows:
https://github.com/nathanmarz/cascalog/wiki