Can we store weka.jar on HDFS and then call its methods from the mapper and reducer classes?
Say I have a large number of instances stored in a file and I want to cluster them using WEKA. Can I read those instances and then call WEKA's SVM methods from my MapReduce programs?
You don't have to store it on HDFS manually. Instead, when you run your job you can pass the -libjars jar1,jar2… option, which automatically ships the listed jars to the cluster and adds them to the classpath of the map and reduce tasks. Note that -libjars is a generic Hadoop option, so it only takes effect if your driver parses generic options, for example by running through ToolRunner.
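To make the -libjars route concrete, here is a minimal driver sketch. The class name WekaJobDriver and the job name are placeholders I made up, the mapper and reducer are left out (you would plug in your own classes that call WEKA), and it assumes Hadoop 2 or later with the Hadoop client libraries on the compile classpath. The only point it illustrates is that the driver goes through ToolRunner, which is what parses -libjars.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver; the name and job title are illustrative only.
public class WekaJobDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // ToolRunner has already consumed generic options such as -libjars,
        // so getConf() carries them and args holds only the remaining arguments.
        Job job = Job.getInstance(getConf(), "weka-job");
        job.setJarByClass(WekaJobDriver.class);

        // Assumption: set your own classes here, e.g.
        // job.setMapperClass(MyWekaMapper.class);
        // job.setReducerClass(MyWekaReducer.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner applies GenericOptionsParser before calling run().
        System.exit(ToolRunner.run(new Configuration(), new WekaJobDriver(), args));
    }
}
```

You would then launch it along the lines of: hadoop jar myjob.jar WekaJobDriver -libjars /path/to/weka.jar <input> <output> (the jar names and paths are placeholders). The generic options have to come after the main class and before your own arguments. If the driver itself also references WEKA classes on the client side, the jar additionally needs to be on HADOOP_CLASSPATH, since -libjars only covers the tasks.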
Another option (although the first one is the standard approach) is to put the WEKA jar in the lib directory of your Hadoop installation on every node that runs tasks.