Is there any automatic tool that can transform legacy uniprocessor programs for the cloud, meaning that the target program is ready to execute in the cloud (e.g. as a program written for Hadoop)? If not, what are the best practices for doing such a transformation (possibly a total rewrite) manually? Also, how can I tell whether a legacy program (or programming task) is suitable for cloud computing?
Example: suppose I have a WordCount program written solely with the standard Java library (e.g. HashMap). How can I transform it into one written for Hadoop, like the WordCount in the sample code shipped with the Hadoop distribution?
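For concreteness, a legacy program of this kind might look roughly like the sketch below. The class name `LegacyWordCount` and the whitespace tokenization are assumptions for illustration, not code from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-machine WordCount built only on the standard library.
class LegacyWordCount {

    // Counts whitespace-separated words in the given text using one HashMap.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {          // split can yield "" for leading whitespace
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole input file into memory on a single machine.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        count(text).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Both the input text and the HashMap have to fit on one machine, which is the limitation a Hadoop version is meant to remove.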
I don’t think there is an automatic tool that can transform a legacy uniprocessor program to the cloud.
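If the legacy program is already written using the MapReduce paradigm, it should be relatively easy to run it in the cloud on Hadoop with some modifications. If not, the problem has to be rethought in MapReduce terms (a map step that processes each input record independently and emits key/value pairs, and a reduce step that aggregates all values emitted for each key) and rewritten for Hadoop, either in Java or in another language that can read from STDIN and write to STDOUT.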
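As a rough illustration of what rethinking in MapReduce terms means, here is the same word count split into a map step, a grouping ("shuffle") step and a reduce step, simulated in plain Java with no Hadoop dependency. The class and method names are hypothetical; in a real Hadoop job the map and reduce logic would move into `Mapper` and `Reducer` subclasses, and the framework, not a local `TreeMap`, would do the grouping across machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the MapReduce model (no Hadoop involved).
class MapReduceWordCount {

    // Map step: looks at one line only and emits (word, 1) for each word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce step: receives one key and every value emitted for that key.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group values by key (the "shuffle"),
    // then call reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the cat", "the hat")));
    }
}
```

Running `main` prints `{cat=1, hat=1, the=2}`. The point of the split is that `map` sees one line at a time and `reduce` sees one key at a time, so either can be spread over many machines.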
Also, if the legacy program's language can read from STDIN and write to STDOUT, you can use Hadoop Streaming, which runs any such executable as the mapper or the reducer.
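A minimal sketch of a Streaming-style word count, assuming Streaming's default of tab-separated `key<TAB>value` lines and the fact that the reducer's input arrives sorted by key. The class name `StreamingWordCount` and the `map`/`reduce` command-line switch are hypothetical; any language could play the same role.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical Streaming-style WordCount: "StreamingWordCount map" or "... reduce".
class StreamingWordCount {

    // Mapper: reads raw text and emits "word\t1" per word.
    static void map(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    out.println(word + "\t1");
                }
            }
        }
        out.flush();
    }

    // Reducer: input is sorted by key, so equal words are adjacent;
    // emit the running total whenever the key changes.
    static void reduce(BufferedReader in, PrintWriter out) throws IOException {
        String current = null;
        int sum = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.split("\t", 2);
            if (parts.length < 2) {
                continue;                   // skip malformed lines
            }
            String word = parts[0];
            int n = Integer.parseInt(parts[1].trim());
            if (current != null && !current.equals(word)) {
                out.println(current + "\t" + sum);
                sum = 0;
            }
            current = word;
            sum += n;
        }
        if (current != null) {
            out.println(current + "\t" + sum);  // flush the last key
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintWriter out = new PrintWriter(System.out);
        if (args.length > 0 && args[0].equals("reduce")) {
            reduce(in, out);
        } else {
            map(in, out);
        }
    }
}
```

Locally, the pipeline can be tried as `cat input.txt | java StreamingWordCount map | sort | java StreamingWordCount reduce`, where `sort` stands in for the sort Hadoop performs between the two phases. On a cluster the two commands would be handed to the streaming jar via its `-mapper` and `-reducer` options; where that jar lives depends on the Hadoop version.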
If the processing can be done independently in parallel and the data can be split across more than one machine, the program may be a suitable candidate for Hadoop.
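One cheap way to probe this before touching a cluster: split the input into chunks, process each chunk on its own thread with no shared state, merge the partial results, and compare against a single-pass run. The sketch below does that for the word count; `SplitCheck` and its methods are hypothetical names, and agreement on one sample suggests, but does not prove, that the task decomposes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical local check that a task splits into independent chunks.
class SplitCheck {

    // Per-chunk work; touches no state shared with other chunks.
    static Map<String, Integer> countChunk(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) {
                    counts.merge(w, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Processes the chunks on separate threads, then merges the partial maps.
    static Map<String, Integer> countInChunks(List<String> lines, int chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<Map<String, Integer>>> parts = new ArrayList<>();
            int size = (lines.size() + chunks - 1) / chunks;   // ceiling division
            for (int start = 0; start < lines.size(); start += size) {
                List<String> chunk = lines.subList(start, Math.min(start + size, lines.size()));
                parts.add(pool.submit(() -> countChunk(chunk)));
            }
            Map<String, Integer> merged = new HashMap<>();
            for (Future<Map<String, Integer>> f : parts) {
                f.get().forEach((w, n) -> merged.merge(w, n, Integer::sum));
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = List.of("the cat", "the hat", "a cat");
        // true if the chunked result matches the single-pass result
        System.out.println(countInChunks(lines, 2).equals(countChunk(lines)));
    }
}
```

Here `main` prints `true`. A mismatch on real data would point to hidden dependencies between records, such as state carried from one line to the next, which would have to be removed before the task fits MapReduce.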
HDFS (Hadoop Distributed File System) is designed for high-throughput batch access rather than low latency. If the requirement is low-latency access, you might consider HBase instead.
Also, HDFS is designed for large files (GBs, TBs and even PBs). If the legacy application works with a large number of small files, an alternative approach has to be considered, because the NameNode keeps the metadata for every file in memory.
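A common remedy is to pack many small files into fewer large ones; within Hadoop itself the usual containers are SequenceFiles and Hadoop Archives (HAR). The sketch below is a Hadoop-free stand-in with a made-up record format (one line per small file: `fileName<TAB>base64(contents)`), so that a line-oriented mapper would receive one small file per record. `SmallFilePacker` and the format are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical packer: many small files -> one line-oriented file.
// Assumes file names contain no tab characters.
class SmallFilePacker {

    static void pack(Path dir, Path target) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.list(dir)) {
            files = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Base64.Encoder enc = Base64.getEncoder();   // basic encoder: no line breaks in output
        try (BufferedWriter w = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (Path f : files) {
                w.write(f.getFileName() + "\t" + enc.encodeToString(Files.readAllBytes(f)));
                w.newLine();
            }
        }
    }

    // Recovers the original bytes from one record, e.g. inside a mapper.
    static byte[] unpackContents(String record) {
        return Base64.getDecoder().decode(record.substring(record.indexOf('\t') + 1));
    }

    public static void main(String[] args) throws IOException {
        pack(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Base64 keeps newlines inside a small file from breaking the one-record-per-line layout, at the cost of roughly a third more space.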
Hadoop runs out of the box with minimal configuration changes, but running it efficiently requires tuning many parameters, and sometimes you have to dig into the source code.
Also, try a proof of concept (POC): start with a small slice of the problem and evaluate the pros and cons before committing to a full rewrite.
I suggest reading the book ‘Hadoop: The Definitive Guide’.