I need to process 300 GB of log files per day (Import -> Convert to another format -> Upload).
Which is the best technology to choose: Spring Batch or Apache Hadoop? I am very new to these technologies and could not work out their limitations. Moreover, the volume of files may keep growing day by day without limit. Apart from this, I need to design the application on the given hardware (3 Sun Blade servers) and include a disaster recovery method. Please suggest.
If it is a mere file format conversion, there is no point in using Hadoop, unless you can break the file into chunks that can be distributed in an embarrassingly parallel manner.
Additionally, if you do not need to store / index / process this information any further, there is no need to keep it stored anywhere after the upload.
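To make "breaking the file into chunks" concrete, here is a minimal sketch in plain Python, not Hadoop or Spring Batch. It assumes the logs are newline-delimited and that each line can be converted independently of the others. The names `line_aligned_ranges`, `convert_range` and `convert_file` are made up for illustration, and the upper-casing stands in for whatever your real conversion is.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def line_aligned_ranges(path, target_chunk_bytes):
    """Split a file into consecutive (start, end) byte ranges that end on a newline."""
    size = os.path.getsize(path)
    ranges = []
    start = 0
    with open(path, "rb") as f:
        while start < size:
            end = min(start + target_chunk_bytes, size)
            if end < size:
                # Extend the chunk to the end of the line it lands in,
                # so no record is cut in half.
                f.seek(end)
                f.readline()
                end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges


def convert_range(path, start, end):
    """Placeholder conversion (assumption): upper-case every line in the byte range."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    return [line.upper() for line in data.splitlines()]


def convert_file(path, target_chunk_bytes, workers=4):
    """Convert all chunks concurrently and return the lines in original order."""
    ranges = line_aligned_ranges(path, target_chunk_bytes)
    # Threads keep the sketch simple; for a CPU-bound conversion a
    # ProcessPoolExecutor (with a module-level worker function) is the usual swap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: convert_range(path, *r), ranges)
        return [line for part in parts for line in part]
```

If your format cannot be cut like this (for example, records that span lines or depend on earlier records), the job is not embarrassingly parallel and a distributed framework buys you much less.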
Last but not least, do evaluate the cost of breaking the file into units as part of the overall computational cost.
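A rough back-of-envelope check can help with that evaluation. The sketch below assumes decimal units (1 GB = 1000 MB), a load spread evenly over 24 hours, and perfect scaling across workers, none of which will hold exactly in practice. The function names and the timings in the usage lines are hypothetical; measure your own.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_throughput_mb_s(gb_per_day):
    """Sustained MB/s needed to keep up with a daily volume (1 GB = 1000 MB)."""
    return gb_per_day * 1000 / SECONDS_PER_DAY


def parallel_time_s(convert_s, split_s, workers):
    """Idealised wall-clock time: sequential split, then evenly divided conversion."""
    return split_s + convert_s / workers


def splitting_pays_off(convert_s, split_s, workers):
    """True if split + parallel conversion beats converting sequentially."""
    return parallel_time_s(convert_s, split_s, workers) < convert_s


print(round(required_throughput_mb_s(300), 2))  # 3.47

# Hypothetical measurements: a 1-hour conversion, a 10-minute split, 3 servers.
print(splitting_pays_off(convert_s=3600, split_s=600, workers=3))  # True
```

300 GB per day averages out to only about 3.5 MB/s sustained, so check whether a single machine already keeps up before adding a distribution layer. Peaks and future growth will of course change the picture.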