I am developing a web application with Java on the front end and a shell script on the back end. The application mainly analyzes many files: the Java program takes input from the user, such as which files to analyze and the date range to cover. Let's assume the user asks for July 1-8, so I need to process eight days of files. Each day has about 100 files to process. My goal is to run this processing in parallel rather than sequentially. I have two ideas for this, and I would like to share them and get your suggestions.
Plan 1:
There is a Java program (business layer) that invokes a shell script using ProcessBuilder. Can I split the date range given by the user, for instance (1-8), across 4 threads, where each thread handles two days: (1-2) for thread 1, (3-4) for thread 2, and so on? If I follow this approach, what are the pros and cons? Also, how do I coordinate the threads with this approach?
Plan 2:
Call the shell script from Java and spawn multiple processes inside the shell script. As above, process 1 would handle dates (1-2), process 2 would handle (3-4), and so on. What are the pros and cons of this approach? I am also writing the processed output into a single file, so if I have multiple processes, how can they all update that single file safely?
Links to any references related to my question would also be welcome.
IMPORTANT:
As I said, I need to process hundreds of log files for each day inside a shell script, and one of my requirements is to constantly update the front end with the status of the jobs in the shell script (i.e. day 1 has completed, day 2 has completed, and so on). I know I can echo from the shell script and read the value from Java. The problem is that if I echo inside the loop that processes the files, my call terminates and I have to call the script again from Java. Any ideas on how to make this update happen?
First, I would suggest considering the first rule of optimization: do not optimize.
Then, if you really think you need to optimize it, I would pick the first approach and do as much as possible in Java.
One approach could be the following:
1) Run all the processes with ProcessBuilder and create a List<Process>.
2) Wrap each Process into a ShellScriptProcess and acquire a List<ShellScriptProcess>.
3) Wait for the processes to finish.
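The three steps above could be sketched roughly as follows. This is not the original code of this answer: the fields and methods of ShellScriptProcess, the helpers splitDays and runAll, and the stand-in `sh -c` command are all assumptions made for illustration, and a real setup would pass the path of the actual analysis script instead.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParallelScripts {

    /** Hypothetical wrapper pairing a Process with the day range it handles. */
    static final class ShellScriptProcess {
        final String label;
        final Process process;

        ShellScriptProcess(String label, Process process) {
            this.label = label;
            this.process = process;
        }

        /** Blocks until the script exits and returns its exit code. */
        int await() throws InterruptedException {
            return process.waitFor();
        }
    }

    /** Splits [firstDay, lastDay] into consecutive chunks of at most chunkSize days. */
    static List<int[]> splitDays(int firstDay, int lastDay, int chunkSize) {
        List<int[]> chunks = new ArrayList<>();
        for (int start = firstDay; start <= lastDay; start += chunkSize) {
            chunks.add(new int[] {start, Math.min(start + chunkSize - 1, lastDay)});
        }
        return chunks;
    }

    /**
     * Starts one OS process per chunk (appending the first and last day as
     * arguments to the base command), then waits for all of them.
     * Returns the exit codes in chunk order.
     */
    static List<Integer> runAll(List<String> baseCommand, List<int[]> chunks)
            throws IOException, InterruptedException {
        // Steps 1 and 2: start every process up front and wrap it.
        List<ShellScriptProcess> running = new ArrayList<>();
        for (int[] chunk : chunks) {
            List<String> command = new ArrayList<>(baseCommand);
            command.add(String.valueOf(chunk[0]));
            command.add(String.valueOf(chunk[1]));
            ProcessBuilder builder = new ProcessBuilder(command);
            builder.inheritIO(); // avoid blocking on a full, unread output pipe
            running.add(new ShellScriptProcess(chunk[0] + "-" + chunk[1], builder.start()));
        }
        // Step 3: the processes already run concurrently; just wait for each.
        List<Integer> exitCodes = new ArrayList<>();
        for (ShellScriptProcess p : running) {
            exitCodes.add(p.await());
        }
        return exitCodes;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real script (assumption): prints the range it was given.
        List<String> base = Arrays.asList(
                "sh", "-c", "echo \"processing days $1-$2\"", "analyze");
        List<Integer> codes = runAll(base, splitDays(1, 8, 2));
        System.out.println("exit codes: " + codes);
    }
}
```

Because each script runs as its own operating system process, no extra Java threads are needed just to get parallelism; the loop in step 3 only collects the results. If every process writes to its own output file, the files can be concatenated once all processes have finished, which sidesteps concurrent writes to a single file.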
This is only a very rough solution, just to demonstrate the idea of how this could be acomplished. And I didn’t test the code, it might contain of syntax errors 🙂 Hope this helps.
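For the status updates asked about in the question, one possible sketch is to keep a single script invocation running and read its standard output line by line while it works. It assumes the script echoes one line per finished day; the class name ScriptProgress, the method runAndReport, and the inline `sh -c` loop are hypothetical stand-ins for the real script.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class ScriptProgress {

    /**
     * Runs the command and passes every line it prints to onStatus as soon
     * as the line arrives. Returns the exit code once the script has ended.
     */
    static int runAndReport(List<String> command, Consumer<String> onStatus)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            // readLine() blocks until the script prints a line and returns
            // null only when the script closes its output, i.e. when it ends.
            while ((line = reader.readLine()) != null) {
                onStatus.accept(line);
            }
        }
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in script (assumption): echoes a status line per day.
        List<String> command = Arrays.asList(
                "sh", "-c", "for d in 1 2 3; do echo \"day $d has completed\"; done");
        int exit = runAndReport(command, status -> System.out.println("status: " + status));
        System.out.println("exit code: " + exit);
    }
}
```

The callback is where the front end would be notified, for example by updating a status field that the web page polls. An echo inside the loop does not by itself end the script, so the read loop keeps receiving lines until the script exits.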