I’m an amateur C++ programmer trying to learn about basic shell scripting. I have a complex C++ program that currently reads in different parameter values from Parameters.h and then executes one or more simulations with each parameter value sequentially. These simulations take a long time to run. Since I have a cluster available, I’d like to effectively parallelize this job, running the simulations for each parameter value on a separate processor. I’m assuming it’s easier to learn shell scripting techniques for this purpose than OpenMPI. My cluster runs on the LSF platform.
How can I write my input parameters in Bash so that they are distributed among multiple processors, each executing the program with that value? I’d like to avoid interactive submission. Ideally, I’d have the inputs in a text file that Bash reads, and I’d be passing two parameters to each job: an actual parameter value and a parameter ID.
Thanks in advance for any leads and suggestions.
my solution
GNU Parallel does look slick, but I ended up (with the help of an IT admin) writing a simple bash script that echos to screen three inputs (a treatment identifier, treatment/parameter value, and a simulation identifier):
#!/bin/bash
j=1
for treatment in cat treatments.txt; do
for experiment in cat simulations.txt; do
bsub -oo tr_${j}_sim_${experiment}_screen -eo tr_${j}_sim_${experiment}_err -q short_serial "echo \"$j $treatment $experiment\" | ./a.out"
done
let j=$j+1
done
The file treatments.txt contains a list of the values I'd like to vary, simulations.txt contains a list of all the simulation identifiers I'd like to run (currently just 1,...,s, where s is the total number of simulations I want for each treatment), and the treatments are indexed 1...j.
Say you want to run the program
simulatewith inputsfoo,bar,bazandquuxin parallel, then the simplest way is:The problem is that all simulations are launched at the same time, so if your number of inputs is much larger than your number of CPUs/cores, they may swamp the system.