I’ve recently worked on refactoring a system that processes bundles of client data. The system executes a series of steps, each of which consumes files from previous steps (and sometimes in-memory data) and produces its own output, in the form of files or data. Sometimes the output data for a particular step is already available. I have to be careful to make sure that, when one step fails, we continue to run all possible steps (ones that don’t depend on the failed step), so that the final output is as complete as possible. Furthermore, not all steps have to be run in all situations.
Previously, the relationships were all implicit in the structure of the code. For instance:
    void processClientData() throws Exception {
        try {
            processA();
        } catch (Exception e) {
            log.log(Level.SEVERE, "exception occurred in step A", e);
            processC(); // C doesn't depend on A, so we can still run it.
            throw e;
        }
        processB();
        processC();
        // etc... for ~20 steps
    }
I changed this to make the dependencies explicit, the error handling uniform, etc, by introducing Tasks:
    public interface Task {
        List<Task> getDependencies();
        void execute(); // only called after all dependencies have been executed
    }

    public class TaskRunner {
        public void run(Set<Task> targets) {
            // run the dependencies and targets ala ANT
            // make sure to run all possible tasks on the "road" to targets
            // ...
        }
    }
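To make the intent of the runner concrete, here is a minimal single-threaded sketch of what `run` might do. It is an illustration under assumptions, not my actual code: the `Status` enum, the `NamedTask` helper, and the choice to return a result map are all hypothetical, `run` takes a `Collection` rather than a `Set` for convenience, and a step counts as failed when `execute()` throws a `RuntimeException`.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TaskRunnerSketch {

    interface Task {
        List<Task> getDependencies();
        void execute(); // only called after all dependencies have succeeded
    }

    // Hypothetical outcome type; the original code has no equivalent.
    enum Status { SUCCEEDED, FAILED, SKIPPED }

    /** Hypothetical helper task, used only to demonstrate the runner. */
    static final class NamedTask implements Task {
        private final String name;
        private final boolean fails;
        private final List<Task> dependencies;

        NamedTask(String name, boolean fails, Task... dependencies) {
            this.name = name;
            this.fails = fails;
            this.dependencies = Arrays.asList(dependencies);
        }

        @Override public List<Task> getDependencies() { return dependencies; }

        @Override public void execute() {
            if (fails) throw new IllegalStateException(name + " failed");
        }

        @Override public String toString() { return name; }
    }

    private final Map<Task, Status> results = new LinkedHashMap<>();
    private final Set<Task> inProgress = new HashSet<>();

    /** Runs every target and everything it depends on, each task at most once. */
    Map<Task, Status> run(Collection<? extends Task> targets) {
        for (Task target : targets) {
            visit(target);
        }
        return results;
    }

    private Status visit(Task task) {
        Status known = results.get(task);
        if (known != null) {
            return known; // already handled via another path through the graph
        }
        if (!inProgress.add(task)) {
            throw new IllegalStateException("dependency cycle at " + task);
        }
        boolean ready = true;
        for (Task dependency : task.getDependencies()) {
            // Visit every dependency even after one fails, so that all
            // steps that can still run do run.
            if (visit(dependency) != Status.SUCCEEDED) {
                ready = false;
            }
        }
        Status status;
        if (!ready) {
            status = Status.SKIPPED;
        } else {
            try {
                task.execute();
                status = Status.SUCCEEDED;
            } catch (RuntimeException e) {
                status = Status.FAILED; // real code would also log here
            }
        }
        inProgress.remove(task);
        results.put(task, status);
        return status;
    }
}
```

With A failing, B depending on A, and C independent, running targets B and C records A as FAILED, B as SKIPPED, and C as SUCCEEDED, which is the behaviour the hand-written try/catch version was approximating.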
This starts to feel a lot like a very watered-down version of a build system with dependency management (ANT being the one most familiar to me). I don’t want to pull in ANT for this kind of thing, and I certainly don’t want to write out the XML.
I have my system up and running (mostly), but it still feels a bit hacked together, and I have since reflected on how much I hate to be reinventing the wheel. I would expect that this is a fairly common problem – one that has been solved many times over by people smarter than me. Alas, a few hours of googling turned up nothing.
Is there a library that implements this sort of thing, without being a really heavy-weight build system? I’d also appreciate any pointers, including libraries in other languages (or even novel systems) that I should take inspiration from.
EDIT: I appreciate the suggestions (and I will give them due consideration), but I’m really NOT looking for a “build system” per se. What I am looking for is something more like the kernel of a build system, that I could just call directly from Java and use as a small, low-overhead library for doing said dependency analysis, task execution, and resulting resource management. Like I said, I have existing (working) code in pure Java, and I don’t want to bring in XML and all of the baggage that comes with it, without a very compelling reason.
Take a look at the JSR 166 fork/join framework. It seems to me this is exactly what you’re trying to accomplish.
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ForkJoinTask.html
This is included in JDK7 but is available as a separate jar for 5 and 6. If I wasn’t on my tablet I’d write a more comprehensive example. Maybe someone else can expand in the meantime.
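As a rough starting point, here is a sketch of how steps could be mapped onto fork/join tasks. It rests on assumptions that are not in the question: the `Step` class and `runAll` method are hypothetical names, a step's work is a plain `Runnable`, and each step is a dependency of at most one other step (a tree), because a `ForkJoinTask` should not be forked more than once. Shared dependencies would need extra bookkeeping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

final class ForkJoinSketch {

    /** Hypothetical step: runs its dependencies in parallel, then its own work. */
    static final class Step extends RecursiveAction {
        private final String name;
        private final Runnable work;
        private final List<Step> dependencies;

        Step(String name, Runnable work, Step... dependencies) {
            this.name = name;
            this.work = work;
            this.dependencies = Arrays.asList(dependencies);
        }

        @Override
        protected void compute() {
            // Start every dependency; each one is forked exactly once here.
            for (Step dependency : dependencies) {
                dependency.fork();
            }
            // Wait for all of them, even if some fail, so independent
            // branches still run to completion.
            boolean allSucceeded = true;
            for (int i = dependencies.size() - 1; i >= 0; i--) {
                Step dependency = dependencies.get(i);
                dependency.quietlyJoin(); // does not rethrow the failure
                if (!dependency.isCompletedNormally()) {
                    allSucceeded = false;
                }
            }
            if (!allSucceeded) {
                // Throwing marks this step as completed abnormally too.
                throw new IllegalStateException(name + " skipped: a dependency did not complete");
            }
            work.run();
        }
    }

    /** Runs the whole tree below root and waits for it without rethrowing. */
    static void runAll(Step root) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.execute(root);
            root.quietlyJoin();
        } finally {
            pool.shutdown();
        }
    }
}
```

After `runAll`, each step can be queried with `isCompletedNormally()` or `isCompletedAbnormally()` to see what ran. Note that a skipped step and a failed step look the same here; telling them apart would need a separate status field.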
You also have to take care if your graph is not connected, but there is a well-known set of algorithms for detecting this.
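One such check, sketched below, groups tasks into connected components by ignoring the direction of the dependency edges and doing a depth-first walk. The representation is an assumption made for the example: tasks are plain `String` names and the graph is a map from each task to the tasks it depends on; `ComponentsSketch` and `components` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ComponentsSketch {

    /** Returns the groups of tasks that are linked by any chain of dependencies. */
    static List<Set<String>> components(Map<String, List<String>> dependencies) {
        // Build an undirected view: a dependency edge links both endpoints.
        Map<String, Set<String>> neighbours = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : dependencies.entrySet()) {
            String task = entry.getKey();
            neighbours.computeIfAbsent(task, k -> new LinkedHashSet<>());
            for (String dependency : entry.getValue()) {
                neighbours.get(task).add(dependency);
                neighbours.computeIfAbsent(dependency, k -> new LinkedHashSet<>()).add(task);
            }
        }

        List<Set<String>> result = new ArrayList<>();
        Set<String> seen = new LinkedHashSet<>();
        for (String start : neighbours.keySet()) {
            if (!seen.add(start)) {
                continue; // already placed in an earlier component
            }
            Set<String> component = new LinkedHashSet<>();
            Deque<String> pending = new ArrayDeque<>();
            pending.push(start);
            while (!pending.isEmpty()) {
                String current = pending.pop();
                component.add(current);
                for (String next : neighbours.get(current)) {
                    if (seen.add(next)) {
                        pending.push(next);
                    }
                }
            }
            result.add(component);
        }
        return result;
    }
}
```

If this returns more than one component, the graph splits into groups that share no dependencies, so a runner that starts from a single root would never reach the other groups.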