I’m used to planning software where the complexity lies in the user interactions. The agile software engineering principles I learned work pretty well for that type of scenario. User stories are very easy to write when most of the planning revolves around the user’s interaction.
I’m now working on a system where the only intervention the user has is hitting the go button and reading errors if they occur.
All the other work in this system is data processing, and very heavy data processing at that. I have about five different data transformations to plan in this processing workflow.
These processes are inherently loosely coupled, so they should be easy to plan as distinct processes and then work into a workflow. Even so, the problem of planning for data-driven processes still remains, just on a smaller scale.
How can I go about planning data-driven processes like this? Are there any known design processes for this type of software?
The same!
Agile principles for planning and iterative development can be used for any type of project. The work can still be user-driven, but you may have to extend the concept of “user”. Who is ultimately going to use the software you’re building? Yourself? A tech team? Other processes? “Real” users? Whoever they are, you need to include them in the design and have them discuss the specs with you.
Start by prioritising what the users want. What is the “core” set of features they’d like to see, and what are the most important, architecture-defining features of your new process(es)? Plan the first few iterations around them (after an iteration 0 in which you set up the development environment). At the end of these iterations your system will not do everything it should, but it will be a start. Also, focus on stories that are end-to-end: it’s better to produce an output early on, even if it’s not the desired one, and then come back and refactor to improve it.
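To illustrate the end-to-end idea, here is a minimal Python sketch of what a first iteration could look like: every stage of the workflow exists and the whole thing produces an output, even though some stages are still stubs. The stage names (`extract`, `clean`, `enrich`, `aggregate`) and the record shape are assumptions made up for this example, not anything taken from the question.

```python
# Hypothetical stages of a data workflow; names and record shape are
# illustrative assumptions only.

def extract(rows):
    # Copy the raw input so later stages never mutate the caller's data.
    return [dict(r) for r in rows]

def clean(rows):
    # First real rule: drop records that have no value.
    return [r for r in rows if r.get("value") is not None]

def enrich(rows):
    # Stub: planned for a later iteration, passes data through unchanged.
    return rows

def aggregate(rows):
    # Produce one summary record so there is an output from day one.
    return [{"count": len(rows), "total": sum(r["value"] for r in rows)}]

PIPELINE = [extract, clean, enrich, aggregate]

def run(rows, stages=PIPELINE):
    # Each stage is loosely coupled: a list of records in, a list out.
    for stage in stages:
        rows = stage(rows)
    return rows
```

Because each stage only takes and returns a list of records, a stub can later be replaced by a real transformation without touching the others, which is what lets each iteration deepen one stage at a time.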
Continue to write user stories as you’re used to, perhaps starting each one with a sentence of the form “As XXX, I want to … in order to …”, so that each story is tightly linked to its original requester (XXX could be yourself, another system, or a real user).
Focus very early on a comprehensive set of acceptance tests, perhaps using an automated framework like JBehave or FitNesse (if you use Java; I suppose there are alternatives for every language). For data-driven projects these are paramount: they will act as documentation for your system and will make it future-proof. Build your acceptance tests this way: start from an “empty” (or “given”) system; when you add XX, YY and ZZ as data, then the result should be AA, BB and CC. And don’t hesitate to hack and slash in your acceptance tests, as long as the users see and approve them every time. (Don’t make any assumptions about them; validate everything.)
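As a rough sketch of that given/when/then shape in plain Python (rather than JBehave or FitNesse), the scenarios can be kept as data that users can read and approve. The `Repository` class below is a hypothetical stand-in for the system under test, and its "later data overwrites earlier fields" rule is an assumption invented for the example.

```python
class Repository:
    """Hypothetical system under test; a new instance starts empty."""

    def __init__(self):
        self._records = {}

    def load(self, records):
        # Assumed rule: later records with the same id overwrite fields.
        for rec in records:
            current = self._records.get(rec["id"], {})
            self._records[rec["id"]] = {**current, **rec}

    def export(self):
        # Return records in a stable order so results are comparable.
        return [self._records[key] for key in sorted(self._records)]


# Each scenario reads as: given this state, when this data is added,
# then this is the expected result.
SCENARIOS = [
    {
        "name": "new record is added to an empty system",
        "given": [],
        "when": [{"id": 1, "name": "Alice"}],
        "then": [{"id": 1, "name": "Alice"}],
    },
    {
        "name": "second source overrides and extends the first",
        "given": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}],
        "when": [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Carol"}],
        "then": [
            {"id": 1, "name": "Alice"},
            {"id": 2, "name": "Robert"},
            {"id": 3, "name": "Carol"},
        ],
    },
]


def run_acceptance_tests(scenarios=SCENARIOS):
    """Run every scenario and return the names of those that fail."""
    failures = []
    for scenario in scenarios:
        repo = Repository()
        repo.load(scenario["given"])
        repo.load(scenario["when"])
        if repo.export() != scenario["then"]:
            failures.append(scenario["name"])
    return failures
```

Keeping the scenarios as plain data means new cases can be added or rewritten freely as the users refine what they expect, without changing the runner.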
Then iteration after iteration, add layers of complexity until you reach the desired set of specifications.
I’ve been involved in several medium to large scale projects based on data management and processing (repositories, including merging from different sources, maintaining a “golden source”, bi-temporal databases, feeding other external systems, etc.) and basically, the more agile the team was, the more successful the project was. By far.