We have a suite of converters that take complex data and transform it. Mostly the input is EDI and the output XML, or vice-versa, although there are other formats.
There are many inter-dependencies in the data. What methods or software are available that can generate complex input data like this?
Right now we use two methods: (1) a suite of sample files that we’ve built over the years, mostly from filed bugs and samples in documentation, and (2) generating pseudo-random test data. But the former only covers a fraction of the cases, and the latter involves lots of compromises and only tests a subset of the fields.
Before we go further down the path of implementing (reinventing?) a complex table-driven data generator, what options have you found successful?
Well, the answer is in your question. Short of implementing a complex table-driven data generator, you’re doing things right with (1) and (2).
(1) covers the rule of “1 bug verified, 1 new test case”.
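As a rough sketch of that rule, each verified bug can become one entry in a regression table: the input that triggered it, plus the output that was agreed to be correct. Everything named here (`convert`, the case names, the toy segment format) is hypothetical and only stands in for your real converters and sample files.

```python
def convert(edi_text):
    """Hypothetical stand-in for a real converter: 'TAG*a*b~' segments to tiny XML."""
    parts = []
    for segment in edi_text.strip().split("~"):
        if not segment:
            continue  # skip the empty piece after the last terminator
        tag, *fields = segment.split("*")
        cells = "".join(f"<f>{value}</f>" for value in fields)
        parts.append(f"<{tag}>{cells}</{tag}>")
    return "".join(parts)


# One entry per verified bug: (case name, input, expected output).
# These cases are invented for illustration.
REGRESSION_CASES = [
    ("trailing-terminator", "NAM*Smith~", "<NAM><f>Smith</f></NAM>"),
    ("empty-field-kept", "ADR*1 Main St**Paris~",
     "<ADR><f>1 Main St</f><f></f><f>Paris</f></ADR>"),
]


def run_regression_cases(cases, converter):
    """Return the names of the cases whose output differs from the expected one."""
    return [name for name, source, expected in cases
            if converter(source) != expected]


failures = run_regression_cases(REGRESSION_CASES, convert)
print("failures:", failures)
```

Adding a case is then a one-line change, which keeps the cost of following the rule low.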
And as long as the structure of the pseudo-random test data in (2) bears some resemblance to real-life data, it is fine.
(2) can always be improved, and it will improve mainly over time, as you think of new edge cases. The problem with random test data is that it can only be random up to a point. Beyond that point, computing the expected output from the random input becomes so difficult that you basically have to rewrite the tested algorithm inside the test case.
So (2) will always cover only a fraction of the cases. If one day it covers all of them, it will in effect be a second implementation of your algorithm.
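One way to stay short of that point, sketched under assumptions: build the input and its expected output side by side from the same table, so the test never has to derive the output from the input. The field table, the dependent `CNT` segment, and the toy XML shape are all made up for illustration; they are not taken from a real EDI standard.

```python
import random

# Hypothetical field table: segment tag -> allowed values.
FIELD_TABLE = {
    "NAM": ["Smith", "Jones", "Nguyen"],
    "CTY": ["Paris", "Lima", "Oslo"],
}


def generate_case(seed, max_items=4):
    """Build one (edi_input, expected_xml) pair from the table with a seeded RNG."""
    rng = random.Random(seed)  # seeded so a failing case can be reproduced
    tags = sorted(FIELD_TABLE)
    count = rng.randint(1, max_items)
    edi_parts, xml_parts = [], []
    for _ in range(count):
        tag = rng.choice(tags)
        value = rng.choice(FIELD_TABLE[tag])
        # Emit both representations of the same item at the same time.
        edi_parts.append(f"{tag}*{value}~")
        xml_parts.append(f"<{tag}>{value}</{tag}>")
    # Inter-dependent field: the count segment must match the number of items.
    edi_parts.append(f"CNT*{count}~")
    xml_parts.append(f"<CNT>{count}</CNT>")
    return "".join(edi_parts), "".join(xml_parts)


edi_input, expected_xml = generate_case(seed=42)
print(edi_input)
print(expected_xml)
```

A test would then compare the converter's output for `edi_input` against `expected_xml`. The trade-off described above still applies: the more rules the table encodes, the closer the generator comes to being a second implementation.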