I develop high-volume processing systems: mathematical models that calculate various parameters from millions of records, derived fields calculated over millions of records, processing of huge transaction files, and so on.
I am well aware of unit testing methodologies, and if my code is in X# I have no problem unit testing it. The problem is that I often have code in T-SQL, C# code deployed as a SQL stored assembly, an SSIS workflow with a good amount of logic (and outcomes etc.), or some SAS process.
What approach do you use when developing such systems? I usually write several tests as stored procedures in a dedicated schema (TEST), run them automatically overnight, and check the results. But this only covers T-SQL. The real problem is testing SSIS packages. How do you test them? What is your preferred approach for stubbing data into tables (especially if you need a lot of data initialization)? I have an approach developed over the years, but maybe I am just not reading enough articles.
So, banking, telecom, and risk developers out there: how do you test your mission-critical apps that process millions of records at end of day, month end, etc.? What frameworks do you use? How do you validate that your SSIS package is correct as you develop it? How do you achieve continuous integration in such an environment (personally, I never got there)? I hope this is not too open-ended a question. How do you test your map-reduce jobs, for example (I do not use Hadoop, but this is quite similar)?
luke
I hope this is not too open-ended.
Firstly, build logging, monitoring and double-entry systems into what you’re building.
Ensure that performance is acceptable even with these systems switched on: benchmark and profile them, and ensure the hardware is appropriate for the entire system.
Split each system into sub-systems that can be tested independently, which means designing the systems to be loosely coupled.
Also ensure each sub-system validates its inputs before processing further, so that erroneous data is stopped before it becomes a bigger problem.
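As a rough illustration of validating inputs at a sub-system boundary, here is a minimal Python sketch. The `Transaction` record, its two fields, and the function names are hypothetical and only stand in for whatever your real feed contains; the idea is simply that bad rows are quarantined with a reason instead of flowing downstream.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass
class Transaction:
    # Hypothetical record shape, for illustration only.
    account_id: str
    amount: Decimal


def validate_row(row: dict) -> Transaction:
    """Return a Transaction, or raise ValueError describing the problem."""
    account_id = str(row.get("account_id") or "").strip()
    if not account_id:
        raise ValueError("missing account_id")
    try:
        amount = Decimal(str(row.get("amount")))
    except InvalidOperation:
        raise ValueError(f"amount is not a number: {row.get('amount')!r}")
    if not amount.is_finite():
        raise ValueError("amount must be finite")
    return Transaction(account_id, amount)


def split_valid_invalid(rows):
    """Separate rows that pass validation from rows that must be quarantined."""
    valid, rejected = [], []
    for row in rows:
        try:
            valid.append(validate_row(row))
        except ValueError as exc:
            # Keep the original row and the reason so it can be investigated.
            rejected.append((row, str(exc)))
    return valid, rejected
```

Only the `valid` list would be passed on to the next sub-system; the `rejected` list can be logged or written to an error table for follow-up.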
By using logging, you can test a variety of systems in a similar way.
For any system that has no unit test framework available, use logging, and then test the logs it generates.
This should allow you to test SSIS processes, workflows, or assemblies.
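To make the "test the logs" idea concrete, here is a small Python sketch. The `load_batch` function is a hypothetical stand-in for any process step (it is not SSIS itself), and the log message format is an assumption; for a real package the same kind of assertion could be run against whatever log table or file the process writes.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log lines in memory so a test can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}:{record.getMessage()}")


def load_batch(rows, log):
    """Hypothetical process step: loads rows and logs what happened."""
    loaded = [r for r in rows if r is not None]
    skipped = len(rows) - len(loaded)
    log.info("rows_read=%d", len(rows))
    if skipped:
        log.warning("rows_skipped=%d", skipped)
    log.info("rows_loaded=%d", len(loaded))
    return loaded


def run_and_capture(rows):
    """Run the step with an in-memory handler attached and return its log lines."""
    log = logging.getLogger("batch.load")
    log.setLevel(logging.INFO)
    log.propagate = False  # keep the test output isolated from other handlers
    handler = ListHandler()
    log.addHandler(handler)
    try:
        load_batch(rows, log)
    finally:
        log.removeHandler(handler)
    return handler.messages
```

A test then feeds in a known input and asserts on the captured lines, for example that a batch containing one null row produces exactly one `rows_skipped=1` warning.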
Monitoring and double-entry systems will flag up errors and process problems, so you can identify and ideally resolve them in a timely fashion.
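One possible reading of a double-entry check is to compute control totals independently on both sides of a process and compare them. The Python sketch below assumes rows are simple `(key, amount)` pairs and that a row count plus an amount total is enough; both are assumptions for illustration, and a real system would likely reconcile more dimensions.

```python
from decimal import Decimal


def control_totals(rows):
    """Return (row count, sum of amounts) for an iterable of (key, amount) pairs."""
    count = 0
    total = Decimal("0")
    for _key, amount in rows:
        count += 1
        total += Decimal(str(amount))  # Decimal avoids float rounding in money sums
    return count, total


def reconcile(source_rows, target_rows):
    """Compare control totals from both sides and list any discrepancies."""
    src_count, src_total = control_totals(source_rows)
    tgt_count, tgt_total = control_totals(target_rows)
    problems = []
    if src_count != tgt_count:
        problems.append(f"row count mismatch: {src_count} != {tgt_count}")
    if src_total != tgt_total:
        problems.append(f"amount mismatch: {src_total} != {tgt_total}")
    return problems
```

An empty list means the two sides agree; anything else is something for monitoring to raise.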
Finally, when systems go live, don’t switch logging off entirely.
If necessary, reduce its verbosity, but ensure it can be switched back on to debug processes, as problems will still occur in the live environment that you need to resolve.
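A minimal Python sketch of switchable verbosity follows. The environment variable name `BATCH_LOG_LEVEL` and the logger name `batch` are made up for this example; the point is only that the level comes from configuration, so it can be raised in production without a redeploy.

```python
import logging
import os


def configure_logging(default="WARNING"):
    """Set the log level from the hypothetical BATCH_LOG_LEVEL environment variable."""
    name = os.environ.get("BATCH_LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        level = logging.WARNING  # fall back rather than fail on a typo
    logger = logging.getLogger("batch")
    logger.setLevel(level)
    return logger
```

Running with `BATCH_LOG_LEVEL=DEBUG` would then turn detailed logging back on for a single troublesome run.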
Ensure you use live data and edge cases for automated testing.
Use code reviews or pair programming to ensure the code is optimal.
Ensure you use expert QA staff to think of use cases you won’t think of.
Ensure you have an excellent project manager, who can manage you, your team, the related teams, the end users, and your bosses, and ensure everyone is communicating appropriately.
You won’t be able to achieve well tested processes without a well run team.
Using some of the above has allowed us to develop well-tested processes, which handle billions of pounds' worth of transactions annually, so we must be doing something right.