I have a client with several applications handling sensitive data stored in eXist—a native XML database. I would like to test with the production data in my development environment, but there are regulatory concerns with exporting live data out of production.
Is there a tool in the XML community that can obfuscate sensitive production data by producing a realistic dataset suitable for testing?
Previously, I have used gems like Faker with Rails apps, but I have been unable to find a similar solution that can be easily applied to data stored as XML. Any thoughts?
Sample Scenario – One of these applications involves managing financial metrics, data protected by the Sarbanes–Oxley Act in the United States. If that data were leaked from a developer’s laptop, the company could be held liable for millions in damages. There are similar situations with other applications that track customer data: if the real data is lost, the consequences are severe and expensive.
With that in mind, these applications now need new features, and the old test data is woefully inadequate in both size (4(!) entries instead of 400k) and quality (the dollar amounts are highly unrealistic for the business context).
Is there a tool that can easily transform specific values (e.g. names, numbers, email addresses) into random values that are reasonable or realistic (take a look at the Faker gem for an example)?
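To make the request concrete, here is a rough Python sketch of the kind of transformation I have in mind. It is not an existing tool: the element names (`name`, `email`, `revenue`), the sample document, and the helper functions are all made up for illustration, and it uses only the standard library rather than anything eXist-specific.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = """<customers>
  <customer><name>Alice Real</name><email>alice@corp.example</email><revenue>1250000.00</revenue></customer>
  <customer><name>Bob Actual</name><email>bob@corp.example</email><revenue>87000.50</revenue></customer>
</customers>"""

FIRST = ["Dana", "Lee", "Morgan", "Riley"]
LAST = ["Smith", "Nguyen", "Garcia", "Khan"]

def fake_name(rng, _old):
    # Pick a random first/last name pair from the small lists above.
    return f"{rng.choice(FIRST)} {rng.choice(LAST)}"

def fake_email(rng, _old):
    # example.com is reserved for documentation, so no real mailbox is hit.
    return f"user{rng.randrange(100000)}@example.com"

def fake_amount(rng, old):
    # Scale the real value by 0.5x to 1.5x so amounts stay plausible
    # for the business context without matching production figures.
    return f"{float(old) * rng.uniform(0.5, 1.5):.2f}"

# Map element names to the replacement rule applied to their text.
RULES = {"name": fake_name, "email": fake_email, "revenue": fake_amount}

def obfuscate(xml_text, rules=RULES, seed=None):
    """Return xml_text with the text of every matching element replaced."""
    rng = random.Random(seed)
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        rule = rules.get(elem.tag)
        if rule is not None and elem.text:
            elem.text = rule(rng, elem.text)
    return ET.tostring(root, encoding="unicode")

result = obfuscate(SAMPLE, seed=42)
print(result)
```

The document structure and record count are preserved; only the text of the listed elements changes. Ideally an existing tool would do something like this at scale, with better fake-data generators, and run inside production so that only the obfuscated output ever leaves it.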
Something like this may be useful:
dpawson.co.uk/xsl/sect2/N3773.html#d5234e197