We need to stress test our Oracle database with about 5 million row inserts. According to our DBA, the only columns that need to be different are the primary or foreign keys; all other columns can be the same. He said if we do that, then Oracle will not do any sort of caching when inserting the data.
I just want to make sure that he is right and that by doing this, the stress testing results would be nearly as accurate as using random data. Thank you for your help.
In a very narrow set of circumstances, the DBA is correct. If ALL your queries are lookups based upon primary and foreign keys, then he may be right. In the past, when the rule-based optimizer was king, the data didn’t matter so much. Record counts, yes, but not really the data.
In the real world, though, this is not the case. Do you have any other indexes? Then the data matters. Do you join against things other than primary/foreign keys? Then the data matters. Are your strings all 1 byte or null? I doubt it, and the size of these variable-length fields may affect the amount of I/O. Basically, for any non-trivial schema in a non-trivial application, having “realistic” data can be significant. Oracle’s cost-based optimizer takes into account a wide variety of statistics about the data when determining how to perform a query.
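To make that concrete, here is a rough Python sketch (not Oracle code, and not how the optimizer actually computes anything) comparing a "same value everywhere" data set with a varied one. The table layout, the column names (id, status, comment) and the value ranges are all made up for illustration. It looks at two things that change with the data: how many distinct values a non-key column holds, and how many bytes the variable-length fields take up per row.

```python
import random
import string

# Hypothetical row layout for illustration only: (id, status, comment).
STATUSES = ["NEW", "OPEN", "HOLD", "DONE"]

def make_rows(n, realistic, seed=0):
    """Build n rows; only the id differs unless realistic is True."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        if realistic:
            status = rng.choice(STATUSES)
            # Variable-length text, 0 to 200 characters (an assumed range).
            comment = "".join(rng.choices(string.ascii_letters, k=rng.randint(0, 200)))
        else:
            # The "everything the same except the key" approach.
            status = "X"
            comment = "X"
        rows.append((i, status, comment))
    return rows

def column_selectivity(rows, col):
    """Distinct values in a column divided by the row count."""
    return len({row[col] for row in rows}) / len(rows)

def avg_payload_bytes(rows):
    """Average combined length of the two non-key string columns."""
    return sum(len(row[1]) + len(row[2]) for row in rows) / len(rows)

constant = make_rows(10_000, realistic=False)
varied = make_rows(10_000, realistic=True)

for name, rows in (("constant", constant), ("varied", varied)):
    print(name,
          "status selectivity:", column_selectivity(rows, 1),
          "comment selectivity:", column_selectivity(rows, 2),
          "avg payload bytes:", avg_payload_bytes(rows))
```

With the constant data, every non-key column holds a single distinct value and every row is the same tiny size, so any index on those columns and any I/O measurement will look nothing like what the real data would produce.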
Are you REALLY only doing inserts in this load test? That’s kinda silly. 5 million records is chump change by modern standards. Desktops do that in seconds, typically. Even simple applications will perform some select to do a lookup, or get a set of records based upon a non-key value.
You seem to be smart enough to evaluate the DBA’s statement. If you can get him to put that in writing, sign off on it, and have the responsibility fall on him when his idea of a load test doesn’t work as expected, then that’s great. It sounds like you’re the one responsible for this test, though.
If I were in your shoes, I would want to load test with the most accurate data possible. Copying from a production system or a known test set of data is a much better option than “random” data, and light-years better than the “nulls except for the primary key” approach.