I use Java in this question, but it really applies to all modern app development. Our “environment pipeline”, like many others, looks like this:
- Developer sandbox
- Continuous integration & testing
- QA/Staging
- Production
The hardware (available RAM and CPU) differs in each of these environments: my laptop is a 2GB dual-core Windows machine. Testing runs on a 4GB machine. Production is two (load-balanced) 8GB, quad-core servers.
Obviously the same code will perform differently when it runs on these different machines (environments).
I was thinking about writing automated performance tests for some of my classes that would be of the form:
private static final long MAX_TIME = 8000; // milliseconds

@Test
public final void perfTestSomething() {
    long start = System.currentTimeMillis();
    // Run the test
    long end = System.currentTimeMillis();
    assertTrue((end - start) < MAX_TIME);
}
Thus the automated performance test fails if the test takes more than, say, 8 seconds to run.
But then this realization dawned on me: the code will run differently in different environments, and will run differently depending on the state of the JVM and GC. I could run the same test 1000 times on my own machine and get wildly different results.
So I ask: how does one accurately/reliably define & gauge automated performance tests as code is promoted from one environment to the next?
Thanks in advance!
You may want to run the performance tests only in a single, more tightly controlled environment. You don’t need to run them everywhere; there’s little benefit in that. Run them in the environment that most closely mimics the production configuration (that’s what you REALLY care about, right?).
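One way to keep the performance tests confined to a single environment is to gate them on a JVM system property. This is only a sketch: the property name perf.env and the PerfGate helper are hypothetical names made up for illustration, not part of any library.

```java
final class PerfGate {
    // Hypothetical property name; set it with -Dperf.env=staging on the
    // machine that is supposed to run the performance tests.
    static final String ENV_PROPERTY = "perf.env";

    private PerfGate() {}

    /** Returns true only when the JVM was started for the required environment. */
    static boolean shouldRun(String requiredEnv) {
        // System.getProperty returns null when the property is unset,
        // so an unconfigured machine never runs the performance tests.
        return requiredEnv.equals(System.getProperty(ENV_PROPERTY));
    }

    public static void main(String[] args) {
        if (!shouldRun("staging")) {
            System.out.println("Skipping performance tests: not the staging environment");
            return;
        }
        System.out.println("Running performance tests");
    }
}
```

With JUnit 4 you could pass the result to org.junit.Assume.assumeTrue(...) at the top of each performance test, so that on other machines the test is reported as skipped rather than failed.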
Also, give yourself reasonable headroom in your performance limits. Don’t pin them to just above what your server does NOW; pick thresholds loose enough to absorb normal run-to-run variation.
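To make the headroom idea concrete, here is a minimal sketch that warms up the JVM, takes the median of several timed runs (less sensitive to a stray GC pause than a single run), and compares it against a baseline plus a tolerance. The PerfBudget class, its method names, and the 50% figure in the usage note are assumptions for illustration only.

```java
import java.util.Arrays;

final class PerfBudget {
    private PerfBudget() {}

    /**
     * Runs the task `warmups` times without measuring, then returns the
     * median of `runs` timed executions in nanoseconds (the upper median
     * when `runs` is even).
     */
    static long medianNanos(Runnable task, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run(); // give the JIT a chance to compile the hot path
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    /** True if the measurement is at most baseline * (1 + headroom). */
    static boolean withinBudget(long measuredNanos, long baselineNanos, double headroom) {
        return measuredNanos <= baselineNanos * (1.0 + headroom);
    }
}
```

A test would then assert something like withinBudget(medianNanos(task, 5, 20), baselineNanos, 0.5), where baselineNanos is a value you measured on the production-like machine and 0.5 allows 50% variation.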
Long term, what I’ve found more useful than a hard limit is a graph of the performance numbers over time. That way we can watch how various functionality trends, and attack it when it trends too high.
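For the trending approach, the tests only need to record their timings somewhere a graphing tool can read them. A rough sketch, assuming a plain CSV file is good enough; the PerfLog class and the timestamp,test,millis column layout are hypothetical choices, not a standard format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

final class PerfLog {
    private PerfLog() {}

    /** Formats one record as "timestamp,testName,millis" followed by a line break. */
    static String csvLine(Instant when, String testName, long millis) {
        return when + "," + testName + "," + millis + System.lineSeparator();
    }

    /** Appends one timing record to the file, creating the file if needed. */
    static void append(Path file, String testName, long millis) throws IOException {
        byte[] line = csvLine(Instant.now(), testName, millis).getBytes(StandardCharsets.UTF_8);
        Files.write(file, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Each performance test would call PerfLog.append(...) with its elapsed time instead of (or in addition to) asserting on a limit, and the resulting file can be plotted per test name.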