There are many online judge sites that verify your program by comparing its output to the correct answers. They also measure the running time to make sure that your program doesn’t exceed the maximum limit.
So here is my question: since some online judge sites run several test programs at the same time, how do they achieve performance isolation? That is, how can they make sure that a user program running in a heavily loaded environment finishes in the same time as it would in an idle environment?
Operating systems keep track of CPU time separately from real-world “wall clock” time. When benchmarking, it’s very common to look at only one or the other. CPU-intensive or file-I/O-intensive tasks can be measured with just CPU time, bearing in mind that CPU time counts only the time the process is actually executing, not the time it spends blocked or waiting to be scheduled. Tasks that require external resources, like querying a remote database, are best measured in wall clock time because you don’t have access to the CPU time on the remote resource.
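To make the distinction concrete, here is a minimal Python sketch that measures both kinds of time for the same task. The helper names (`measure`, `busy_loop`, `sleeper`) are made up for illustration; `time.perf_counter()` reads a wall clock and `time.process_time()` reads the CPU time of the current process.

```python
import time

def measure(task):
    """Return (wall_seconds, cpu_seconds) spent running task()."""
    wall_start = time.perf_counter()   # wall clock
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def busy_loop():
    # CPU-bound work: wall time and CPU time should be close on an idle machine.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def sleeper():
    # Sleeping uses wall-clock time but almost no CPU time.
    time.sleep(0.5)

if __name__ == "__main__":
    for name, task in [("busy_loop", busy_loop), ("sleeper", sleeper)]:
        wall, cpu = measure(task)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

On a loaded machine the wall time of `busy_loop` would be expected to grow while its CPU time stays roughly the same, which is the property the answer relies on.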
If a judging site only compares the CPU times of different tests, it can run many tests simultaneously. On the other hand, if wall clock times matter, then the site must either use independent hardware or use a job queue that ensures one test finishes before the next starts.
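As a rough illustration of the CPU-time approach, the sketch below runs a submission as a child process with a CPU-time cap and reports how much CPU time it used. This is only an assumption about how a judge could do it on a Unix-like system, not a description of any particular site; `run_with_cpu_limit` is a hypothetical helper built on the standard `resource` and `subprocess` modules.

```python
import resource
import subprocess
import sys

def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv as a child with a CPU-time cap (Unix only).

    Returns (returncode, cpu_seconds_used). A negative returncode means
    the child was killed by a signal, e.g. SIGXCPU when it hit the cap.
    """
    def set_limit():
        # Runs in the child just before exec. The soft limit triggers
        # SIGXCPU; the hard limit one second later triggers SIGKILL.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))

    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    proc = subprocess.run(argv, preexec_fn=set_limit)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time consumed by children that finished during this call.
    cpu_used = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return proc.returncode, cpu_used

if __name__ == "__main__":
    # A stand-in "submission" that never finishes on its own.
    rc, cpu = run_with_cpu_limit([sys.executable, "-c", "while True: pass"], 1)
    print(f"returncode={rc} cpu={cpu:.2f}s")
```

Because the limit is expressed in CPU seconds rather than wall-clock seconds, other processes competing for the machine should not cause a submission to be cut off early, although the measured CPU time can still vary somewhat from run to run.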