I have to do extensive data manipulation on a big data set (mostly using data.table in RStudio). I would like to monitor the run time of each step without explicitly calling system.time() on each one.
Is there a package or an easy way to show run time by default on each step?
Thank you.
It’s not exactly what you’re asking for, but I’ve written time_file (https://gist.github.com/4183595), which source()s an R file, runs the code, and then rewrites the file, inserting comments that record how long each top-level statement took to run.
That is, time_file() takes a plain script and gives you back the same script with timing comments added.
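As a rough illustration of the idea, here is a minimal sketch that times each top-level expression of a script. It is not the code from the gist: the helper name time_top_level is hypothetical, it returns a data frame instead of rewriting the file, and it assumes that top-level { blocks should be run without being timed.

```r
# Hypothetical helper (not the gist's time_file): time each top-level
# expression of an R script and return the timings as a data frame.
time_top_level <- function(path, envir = new.env(parent = globalenv())) {
  exprs <- parse(path)                    # one element per top-level expression
  elapsed <- rep(NA_real_, length(exprs))
  for (i in seq_along(exprs)) {
    e <- exprs[[i]]
    is_block <- is.call(e) && identical(e[[1]], as.name("{"))
    if (is_block) {
      eval(e, envir)                      # assumption: run { ... } blocks untimed
    } else {
      elapsed[i] <- system.time(eval(e, envir), gcFirst = FALSE)[["elapsed"]]
    }
  }
  code <- vapply(exprs, function(e) paste(deparse(e), collapse = "\n"), character(1))
  data.frame(code = code, elapsed = elapsed, stringsAsFactors = FALSE)
}

# Usage on a small throwaway script
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "{ y <- sum(x) }", "z <- y * 2"), script)
env <- new.env()
timings <- time_top_level(script, env)
print(timings)
```

The elapsed column is NA for the { block because the sketch deliberately skips timing it. Passing gcFirst = FALSE keeps the measurement itself cheap, at the cost of slightly noisier numbers.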
It doesn’t time code inside a top-level { block, so you can choose not to time stuff you’re not interested in.
I don’t think there’s any way to automatically add timing as a top-level effect without somehow modifying the way that you run the code, i.e. using something like time_file instead of source.
You might wonder what effect timing every top-level operation has on the overall speed of your code. Well, that’s easy to answer with a microbenchmark 😉
So timing adds relatively little overhead (20 µs on my computer), but the garbage collection that system.time() runs first by default adds about 27 ms per call. So unless you have thousands of top-level calls, you’re unlikely to see much impact.
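A comparison along these lines can be approximated with base R alone. The sketch below is an assumption about how such a measurement might look, not the original benchmark: it uses a plain loop rather than the microbenchmark package, the helper name per_call_seconds is hypothetical, and the numbers it prints will vary by machine.

```r
# Rough estimate of the per-call cost of system.time() on a trivial
# expression, with and without the gc() it runs first by default.
per_call_seconds <- function(n, gc_first) {
  start <- proc.time()[["elapsed"]]
  for (i in seq_len(n)) system.time(NULL, gcFirst = gc_first)
  (proc.time()[["elapsed"]] - start) / n
}

no_gc   <- per_call_seconds(10000, gc_first = FALSE)
with_gc <- per_call_seconds(20, gc_first = TRUE)  # fewer reps: each call runs gc()

cat(sprintf("gcFirst = FALSE: %.1f microseconds per call\n", no_gc * 1e6))
cat(sprintf("gcFirst = TRUE:  %.1f milliseconds per call\n", with_gc * 1e3))
```

If the gc cost matters for a script with many top-level calls, timing with gcFirst = FALSE is the obvious trade-off: cheaper measurement, somewhat less repeatable results.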