I am an R neophyte, with a data frame of database function runtimes with the following data:
> head(data2)
              dbfunc runtime
1 fn_slot03_byperson  38.083
2 fn_slot03_byperson  32.396
3 fn_slot03_byperson  41.246
4 fn_slot03_byperson  92.904
5 fn_slot03_byperson 130.512
6 fn_slot03_byperson 113.853
The data frame covers 127 distinct functions across some 1,940,170 rows.
I would like to:
- Summarise the data to only include database functions with a mean runtime of over 100 ms
- Produce boxplots of the 25 slowest database functions showing the distribution of runtimes, sorted by slowest first.
I’m particularly stumped by the summary step.
Note: I’ve also asked this question at stats.stackexchange.com.
Here’s one approach using ggplot and plyr. The steps you outlined could be combined to be slightly more efficient, but for learning purposes I’ll show you the steps as you asked them.

Like I said, I feel a few of those steps could be combined and made more efficient, but wanted to show you the steps as you outlined them.
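A minimal sketch of those steps, one at a time, assuming the plyr and ggplot2 packages are installed. The small simulated `data2` is a made-up stand-in for your real data frame, and the names `means`, `slow`, `top25` and `plot_data` are only illustrative.

```r
library(plyr)     # ddply() for the per-function summary
library(ggplot2)  # boxplots

# Hypothetical stand-in for data2: same two columns, far fewer rows.
# Replace this block with your real data frame.
set.seed(1)
fn_means <- seq(20, 400, length.out = 40)
data2 <- data.frame(
  dbfunc  = rep(sprintf("fn_%02d", 1:40), each = 50),
  runtime = rep(fn_means, each = 50) * runif(40 * 50, 0.5, 1.5)
)

# Step 1: mean runtime per function, keeping only means over 100 ms.
means <- ddply(data2, "dbfunc", summarise, mean_runtime = mean(runtime))
slow  <- subset(means, mean_runtime > 100)

# Step 2: sort slowest first and take (at most) the 25 slowest.
slow  <- slow[order(slow$mean_runtime, decreasing = TRUE), ]
top25 <- head(slow, 25)

# Keep the raw rows for those functions only, and set the factor levels
# so the boxes are drawn in slowest-first order.
plot_data <- data2[data2$dbfunc %in% top25$dbfunc, ]
plot_data$dbfunc <- factor(plot_data$dbfunc,
                           levels = as.character(top25$dbfunc))

# Step 3: one boxplot per function, with rotated labels for readability.
p <- ggplot(plot_data, aes(x = dbfunc, y = runtime)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Database function", y = "Runtime (ms)")
print(p)
```

The ordering of the boxes comes entirely from the factor levels, so sorting `top25` before building the factor is what gives you slowest first. With nearly two million rows `ddply` may take a while; base R's `tapply(data2$runtime, data2$dbfunc, mean)` computes the same per-function means if speed becomes a problem.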