Suppose I want to write a function in R which is a function of a couple of sufficient statistics on some data. For example, suppose the function, call it foo.func, depends only on the sample mean of a sample of data. For convenience, I think users might like to pass to foo.func either the sample itself (in which case foo.func computes the sample mean) or the sample mean, which is all that foo.func needs. For reasons of efficiency, the latter is preferred if several functions like foo.func are being called and each can take the sample mean. In that case the mean need only be computed once (in the real problem I have, the sample statistics in question might be computationally intensive).
In summary, I would like to write foo.func to be accessible to the beginner (pass in the data, let the function compute the sufficient statistics) as well as the expert (precompute the sufficient statistics for efficiency and pass them in). What are the recommended practices for this? Do I have a logical flag passed in? Multiple arguments? Some ways to do it might be:
#optional arguments
foo.func <- function(xdata, suff.stats=NULL) {
  if (is.null(suff.stats)) {
    # compute from the data argument (xdata, not an undefined x)
    suff.stats <- compute.suff.stats(xdata)
  }
  #now operate on suff.stats
}
or
#flag input
foo.func <- function(data.or.stat, gave.data=TRUE) {
  if (gave.data) {
    data.or.stat <- compute.suff.stats(data.or.stat)
  }
  #now operate on data.or.stat
}
I am leaning towards the former, I think.
You can also embed function calls in the default values of the arguments, as:
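A sketch of that form using the names from the question. compute.suff.stats is the hypothetical helper from the question, stubbed here as the sample mean so the snippet runs; the function body is only a placeholder:

```r
# Hypothetical stand-in for the expensive computation in the question.
compute.suff.stats <- function(x) mean(x)

# The default value of suff.stats is itself a call that uses xdata.
foo.func <- function(xdata, suff.stats = compute.suff.stats(xdata)) {
  # now operate on suff.stats (placeholder: return it unchanged)
  suff.stats
}
```

With this signature, a caller who passes only xdata gets the statistic computed automatically, while a caller who passes suff.stats by name skips the computation.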
As an example:
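One possible concrete case, assuming the statistic is the sample mean. The names slow.mean, sq.mean and n.calls are made up for illustration; the counter is there only to show that R evaluates a default argument lazily, so the helper runs only when the caller did not supply m:

```r
n.calls <- 0                      # counts helper calls, for illustration only
slow.mean <- function(x) {        # hypothetical "expensive" statistic
  n.calls <<- n.calls + 1
  mean(x)
}

# Squared sample mean; m defaults to a call on x and is evaluated lazily,
# i.e. only if the caller did not supply m.
sq.mean <- function(x, m = slow.mean(x)) m^2

x <- c(1, 2, 3, 4)
sq.mean(x)         # data only: slow.mean runs once inside the function
m <- slow.mean(x)  # expert: compute the statistic once...
sq.mean(m = m)     # ...and reuse it; x is never touched
```

After these calls n.calls is 2: one call made inside sq.mean(x) and one made explicitly by the expert user. The call sq.mean(m = m) adds none.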
Alternatively, you can either use a default setting of NULL for various arguments and test for is.null(argument), or simply check the value of missing(argument) for each argument you might calculate.
Update 1: I erred in suggesting use of a default value of NA: it is far more appropriate to use NULL. Using NA and is.na() will behave oddly for vector inputs, whereas NULL is just a single object; one cannot create a vector of NULL values, so is.null(argument) behaves as expected. Apologies for the forgetfulness.
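A short sketch of the two checks, and of why NA makes a poor sentinel. The function names foo.null and foo.missing are illustrative, and compute.suff.stats is again a hypothetical helper stubbed as the sample mean:

```r
# Hypothetical helper, stubbed as the sample mean.
compute.suff.stats <- function(x) mean(x)

# Variant 1: NULL default tested with is.null().
foo.null <- function(xdata, suff.stats = NULL) {
  if (is.null(suff.stats)) suff.stats <- compute.suff.stats(xdata)
  suff.stats
}

# Variant 2: no default; missing() reports whether the caller supplied it.
foo.missing <- function(xdata, suff.stats) {
  if (missing(suff.stats)) suff.stats <- compute.suff.stats(xdata)
  suff.stats
}

# Why NA is a poor sentinel: is.na() is vectorised, so a vector-valued
# statistic yields one TRUE/FALSE per element instead of a single answer.
is.na(c(1.5, NA, 3))    # FALSE  TRUE FALSE
is.null(c(1.5, NA, 3))  # FALSE
```

An if (is.na(suff.stats)) test on a vector of statistics would receive a condition of length greater than one, which is presumably the odd behaviour referred to above; is.null() always returns a single logical value.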