I run across this often enough that I figure there has to be a good idiom for it. Suppose I have a data.frame with a bunch of attributes, including “product.” I also have a key which translates products to brand + size. Product codes 1-3 are Tylenol, 4-6 are Advil, 7-9 are Bayer, 10-12 are Generic.
What’s the fastest (in terms of human time) way to code this up?
I tend to use nested ifelse calls if there are 3 or fewer categories, and type out the data table and merge it in if there are more than 3. Any better ideas? Stata has a recode command that is pretty nifty for this sort of thing, although I believe it promotes data-code intermixing a little too much.
dat <- structure(list(product = c(11L, 11L, 9L, 9L, 6L, 1L, 11L, 5L,
7L, 11L, 5L, 11L, 4L, 3L, 10L, 7L, 10L, 5L, 9L, 8L)), .Names = "product", row.names = c(NA,
-20L), class = "data.frame")
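For reference, the two approaches I mentioned might look roughly like the sketch below. The `key` table and the column name `brand` are illustrative assumptions, and the size part of the key is left out for brevity.

```r
# Same data as the dput() output above, rebuilt so the sketch runs on its own
dat <- data.frame(product = c(11L, 11L, 9L, 9L, 6L, 1L, 11L, 5L, 7L, 11L,
                              5L, 11L, 4L, 3L, 10L, 7L, 10L, 5L, 9L, 8L))

# Approach 1: nested ifelse, tolerable for a handful of categories
dat$brand <- ifelse(dat$product <= 3, "Tylenol",
             ifelse(dat$product <= 6, "Advil",
             ifelse(dat$product <= 9, "Bayer", "Generic")))

# Approach 2: type out a key table (hypothetical name `key`) and merge it in
key <- data.frame(product = 1:12,
                  brand = rep(c("Tylenol", "Advil", "Bayer", "Generic"),
                              each = 3))
merged <- merge(dat["product"], key, by = "product", all.x = TRUE)
```

Note that `merge` sorts the result by the key column by default, so `merged` is not in the original row order.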
One could use a list as an associative array to define the brand -> product code mapping. Once you have this, you can either invert it to create a product code -> brand list (which could take a lot of memory), or just use a search function.

I’m sure there are better ways of writing this function (the for loop is annoying me!), but at least it is vectorised, so it only requires a single pass through the list. Using it would be something like:
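A minimal sketch of the approach described above. The list name `brands`, the body of `find.key`, and its `default` argument are assumptions made for illustration, so treat this as one plausible implementation rather than the definitive one.

```r
# Brand -> product code mapping, using the codes given in the question
brands <- list(Tylenol = 1:3, Advil = 4:6, Bayer = 7:9, Generic = 10:12)

# Assumed implementation: for each value in x, return the name of the
# list element that contains it, or `default` if none does.
# The loop runs once per key, not once per element of x.
find.key <- function(x, li, default = NA) {
  ret <- rep.int(default, length(x))
  for (key in names(li)) {
    ret[x %in% li[[key]]] <- key
  }
  ret
}

dat <- data.frame(product = c(11L, 11L, 9L, 9L, 6L, 1L, 11L, 5L, 7L, 11L,
                              5L, 11L, 4L, 3L, 10L, 7L, 10L, 5L, 9L, 8L))
dat$brand <- find.key(dat$product, brands)

# The inverted alternative: a named vector indexed by product code
lookup <- rep(names(brands), lengths(brands))
names(lookup) <- unlist(brands, use.names = FALSE)
brand_inverted <- unname(lookup[as.character(dat$product)])
```

Both routes should give the same brand for every row; the inverted lookup stores one entry per product code, which is where the extra memory goes.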
The recode and levels<- solutions are very nice, but they are also significantly slower than this one (and once you have find.key, this is easier for humans than recode and on par with levels<-).

(I can’t get the switch version to benchmark properly, but it appears to be faster than all of the above, although it is even worse for humans than the recode solution.)
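One way to check the timing claim yourself is a rough `system.time` comparison like the sketch below. The `find.key` body here is an assumed stand-in, the input is random test data, and only the levels<- route is compared, since recode needs an extra package. No timings are quoted because they depend on the machine and the R version.

```r
brands <- list(Tylenol = 1:3, Advil = 4:6, Bayer = 7:9, Generic = 10:12)

# Assumed stand-in for find.key: one pass over the keys of the list
find.key <- function(x, li, default = NA) {
  ret <- rep.int(default, length(x))
  for (key in names(li)) ret[x %in% li[[key]]] <- key
  ret
}

# Random product codes as test input
set.seed(1)
x <- sample(1:12, 1e5, replace = TRUE)

t_find <- system.time(via_find <- find.key(x, brands))

# levels<- with a named list merges the old levels into the new ones
t_levels <- system.time({
  via_levels <- factor(x)
  levels(via_levels) <- brands
})

# Elapsed seconds for each approach
print(c(find.key = t_find[["elapsed"]], levels = t_levels[["elapsed"]]))

# Both approaches should agree on every element
stopifnot(identical(via_find, as.character(via_levels)))
```

For more stable numbers, repeat each call several times or use a dedicated benchmarking package.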