I have a data.table with columns 2 through 20 as strings with spaces (e.g., “Species Name”). I want to run str_replace() on all those columns simultaneously so all the “Species Name” become “Species_Name”. I can either do:
data.table(apply(as.data.frame(dt[,2:dim(dt)[2], with=F]), 2,
function(x){ str_replace(x," ","_") }))
or if I keep it as a data.table object, then I can do this one column at a time:
dt[, SpeciesName := str_replace(SpeciesName, " ", "_")]
How do I do this for all columns 2 through the end, similar to one of the approaches above?
Completely rewritten on 2015-11-24, to fix an error in previous versions.
Also added more modern options on 2019-09-27
You have a few options.
1. Process all of the target columns with an embedded call to lapply(), using := to assign the modified values in place. This relies on :=’s very handy support for simultaneous assignment to several columns named on its LHS.
2. Use a for loop to run through the target columns one at a time, using set() to modify the value of each one in turn.
3. Use a for loop to iterate over multiple “naive” calls to [.data.table(), each one of which modifies a single column.

These methods all seem about equally fast, so which one you use will be mostly a matter of taste. (1) is nicely compact and expressive. It’s what I most often use, though you may find (2) easier to read. Because they process and modify the columns one at a time, (2) or (3) will have an advantage in the rare situation in which your data.table is so large that you are in danger of running up against limits imposed by your R session’s available memory.
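The three options can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the example table, its column names, and the helper variable cols are made up for the demonstration, and each option runs on its own copy() so the results can be compared.

```r
library(data.table)
library(stringr)

# Hypothetical example data: column 1 is an id, the remaining columns
# hold strings containing a space.
dt <- data.table(id          = 1:2,
                 SpeciesName = c("Homo sapiens", "Canis lupus"),
                 CommonName  = c("Modern human", "Gray wolf"))

# Names of columns 2 through the last one (assumed to be the targets).
cols <- names(dt)[2:ncol(dt)]

# (1) lapply() over .SD, assigning every target column at once with :=
dt1 <- copy(dt)
dt1[, (cols) := lapply(.SD, str_replace, " ", "_"), .SDcols = cols]

# (2) for loop with set(), modifying one column per iteration
dt2 <- copy(dt)
for (j in cols) set(dt2, j = j, value = str_replace(dt2[[j]], " ", "_"))

# (3) for loop over separate [.data.table calls, one column each
dt3 <- copy(dt)
for (j in cols) dt3[, (j) := str_replace(dt3[[j]], " ", "_")]
```

Note that str_replace() only replaces the first match in each string; if a value may contain several spaces, str_replace_all() takes the same arguments and replaces every match.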
For more details on set() and :=, read their help page, which you can open by typing ?set or ?":=".