I have a data table (DatosMex) in R and would like to recode a column within it named industry. The distinct categories for this variable are:
Agricultura,Ganaderia,Pesca,Caza Forestal
Asociaciones
Comercio
Construccion
Energia,Petroleo,Gas,Mineria
Gobierno
Industria
N/A
NULL
Servicios
I want to create a new variable, say gr_industry, that groups some categories. For instance, the new variable must group the four categories "Agricultura,Ganaderia,Pesca,Caza Forestal", "Asociaciones", "Energia,Petroleo,Gas,Mineria" and "Gobierno" and assign them the code 1.
How would you do this using the data.table package syntax?
My approach was this:
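# Create an integer id for each industry (assumes industry is a factor)
DatosMex[, cod_industria := as.integer(industry)]
# Create a lookup table that maps each id to its group code
ind <- data.table(cod_industria = 1:10, gr_industry = c(1, 1, 2, 3, 1, 1, 4, 6, 6, 5))
setkey(DatosMex, cod_industria)
setkey(ind, cod_industria)
# Join the lookup table to the data on cod_industria
DatosMex[ind]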
So, as you can see, I had to create a new data table, ind, and then do a join. My question is: is there another way of doing this the data.table way? I don’t want to create a table each time I need to do something similar. Also, I’d like to avoid using if statements.
I’m guessing one does not need to set a key or create a new data.table. The [ function is generally very fast, especially on data.table objects, so you can index a vector of group codes directly by the integer codes of industry. If that translation vector is large, you can refer to it by name, even if it is defined outside the data.table.
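The idea of indexing a translation vector with [ can be sketched as follows. This is only a minimal sketch, assuming industry is a factor whose levels are in the order listed in the question. The sample rows and the names industry_levels, group_codes and codes_by_label are hypothetical and do not come from the original post.

```r
library(data.table)

# Hypothetical sample data; the real DatosMex is not shown in the question.
industry_levels <- c("Agricultura,Ganaderia,Pesca,Caza Forestal", "Asociaciones",
                     "Comercio", "Construccion", "Energia,Petroleo,Gas,Mineria",
                     "Gobierno", "Industria", "N/A", "NULL", "Servicios")
DatosMex <- data.table(
  industry = factor(c(industry_levels, "Comercio", "Gobierno"),
                    levels = industry_levels)
)

# One group code per factor level, in the same order as levels(industry).
group_codes <- c(1, 1, 2, 3, 1, 1, 4, 6, 6, 5)

# Index the translation vector by the factor's integer codes.
# No key and no second data.table are needed.
DatosMex[, gr_industry := group_codes[as.integer(industry)]]

# Variant: look up by label with a named vector, so the result does not
# depend on the order of the factor levels.
codes_by_label <- setNames(group_codes, industry_levels)
DatosMex[, gr_industry2 := unname(codes_by_label[as.character(industry)])]
```

The first form relies on the order of levels(industry), so it is worth checking that order before writing the code vector. The named-vector variant is slightly more verbose but also works when industry is a character column.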