I have two data frames, one big and one small (the real dataset is very large). The following are just working examples.
big <- data.frame (SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55)
  SN names var
1  1     A  51
2  2     B  52
3  3     C  53
4  4     D  54
5  5     E  55
small <- data.frame (names = c("A", "C", "E"), type = c("New", "Old", "Old") )
  names type
1     A  New
2     C  Old
3     E  Old
Now I need to create a new variable in “big” with the help of the “type” variable in small. Where the names in small and big match, the corresponding type should be stored in a new column type. If a name in big has no match in small, the new column should hold the value “Unknown”. The expected output is as follows:
resultdf <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
type = c("New","Unknown", "Old", "Unknown", "Old"))
resultdf
  SN names var    type
1  1     A  51     New
2  2     B  52 Unknown
3  3     C  53     Old
4  4     D  54 Unknown
5  5     E  55     Old
I know this is a simple question for experts, but I could not figure it out.
First use merge() with the argument all=TRUE to merge the two data.frames, keeping rows of big that found no matching value in small$names. Then replace those elements of type in the merged result that didn’t find a match (marked by merge() with NAs) with the string “Unknown”.

Note that because big and small share just one column name in common, that column is by default used to perform the merge. For more control over which columns are used as the basis of the merge, see the function’s by, by.x, and by.y arguments.
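
The two steps described above might look like the following minimal sketch. It rests on a few assumptions that are not part of the original post: the result is stored in a new object called merged (a name chosen here for illustration), the key column is passed explicitly with by = "names", and stringsAsFactors = FALSE is set so that type is a character column. If type were a factor, assigning "Unknown" would fail unless that level were added first.

```r
# Example data from the question. stringsAsFactors = FALSE keeps the text
# columns as character vectors (this is the default from R 4.0.0 onward).
big <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                  stringsAsFactors = FALSE)
small <- data.frame(names = c("A", "C", "E"), type = c("New", "Old", "Old"),
                    stringsAsFactors = FALSE)

# Step 1: merge on the shared column, keeping rows that have no match.
# "merged" is an illustrative name, not one used in the original post.
merged <- merge(big, small, by = "names", all = TRUE)

# Step 2: rows of big with no match in small have NA in type; relabel them.
merged$type[is.na(merged$type)] <- "Unknown"

# merge() places the key column first; restore the question's column order.
merged <- merged[, c("SN", "names", "var", "type")]
merged
```

With all = TRUE, rows of small that have no match in big would be kept as well. There are none in this example, but if only the rows of big should survive, all.x = TRUE is the narrower choice.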