I have a question regarding data frames in R. I want to take a data.frame, dfy, and find the first occurrence of each dfy$workerId in dfx$workers, to create a new data frame, dfz, a copy of dfy that also contains the row of dfx in which that workerId first occurs, stored as dfz$highestRankingGroup. It's a little tricky because dfx$workers is a single space-separated string. My original plan was to do this in Perl, but I would like to find a way to work in R and avoid having to write out to temp files.
Thank you for your time.
y <- "name,workerId,aptitude
joe,4,34
steve,5,42
jon,7,23
nick,8,122"
x <- "workers,projectScore
1 2 3 8 ,92
1 2 5 9 ,89
3 5 7 ,85
1 8 9 10 ,82
4 5 7 8 ,83
1 3 5 7 8 ,79"
z <- "name,workerId,aptitude,highestRankingGroup
joe,4,0.34,5
steve,5,0.42,2
jon,7,0.23,3
nick,8,0.122,1"
dfy <- read.csv(textConnection(y), header=TRUE, sep=",", stringsAsFactors=FALSE)
dfx <- read.csv(textConnection(x), header=TRUE, sep=",", stringsAsFactors=FALSE)
dfz <- read.csv(textConnection(z), header=TRUE, sep=",", stringsAsFactors=FALSE)
First, add the highestRankingGroup column to your dataset dfx.
Since you have mentioned Perl, you can do a familiar Perl thing and simply split the workers column on whitespace. I combined the splitting with functions from the plyr package, which are always nice to work with.
Now you have to look for the max values in the projectScore column:
This produces:
You can reshape the df.z2 data frame according to your personal taste. Simply look at the different steps and the objects they produce to see at which step the different columns, etc. get introduced.
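For reference, here is a minimal base-R sketch of the same three steps, assuming the example data from the question. It swaps in strsplit() and match() for the plyr functions, so it is an alternative illustration rather than the plyr code itself; the names ids and long are made up for this sketch and are not the df.z2 object mentioned above.

```r
# Example data from the question (read.csv(text = ...) avoids textConnection)
dfy <- read.csv(text = "name,workerId,aptitude
joe,4,34
steve,5,42
jon,7,23
nick,8,122", stringsAsFactors = FALSE)

dfx <- read.csv(text = "workers,projectScore
1 2 3 8 ,92
1 2 5 9 ,89
3 5 7 ,85
1 8 9 10 ,82
4 5 7 8 ,83
1 3 5 7 8 ,79", stringsAsFactors = FALSE)

# Step 1: label each group (row of dfx) with its position
dfx$highestRankingGroup <- seq_len(nrow(dfx))

# Step 2: split the workers string on whitespace, one integer vector per group
ids <- lapply(strsplit(trimws(dfx$workers), "[[:space:]]+"), as.integer)

# Long table with one row per (worker, group) pair (hypothetical helper object)
long <- data.frame(
  workerId            = unlist(ids),
  highestRankingGroup = rep(dfx$highestRankingGroup, lengths(ids)),
  projectScore        = rep(dfx$projectScore, lengths(ids))
)

# Step 3: put the highest projectScore first, then take each worker's first hit;
# match() returns the index of the first matching element
long <- long[order(-long$projectScore), ]
dfz <- dfy
dfz$highestRankingGroup <-
  long$highestRankingGroup[match(dfy$workerId, long$workerId)]

print(dfz)
```

With this data, highestRankingGroup comes out as 5, 2, 3, 1 for joe, steve, jon and nick, which agrees with the expected column in z. If you want the first occurrence in row order rather than the best-scoring group, drop the order() line; for this example both give the same answer.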