I have a question regarding data frames in R. I want to take a

Question

0

Asked: May 21, 20262026-05-21T11:42:10+00:00 2026-05-21T11:42:10+00:00

I have a question regarding data frames in R. I want to take a

0

I have a question regarding data frames in R. I want to take a data.frame, dfy, and find the first occurrence of dfy$workerId in dfx$workers, to create a new dataframe, dfz, a copy of dfx that also contains the first occurance of dfy$workerId in dfx$wokers as dfz$highestRankingGroup. Its a little tricky becuase dfx$workers is a single spaced seperated string. My original plan was to do this in Perl, but I would like to find a way to work in R and avoid having to write out to temp. files.

thank you for your time.

y <- "name,workerId,aptitude  
joe,4,34
steve,5,42 
jon,7,23 
nick,8,122"


x <- "workers,projectScore
1 2 3 8 ,92
1 2 5 9 ,89
3 5 7 ,85  
1 8 9 10 ,82  
4 5 7 8 ,83  
1 3 5 7 8 ,79" 

z <- "name,workerId,aptitude,highestRankingGroup
joe,4,0.34,5
steve,5,0.42,2
jon,7,0.23,3
nick,8,0.122,1"

dfy <- read.csv(textConnection(y), header=TRUE, sep=",", stringsAsFactors=FALSE)  
dfx <- read.csv(textConnection(x), header=TRUE, sep=",", stringsAsFactors=FALSE)  
dfz <- read.csv(textConnection(z), header=TRUE, sep=",", stringsAsFactors=FALSE)

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-21T11:42:11+00:00

First, add the highestRankingGroup column to your dataset dfx

dfx$highestRankingGroup <- seq(1, length(dfx$projectScore))

Since you have mentioned perl you can do a familar perl thing and simple split the workers column in whitespaces. I combined the splitting with functions from the plyr package which are always nice to work with.

library(plyr)
df.l <- dlply(dfx, "projectScore")

f.reshape <- function(x) {
  wrk <- strsplit(x$workers, "\\s", perl = TRUE)
  data.frame(worker = wrk[[1]]
             , projectScore = x$projectScore
             , highestRankingGroup = x$highestRankingGroup
             )
}

df.tmp <- ldply(df.l, f.reshape)

df.z1 <- merge(df.tmp, dfy, by.x = "worker", by.y = "workerId")

Now you have to look for the max values in the projectScore column:

df.z2 <- ddply(df.z1, "name", function(x) x[x$projectScore == max(x$projectScore), ])

This produces:

R> df.z2
  worker .id projectScore highestRankingGroup  name aptitude
1      4  83           83                   5   joe       34
2      7  85           85                   3   jon       23
3      8  92           92                   1  nick      122
4      5  89           89                   2 steve       42
R>

You can reshape the df.z2 dataframe according to your personal taste. Simply look at the different steps and the produced objects in order to see at which step different columns, etc get introduced.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a question regarding data frames in R. I want to take a

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply