I am trying to build a new data set from an existing data frame (in either R or Excel VBA) by generating an additional column. Here is a simplified version of the situation:
dfr <- data.frame(
  grp = rep(c("X", "Y"), each = 4),
  id = c("A", "B", "C", "D", "E", "A", "B", "F"),
  value = c(3, 7, 2, 4, 8, 9, 11, 2)
)
In the second column (id), B is the “leader” of both group “X” and group “Y”, since it has the largest value in each. I therefore need to pair every other observation in each group with that group’s leader. For example, here is a sample of the output I need:
X B A 3
X B C 2
X B D 4
Y B E 8
Y B A 9
Y B F 2
The number in the last column is the original value of the paired observation (not the leader’s value).
So I need help partitioning the data by group (X and Y here, but the real data has many string-valued groups), then arranging it as shown above and producing the leader column, in either R or Excel VBA. The data is in CSV format.
**Disclaimer: If it isn’t obvious, my experience with R is very limited – I used it for 4 months in an Applied Econometrics course, and now find myself in need of it again (9 months later), so please excuse me if I seem like a novice… I can run regressions just fine though 🙂
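For reference, the pairing described above could be sketched in base R roughly as follows. This is only a sketch under assumptions: the helper `pair_with_leader`, the result name `paired`, and the output column name `leader` are made up for illustration, and ties for the largest value are resolved by taking the first row, which may not be what the real data needs.

```r
# Example data from the question
dfr <- data.frame(
  grp   = rep(c("X", "Y"), each = 4),
  id    = c("A", "B", "C", "D", "E", "A", "B", "F"),
  value = c(3, 7, 2, 4, 8, 9, 11, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one group, find the id with the largest value
# (assumption: the first row wins on ties) and pair every other row with it.
pair_with_leader <- function(g) {
  leader <- g$id[which.max(g$value)]
  others <- g[g$id != leader, ]
  data.frame(
    grp    = others$grp,
    leader = rep(leader, nrow(others)),
    id     = others$id,
    value  = others$value,
    stringsAsFactors = FALSE
  )
}

# Split by group, apply the helper to each piece, and stack the results
paired <- do.call(rbind, lapply(split(dfr, dfr$grp), pair_with_leader))
rownames(paired) <- NULL
print(paired)
```

On the example data this pairs A, C and D with B in group X, and E, A and F with B in group Y, keeping each paired observation's own value in the last column.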
*UPDATE
Following Henry’s code, this is where I am now.
data <- read.csv(file = "sort.csv", header = TRUE)
attach(data)
sorted <- data[order(data$membernumber, -data$dailycirc),]
top <- function(df){ return(df[1,])}
moded <- unsplit(lapply(split(sorted, sorted$membernumber), top), unique(sorted$membernumber))[1:2]
names(moded) <- c("membernumber", "cnty")
merged <- merge(moded, data, by="membernumber")
merged[merged$cnty != merged$cnty, ]
summary(merged)
This has actually given me some output now. But I’m not seeing the sorted data, just summary statistics such as the mean and max. How do I actually export this to a CSV or spreadsheet so I can look at it as a table?
Thank you SO much for your help.
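On the export question: `summary()` prints per-column summary statistics by design, so it never shows the rows themselves. A minimal sketch of viewing and exporting a data frame is below. The small `merged` data frame here is a made-up stand-in (its values are not from the real file), and the output path in `out_file` is an assumption that uses a temporary directory; in practice it would be whatever file name you want.

```r
# Hypothetical stand-in for the real merged data frame
merged <- data.frame(
  membernumber = c(1, 1, 2),
  cnty         = c("a", "a", "b"),
  dailycirc    = c(10, 5, 7),
  stringsAsFactors = FALSE
)

# Print the first rows as a table instead of summary statistics
print(head(merged))

# Write the whole data frame to a CSV file that Excel can open.
# row.names = FALSE avoids an extra unnamed column of row numbers.
out_file <- file.path(tempdir(), "merged.csv")  # assumed output location
write.csv(merged, out_file, row.names = FALSE)

# Read the file back to confirm what was written
check <- read.csv(out_file)
print(check)
```

In an interactive session, `View(merged)` also opens a spreadsheet-style viewer without writing anything to disk.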
Since only R people seem to have woken up, I felt obliged to post an example in VBA. I assume that you have opened your CSV in Excel with the information in the first 3 columns and no title line (data starts on row 1). Copy that data to a new spreadsheet, in a tab named “Sheet1”. Insert the code below in a new module and run it, and it will output the result in columns 5 to 8 of the same sheet. For 10,000 lines of input, it runs in less than 0.1 sec on my machine.