The Archive Base Latest Questions

Editorial Team
Asked: June 3, 2026

I have a dataframe similar to the one generated below. Some individuals have more than one observation for a particular variable, and each variable has an associated standard error (SE) for the estimate. I would like to create a new dataframe that contains only a single row for each individual. For individuals with more than one observation, such as Kim or Bob, I need to calculate a precision-weighted average based on the standard errors of the estimates, along with a variance for the newly calculated weighted mean. For example, Bob's var1 value in the new dataframe should be:

weighted.mean(c(example$var1[2], example$var1[10]), 
   c(1/example$SE1[2], 1/example$SE1[10]))

and for Bob’s new SE1, which would be the variance of the weighted mean, to be:

1/sum(1/example$SE1[c(2, 10)])
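
As a quick check of the two formulas above, here is a minimal sketch that plugs in Bob's two var1 rows, with the values copied from the printed data further down rather than regenerated. The names `x`, `se`, `w` and the `bob_*`/`iv_*` variables are illustrative only. It follows the question's weights of 1/SE; conventional inverse-variance weighting would use 1/SE^2 instead, which is included only as a labeled variant.

```r
# Bob's two observations of var1 and their standard errors (rows 2 and 10)
x  <- c(8.753078, 9.004425)
se <- c(2.174308, 1.592312)

# Weights as defined in the question: 1/SE
w <- 1 / se
bob_mean <- weighted.mean(x, w)  # weighted mean of the two estimates
bob_se   <- 1 / sum(w)           # the question's "new SE1"
print(round(c(mean = bob_mean, se = bob_se), 4))

# Variant (an assumption, not what the question asks for):
# inverse-variance weights 1/SE^2, whose pooled variance is 1/sum(1/SE^2)
w_iv    <- 1 / se^2
iv_mean <- weighted.mean(x, w_iv)
iv_var  <- 1 / sum(w_iv)
```

With the question's weights this gives a mean of about 8.898 and a new SE1 of about 0.919 for Bob.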

I have tried using the aggregate function and am able to calculate the arithmetic mean of the values, but the simple function I wrote does not use the standard errors nor can it deal with the NAs.

aggregate(example[,1:4], by = list(example[,5]), mean)

Would appreciate any help in developing some code to work through this problem. Here is the example dataset.

set.seed(1562)
example <- data.frame(var1 = rnorm(10, 8, 2))
example$SE1 <- rnorm(10, 2, 1)
example$var2 <- rnorm(10, 8, 2)
example$SE2 <- rnorm(10, 2, 1)
example$id <- c("Kim", "Bob", "Joe", "Sam", "Kim", "Kim", "Joe", "Sara", "Jeff", "Bob")
# Introduce some missing values
example$SE1[5] <- NA
example$var1[5] <- NA
example$SE2[10] <- NA
example$var2[10] <- NA
example

       var1      SE1      var2        SE2   id
1   9.777769 2.451406  6.363250  2.2739566  Kim
2   8.753078 2.174308  6.219770  1.4978380  Bob
3   7.977356 2.107739  6.835998  2.1647437  Joe
4  11.113048 2.713242 11.091650  1.7018666  Sam
5         NA       NA 11.769884 -0.1310218  Kim
6   5.271308 1.831475  6.818854  3.0294338  Kim
7   7.770062 2.094850  6.387607  0.2272348  Joe
8   9.837612 1.956486  8.517445  3.5126378 Sara
9   4.637518 2.516896  7.173460  2.0292454 Jeff
10  9.004425 1.592312        NA         NA  Bob

1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 6:26 am

    I like the plyr package for these sorts of problems. It should be functionally equivalent to aggregate, but I find it more convenient to use. The plyr website has lots of examples and a great ~20-page introduction. For this problem, since the data starts as a data.frame and you want another data.frame on the other end, we use ddply():

    library(plyr)
    # f1(): plyr version, one row per id
    ddply(example, "id", summarize,
          newMean = weighted.mean(x = var1, w = 1/SE1, na.rm = TRUE),
          newSE = 1/sum(1/SE1, na.rm = TRUE)
          )
    

    Which returns:

        id newMean   newSE
    1  Bob  8.8982 0.91917
    2 Jeff  4.6375 2.51690
    3  Joe  7.8734 1.05064
    4  Kim  7.1984 1.04829
    5  Sam 11.1130 2.71324
    6 Sara  9.8376 1.95649
    

    Also check out ?summarize and ?transform for some other good background. You can also pass an anonymous function to the plyr functions if necessary for more complicated tasks.
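
The calls above only cover var1, while the question asks for both var1 and var2. A minimal sketch of a custom per-person function is shown here, using base R so it runs without extra packages. The helper names `pool` and `summarise_person` are hypothetical, the data are copied from the printed table rather than regenerated, and dropping rows with a non-positive SE is an assumption made here because the simulated SE2 contains a negative value.

```r
# Data copied from the printed table in the question (not regenerated)
example <- data.frame(
  var1 = c(9.777769, 8.753078, 7.977356, 11.113048, NA,
           5.271308, 7.770062, 9.837612, 4.637518, 9.004425),
  SE1  = c(2.451406, 2.174308, 2.107739, 2.713242, NA,
           1.831475, 2.094850, 1.956486, 2.516896, 1.592312),
  var2 = c(6.363250, 6.219770, 6.835998, 11.091650, 11.769884,
           6.818854, 6.387607, 8.517445, 7.173460, NA),
  SE2  = c(2.2739566, 1.4978380, 2.1647437, 1.7018666, -0.1310218,
           3.0294338, 0.2272348, 3.5126378, 2.0292454, NA),
  id   = c("Kim", "Bob", "Joe", "Sam", "Kim", "Kim", "Joe", "Sara", "Jeff", "Bob")
)

# Hypothetical helper: pool one variable using weights of 1/SE.
# Assumption: rows with a missing or non-positive SE are dropped.
pool <- function(x, se) {
  ok <- !is.na(x) & !is.na(se) & se > 0
  w <- 1 / se[ok]
  c(mean = sum(w * x[ok]) / sum(w), se = 1 / sum(w))
}

# Hypothetical per-person summary returning a one-row data.frame
summarise_person <- function(d) {
  v1 <- pool(d$var1, d$SE1)
  v2 <- pool(d$var2, d$SE2)
  data.frame(var1 = v1[["mean"]], SE1 = v1[["se"]],
             var2 = v2[["mean"]], SE2 = v2[["se"]])
}

# Base R split-apply-combine; with plyr loaded, the same helper could be
# passed directly: ddply(example, "id", summarise_person)
pieces <- lapply(split(example, example$id), summarise_person)
result <- cbind(id = names(pieces), do.call(rbind, pieces))
rownames(result) <- NULL
print(result)
```

Keeping the pooling logic in one small function means the weighting rule only has to be changed in one place, for example to switch to 1/SE^2 weights.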

    Alternatively, use the data.table package, which can be faster for some tasks:

    library(data.table)
    dt <- data.table(example, key = "id")
    # f2(): data.table version, one row per id
    dt[, list(newMean = weighted.mean(var1, 1/SE1, na.rm = TRUE),
              newSE = 1/sum(1/SE1, na.rm = TRUE)),
       by = "id"]
    

    A quick benchmark:

    library(rbenchmark)
    #f1 = plyr, #f2 = data.table
    benchmark(f1(), f2(), 
              replications = 1000,
              order = "elapsed",
              columns = c("test", "elapsed", "relative"))
    
      test elapsed relative
    2 f2()   3.580   1.0000
    1 f1()   6.398   1.7872
    

    So data.table is about 1.8x faster than plyr for this dataset on my modest laptop.
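
The benchmark calls f1() and f2() without showing how they were defined. A plausible sketch, assuming they simply wrap the plyr and data.table calls from this answer, is given here. The wrapper bodies are an assumption, the data are copied from the printed table in the question, and the plyr and data.table packages must be installed.

```r
library(plyr)
library(data.table)

# var1, SE1 and id copied from the printed table in the question
example <- data.frame(
  var1 = c(9.777769, 8.753078, 7.977356, 11.113048, NA,
           5.271308, 7.770062, 9.837612, 4.637518, 9.004425),
  SE1  = c(2.451406, 2.174308, 2.107739, 2.713242, NA,
           1.831475, 2.094850, 1.956486, 2.516896, 1.592312),
  id   = c("Kim", "Bob", "Joe", "Sam", "Kim", "Kim", "Joe", "Sara", "Jeff", "Bob")
)
dt <- data.table(example, key = "id")

# Assumed definition of f1(): the plyr call wrapped in a function
f1 <- function() {
  ddply(example, "id", summarize,
        newMean = weighted.mean(var1, 1/SE1, na.rm = TRUE),
        newSE = 1/sum(1/SE1, na.rm = TRUE))
}

# Assumed definition of f2(): the data.table call wrapped in a function
f2 <- function() {
  dt[, list(newMean = weighted.mean(var1, 1/SE1, na.rm = TRUE),
            newSE = 1/sum(1/SE1, na.rm = TRUE)),
     by = "id"]
}

# Check that both wrappers agree before timing them
stopifnot(isTRUE(all.equal(f1()$newMean, f2()$newMean)))
```

With these two wrappers defined, the benchmark() call shown earlier can be run as written; timings will differ by machine and package version.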

© 2021 The Archive Base. All Rights Reserved