I have a data set base_data which has missing values. I have therefore used

Asked: June 11, 2026

I have a data set “base_data” which has missing values. I have therefore used the package ‘Amelia’ to impute the missing values into an object “a.output”.

I have been able to find the mean for some variables within the imputed results using the following code:

q.out <- NULL
se.out <- NULL
for(i in 1:m) {
  # amelia() stores the completed data sets in the $imputations list
  dclus <- svydesign(id=~site, data=a.output$imputations[[i]])
  q.out <- rbind(q.out, coef(svymean(~hh_expenditure, dclus)))
  se.out <- rbind(se.out, SE(svymean(~hh_expenditure, dclus)))
}

I have combined the results using:

svymean.combine <- mi.meld(q = q.out, se = se.out)

This gives me the mean and standard error for household expenditure (hh_expenditure) across the population.

However I have a variable which splits the population into wealth quintiles (wealth_quin).

I now want to find the mean and standard error of household expenditure per wealth_quin (a variable that takes the values 1, 2, 3, 4, or 5).

I initially tried subsetting the imputed data, but this produced many errors.

Is there a way to do this without having to split up the data into the 5 wealth quintiles before imputing the data?

Cheers,

Timothy

EDIT: HERE IS A WORKABLE EXAMPLE

require(Amelia)
require(survey)
a<-as.data.frame(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))
b<-as.data.frame(c(1,2,2,1,2,1,1,2,1,2,2,1,1,2,1,2))
c<-as.data.frame(c(2,7,8,5,4,4,3,8,7,9,10,1,3,3,2,8))
d<-as.data.frame(c(3,9,7,4,5,5,2,10,8,10,12,2,4,4,3,7))
e<-as.data.frame(c(2500,8000,NA,4500,4500,NA,2500,NA,7400,9648,1112,1532,3487,3544,NA,7000))

impute<-cbind(a,b,c,d,e)
names(impute) <- c("X","site","var2","var3", "hh_inc")

So now we have a data frame to work with, with missing values for hh_inc, which I want to impute.
First step: set the number of imputations.

m<-5

Now run the imputation:

a.output <- amelia(x = impute, m=m, autopri=0.5,cs="X",
               idvars=c("site","var2"),
               logs=c("hh_inc","var3"))

a.output now holds the data from the 5 imputations.

What I now want to do is find the average (and standard error) hh_inc for site 1 and site 2 separately using the imputed values from amelia.

How can I do that? I know it is possible if I just ignore the NAs, but this might introduce bias, which is why I imputed the values in the first place.

Cheers,

Timothy

EDIT:
I have placed a bounty on this. If no one knows the exact way to do it, then the results from the individual imputed data sets can be combined using Rubin's formula (http://sites.stat.psu.edu/~jls/mifaq.html#minf).
As such, I will award the bounty to someone who can transform the 5 separate imputed data sets from the Amelia object into 5 separate, complete data frames.

1 Answer

Editorial Team · Answer 1 · 2026-06-11T17:37:46+00:00

require(Amelia)
require(survey)
require(data.table)
require(plotrix)

a<-as.data.frame(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))
b<-as.data.frame(c(1,2,2,1,2,1,1,2,1,2,2,1,1,2,1,2))
c<-as.data.frame(c(2,7,8,5,4,4,3,8,7,9,10,1,3,3,2,8))
d<-as.data.frame(c(3,9,7,4,5,5,2,10,8,10,12,2,4,4,3,7))
e<-as.data.frame(c(2500,8000,NA,4500,4500,NA,2500,NA,7400,9648,1112,1532,3487,3544,NA,7000))

impute<-cbind(a,b,c,d,e)
names(impute) <- c("X","site","var2","var3", "hh_inc") 

summary(impute)


m <- 5
a.output <- amelia(x = impute, m=m, autopri=0.5,cs="X",
               idvars=c("site","var2"),
               logs=c("hh_inc","var3"))

# For each imputed data set, compute the mean and standard error of hh_inc by site
stats.out <- NULL
for(i in 1:m){
  df2 <- data.table(a.output$imputations[[i]])  # i-th completed data frame
  df3 <- data.frame(dataset=i, df2[, list(std.error(hh_inc), mean(hh_inc)), by="site"])
  stats.out <- rbind(stats.out, df3)
}
colnames(stats.out) <- c("dataset","site","stdError","mean")
stats.out
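
The table stats.out has one row per imputed data set and site, so the per-site results still need to be pooled across imputations, as the question's reference to Rubin's formula suggests. What follows is a minimal base-R sketch of that pooling step, not part of the original answer. The helper name pool_rubin and the numbers in stats.demo are made up for illustration; stats.demo only mimics the column layout of stats.out. The same rule is, as far as I know, what Amelia's mi.meld applies.

```r
# Pool one estimate across m imputed data sets using Rubin's rules.
# q:  vector of m point estimates (e.g. the mean of hh_inc in each data set)
# se: vector of m standard errors, one per data set
pool_rubin <- function(q, se) {
  m <- length(q)
  q_bar   <- mean(q)       # pooled point estimate
  within  <- mean(se^2)    # average within-imputation variance
  between <- var(q)        # between-imputation variance
  total   <- within + (1 + 1 / m) * between
  c(mean = q_bar, stdError = sqrt(total))
}

# Illustrative numbers only (NOT real output), shaped like stats.out
stats.demo <- data.frame(
  dataset  = rep(1:3, each = 2),
  site     = rep(1:2, times = 3),
  stdError = c(100, 200, 110, 190, 90, 210),
  mean     = c(4000, 5000, 4100, 5200, 3900, 4800)
)

# Apply the pooling separately for each site
pooled <- do.call(rbind, lapply(split(stats.demo, stats.demo$site), function(d) {
  data.frame(site = d$site[1], t(pool_rubin(d$mean, d$stdError)))
}))
pooled
```

Replacing stats.demo with stats.out should give one pooled mean and standard error per site. The pooled standard error is larger than the average per-data-set standard error because it also reflects the disagreement between imputations.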
