Editorial Team
Asked: May 30, 2026

I have a dataframe with the lengths and widths of various arthropods from the guts of salamanders. Because some guts had thousands of certain prey items, I only measured a subset of each prey type. I now want to replace the NA for each unmeasured individual with the mean length and width of the measured individuals of the same taxon. I want to keep the dataframe and just add imputed columns (length2, width2), mainly because each row also has columns with data on the date and location the salamander was collected. I could fill in the NAs with a random selection of the measured individuals, but for the sake of argument let’s assume I just want to replace each NA with the mean.

For example, imagine I have a dataframe that looks something like:

id    taxa        length  width
101   collembola  2.1     0.9
102   mite        0.9     0.7
103   mite        1.1     0.8
104   collembola  NA      NA
105   collembola  1.5     0.5
106   mite        NA      NA

In reality I have more columns, about 25 different taxa, and ~30,000 prey items in total. It seems like the plyr package might be ideal for this, but I just can’t figure out how to do it. I’m not very R or programming savvy but I’m trying to learn.

Not that I know what I’m doing but I’ll try to create a small dataset to play with if it helps.

exampleDF <- data.frame(id = 1:100,
  taxa = c(rep("collembola", 50), rep("mite", 25), rep("ant", 25)),
  # use NA, not "NA": the quoted string turns the whole column into character
  length = c(rnorm(40, 1, 0.5), rep(NA, 10), rnorm(20, 0.8, 0.1), rep(NA, 5),
             rnorm(20, 2.5, 0.5), rep(NA, 5)),
  width = c(rnorm(40, 0.5, 0.25), rep(NA, 10), rnorm(20, 0.3, 0.01), rep(NA, 5),
            rnorm(20, 1, 0.1), rep(NA, 5)))

Here are a few things I’ve tried (that haven’t worked):

# mean imputation to recode NA in length and width with means
# (could do random imputation but unnecessary here)
mean.imp <- function(x) {
  missing <- is.na(x)
  n.missing <- sum(missing)
  x.obs <- x[!missing]  # observed (non-missing) values
  imputed <- x
  imputed[missing] <- mean(x.obs)
  return(imputed)
}

mean.imp(exampleDF[exampleDF$taxa == "collembola", "length"])

n.taxa <- length(unique(exampleDF$taxa))
for(i in 1:n.taxa) {
  mean.imp(exampleDF[exampleDF$taxa == unique(exampleDF$taxa[i]), "length"])
} # no way to get back into dataframe in proper places, try plyr? 

Another attempt:

imp.mean <- function(x) {
  a <- mean(x, na.rm = TRUE)
  return (ifelse (is.na(x) == TRUE , a, x)) 
 } # tried but not sure how to use this in ddply

Diet2 <- ddply(exampleDF, .(taxa), transform, length2 = function(x) {
  a <- mean(exampleDF$length, na.rm = TRUE)
  return (ifelse (is.na(exampleDF$length) == TRUE , a, exampleDF$length)) 
  })

Any suggestions?

1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 3:29 am

    Not my own technique; I saw it on the boards a while back:

    dat <- read.table(text = "id    taxa        length  width
    101   collembola  2.1     0.9
    102   mite        0.9     0.7
    103   mite        1.1     0.8
    104   collembola  NA      NA
    105   collembola  1.5     0.5
    106   mite        NA      NA", header=TRUE)
    
    
    library(plyr)
    impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
    dat2 <- ddply(dat, ~ taxa, transform, length = impute.mean(length),
         width = impute.mean(width))
    
    dat2[order(dat2$id), ] #plyr orders by group so we have to reorder
    

    Edit: a non-plyr approach with a for loop:

    for (i in which(sapply(dat, is.numeric))) {
        for (j in which(is.na(dat[, i]))) {
            dat[j, i] <- mean(dat[dat[, "taxa"] == dat[j, "taxa"], i],  na.rm = TRUE)
        }
    }
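
    If you would rather keep the original columns and the original row order, as the question asks, base R's ave() can apply the same impute.mean() within each taxon. This is only a minimal sketch on the small dat example; the column names length2 and width2 come from the question, and the ave() route itself is my suggestion rather than part of the technique quoted above. Note that a taxon with no measured individuals at all would get NaN, because mean() of an empty vector is NaN.

    ```r
    # Rebuild the small example so the sketch runs on its own
    dat <- read.table(text = "id taxa length width
    101 collembola 2.1 0.9
    102 mite 0.9 0.7
    103 mite 1.1 0.8
    104 collembola NA NA
    105 collembola 1.5 0.5
    106 mite NA NA", header = TRUE)

    impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

    # ave() splits the first argument by the grouping variable, applies FUN to
    # each piece, and returns the results in the original row order
    dat$length2 <- ave(dat$length, dat$taxa, FUN = impute.mean)
    dat$width2  <- ave(dat$width,  dat$taxa, FUN = impute.mean)

    dat  # length and width still contain the NAs; length2 and width2 do not
    ```

    Because nothing is reordered, the date and location columns in the real data stay lined up with their rows.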
    

    Edit: many moons later, here is a data.table and dplyr approach:

    data.table

    library(data.table)
    setDT(dat)
    
    dat[, length := impute.mean(length), by = taxa][,
        width := impute.mean(width), by = taxa]
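
    With more than two measurement columns, chaining one := call per column gets tedious. Here is a sketch of the same idea using .SD and .SDcols, which are standard data.table features. The names dt, cols and new_cols are illustrative, and the "2" suffix simply follows the question's length2/width2 naming.

    ```r
    library(data.table)

    # Small example table, rebuilt here so the sketch is self-contained
    dt <- data.table(
      id     = 101:106,
      taxa   = c("collembola", "mite", "mite", "collembola", "collembola", "mite"),
      length = c(2.1, 0.9, 1.1, NA, 1.5, NA),
      width  = c(0.9, 0.7, 0.8, NA, 0.5, NA)
    )

    impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

    cols     <- c("length", "width")   # columns to impute
    new_cols <- paste0(cols, "2")      # names for the imputed copies

    # Add the new columns by reference; .SD holds only the .SDcols columns
    # for the current taxon, so each mean is a within-taxon mean
    dt[, (new_cols) := lapply(.SD, impute.mean), by = taxa, .SDcols = cols]

    dt
    ```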
    

    dplyr

    library(dplyr)
    
    dat %>%
        group_by(taxa) %>%
        mutate(
            length = impute.mean(length),
            width = impute.mean(width)  
        )
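
    The pipeline above prints the result without storing it, and it overwrites length and width. Below is a sketch that keeps the originals and adds length2/width2 instead, using across() with .names (this assumes dplyr 1.0.0 or later). The name dat_imputed is just an illustrative choice.

    ```r
    library(dplyr)

    # Rebuild the small example so the sketch runs on its own
    dat <- read.table(text = "id taxa length width
    101 collembola 2.1 0.9
    102 mite 0.9 0.7
    103 mite 1.1 0.8
    104 collembola NA NA
    105 collembola 1.5 0.5
    106 mite NA NA", header = TRUE)

    impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

    dat_imputed <- dat %>%
        group_by(taxa) %>%
        # "{.col}2" names each new column after its source: length2, width2
        mutate(across(c(length, width), impute.mean, .names = "{.col}2")) %>%
        ungroup()   # drop the grouping so later steps see the whole table

    dat_imputed
    ```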
    