Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8219981
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 7, 20262026-06-07T13:21:47+00:00 2026-06-07T13:21:47+00:00

Does plyr skip missing levels of a factor [that is the grouping variable]? It’s

  • 0

Does plyr skip missing levels of a factor [that is the grouping variable]? It’s the first of my questions in diagnosing a problem.

I have a dataset where patients are in strata=rural or strata=city. I want to compare the age in treatment=A to treatment=B.

For example, I am trying to do:

ddply(data.c, .(strata), function(x) t.test(age~treatment, data=x, na.rm=TRUE)

But it tells me

Error in t.test.formula(age ~ treatment, data = x, na.rm = TRUE) :
grouping factor must have exactly 2 levels

If I run factor(data$strata) and factor(data$treatment) I see only two levels, respectively (they are two labels, respectively; that’s not a problem, right?).

Does plyr think that NA is a level in the grouping factor? What might be the problem for the error message?

I’ve been Googling and looking on stackoverflow to answer the question. I’m pretty new to R but I could not find an answer. Any help is much appreciated.


So I have some example code. Chase’s example worked nicely but when I use my data, I still get the same error.

dlply(data.c, .(strata), function(x) t.test(age~treatment, data=x, na.rm=TRUE produced the same error.

I don’t understand why this is.

Below is my data.c

        strata  age treatment
1   1   57.8630136986301    p_3
2   0   52.958904109589 p_3
3   NA  97.2438356164384    p12
4   0   88.2027397260274    p12
5   1   77.5890410958904    p12
6   1   43.9123287671233    p12
7   1   63.5260273972603    p_3
8   1   42.1890410958904    p12
9   0   52.1753424657534    p12
10  1   65.6493150684932    p12
11  0   44.9835616438356    p12
12  1   64.8849315068493    p12
13  1   57.5835616438356    p12
14  0   47.3013698630137    p12
15  0   74.0356164383562    NA
16  0   65.4986301369863    p12
17  1   83.986301369863 p12
18  0   47.4904109589041    p12
19  1   47.8630136986301    p12
20  1   58.8520547945205    p12
21  1   61.3342465753425    p12
22  1   66.841095890411 p12
23  1   55.6383561643836    p12
24  1   52.7178082191781    p12
25  1   71.4630136986301    p12
26  1   NA  p12
27  1   59.2082191780822    p12
28  1   69.8575342465753    p12
29  1   46.7397260273973    p12
30  1   53.5013698630137    p_3
31  1   41.3205479452055    p12
32  0   51.3917808219178    p_3
33  1   47.8684931506849    p12
34  1   87.654794520548 p12
35  0   75.558904109589 p12
36  1   71.2520547945205    p12
37  1   44.9808219178082    p_3
38  1   52  p_3
39  1   54.7643835616438    p_3
40  1   85.6630136986301    p_3
41  1   74.1835616438356    p_3
42  1   56.8684931506849    p_3
43  1   87.572602739726 p_3
44  1   85.0109589041096    p_3
45  1   70.0767123287671    p_3
46  0   47.2328767123288    p12
47  1   63.7972602739726    p12
48  1   85.8054794520548    p12
49  1   67.027397260274 p12
50  1   60.7342465753425    p12
51  0   61.9479452054794    p_3
52  0   86.8712328767123    p12
53  1   87.8219178082192    p12
54  1   49.9424657534247    p12
55  0   83.386301369863 p12
56  1   88.3013698630137    p12
57  1   55.7890410958904    p12
58  1   63.7616438356164    p12
59  1   55.5041095890411    p12
60  1   43.5232876712329    p12
61  1   58.8246575342466    p12
62  0   46.7397260273973    p12
63  0   74.2027397260274    p_3
64  0   51.9205479452055    p_3
65  0   78.1890410958904    p_3
66  0   78.9917808219178    p_3
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-07T13:21:49+00:00Added an answer on June 7, 2026 at 1:21 pm

    plyr does treat missing as a separate group for the purposes of grouping variables. Therefore, in your example, there are three groups: 0, 1, and NA. You can see which of the groups is throwing an error by adding the .inform argument:

    dlply(data.c, 
          .(strata), 
          function(x) t.test(age~treatment, data=x, na.rm=TRUE), 
          .inform=TRUE)
    

    which reports:

    Error in t.test.formula(age ~ treatment, data = x, na.rm = TRUE) : 
      grouping factor must have exactly 2 levels
    Error: with piece 3: 
      strata      age treatment
    3     NA 97.24384       p12
    

    And in fact there is just one row with strata of NA, and doing a t.test on a single data point (or rather, where all the data points are in the same group) will give the error you see:

    > t.test(age~treatment, data=data.c[is.na(data.c$strata),], na.rm=TRUE)
    Error in t.test.formula(age ~ treatment, data = data.c[is.na(data.c$strata),  : 
      grouping factor must have exactly 2 levels
    

    To get around this problem, you can either subset data.c in the dlply call:

    dlply(data.c[!is.na(data.c$strata),],
          .(strata), 
          function(x) t.test(age~treatment, data=x, na.rm=TRUE))
    

    (which assumes that you know where the problem will be to work around it) and which gives:

    $`0`
    
            Welch Two Sample t-test
    
    data:  age by treatment 
    t = 0.2653, df = 15.729, p-value = 0.7942
    alternative hypothesis: true difference in means is not equal to 0 
    95 percent confidence interval:
     -13.41565  17.24792 
    sample estimates:
    mean in group p_3 mean in group p12 
             64.22896          62.31283 
    
    
    $`1`
    
            Welch Two Sample t-test
    
    data:  age by treatment 
    t = 0.6606, df = 18.612, p-value = 0.517
    alternative hypothesis: true difference in means is not equal to 0 
    95 percent confidence interval:
     -6.998471 13.440397 
    sample estimates:
    mean in group p_3 mean in group p12 
             65.50091          62.27995 
    
    
    attr(,"split_type")
    [1] "data.frame"
    attr(,"split_labels")
      strata
    1      0
    2      1
    

    or you can protect the call with error catching. plyr provides a convenience wrapper try_default for such a purpose.

    dlply(data.c, 
          .(strata), 
          function(x) try_default(t.test(age~treatment, data=x, na.rm=TRUE),
                                  NULL))
    

    which gives

    Error in t.test.formula(age ~ treatment, data = x, na.rm = TRUE) : 
      grouping factor must have exactly 2 levels
    $`0`
    
            Welch Two Sample t-test
    
    data:  age by treatment 
    t = 0.2653, df = 15.729, p-value = 0.7942
    alternative hypothesis: true difference in means is not equal to 0 
    95 percent confidence interval:
     -13.41565  17.24792 
    sample estimates:
    mean in group p_3 mean in group p12 
             64.22896          62.31283 
    
    
    $`1`
    
            Welch Two Sample t-test
    
    data:  age by treatment 
    t = 0.6606, df = 18.612, p-value = 0.517
    alternative hypothesis: true difference in means is not equal to 0 
    95 percent confidence interval:
     -6.998471 13.440397 
    sample estimates:
    mean in group p_3 mean in group p12 
             65.50091          62.27995 
    
    
    $`NA`
    NULL
    
    attr(,"split_type")
    [1] "data.frame"
    attr(,"split_labels")
      strata
    1      0
    2      1
    3     NA
    

    Note that there are 3 elements to this output including one for NA which is set to NULL due to the try_default.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Does anyone have any recommendations of tools that can be of assistance with moving
Does anyone use have a good regex library that they like to use? Most
Does Blend have anything similar to the MsBuild BuildingInsideVisualStudio variable, which is set by
Does anyone have a code example that follows best practices of using MongoDB Official
Does COUNT(*) have any significant impact for MySQL performance if query already has GROUP
Does anyone have a translate function for x/y positions after rotation in javascript? for
Does anyone know why UsernameExists wont return True. I must have my syntax messed
Does any one have a solution to make the SpecFlow autocomplete in Visual Studio
Does anyone know of a Mootools script that provides nested sortable but also works
Does Java have some analog of Oracle's function MONTHS_BETWEEN ?

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.