Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7822995
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 2, 20262026-06-02T08:05:42+00:00 2026-06-02T08:05:42+00:00

I want to do stepwise regression using AIC on a list of linear models.

  • 0

I want to do stepwise regression using AIC on a list of linear models. idea is to use e a list of linear models and then apply stepAIC on each list element. It fails.

I tried to track the problem down. I think I found the problem. However, I don’t understand the cause. Try the code to see the difference between three cases:

require(MASS)
n<-30 
x1<-rnorm(n, mean=0, sd=1) #create rv x1 
x2<-rnorm(n, mean=1, sd=1)
x3<-rnorm(n, mean=2, sd=1)
epsilon<-rnorm(n,mean=0,sd=1) # random error variable 
dat<-as.data.frame(cbind(x1,x2,x3,epsilon)) # combine to a data frame
dat$id<-c(rep(1,10),rep(2,10),rep(3,10)) 
# y is combination from all three x and a random uniform variable
dat$y<-x1+x2+x3+epsilon 
# apply lm() only resulting in a list of models
dat.lin.model.lst<-lapply(split(dat,dat$id),function(d) lm(y~x1+x2+x3,data=d)) 
stepAIC(dat.lin.model.lst[[1]]) # FAIL!!!
# apply function stepAIC(lm())-  works
dat.lin.model.stepAIC.lst<-lapply(split(dat,dat$id),function(d) stepAIC(lm(y~x1+x2+x3,data=d))) 
# create model for particular group with id==1
k<-which(dat$id==1) # manually select records with id==1
lin.model.id1<-lm(dat$y[k]~dat$x1[k]+dat$x2[k]+dat$x3[k]) 
stepAIC(lin.model.id1) # check stepAIC - works!

I am pretty sure that stepAIC() needs the original data from data.frame "dat". That is what I was thinking of before. (Hope I am right on that)
But there is no parameter in stepAIC() where I can pass the original data frame. Obviously, for plain models not wrapped in a list it’s enough to pass the model. (last three lines in code) So I am wondering:

  • Q1: How does stepAIC knows where to find the original data "dat" (not only the model data which is passed as parameter)?
  • Q2: How can I possibly know that there is another parameter in stepAIC() which is not explicitly stated in the help pages? (maybe my English is just too bad to find)
  • Q3: How can I pass that parameter to stepAIC()?

It must be somewhere in the environment of the apply function and passing on the data. Either lm() or stepAIC() and the pointer/link to the raw data must get lost somewhere. I have not a good understanding what an environment in R does. For me it was kind of isolating local from global variables. But maybe its more complicated. Anyone who can explain that to me in regard to the problem above? Honestly, I dont read much out of the R documentation. Any better understanding would help me.

OLD:
I have data in a dataframe df that can be split into several subgroups. For that purpose I created a groupID called df$id. lm() returns the coefficent as expected for the first subgroup. I want to do a stepwise regression using AIC as criterion for each subgroup separately. I use lmList {lme4} which results in a model for each subgroup (id). But if I use stepAIC{MASS} for the list elements it throws an error. see below.

So the question is: What mistake is in my procedure/syntax? I get results for single models but not the ones created with lmList. Does lmList() store different information on the model than lm() does?
But in the help it states:
class "lmList": A list of objects of class lm with a common model.

>lme4.list.lm<-lmList(formula=Scherkraft.N~Gap.um+Standoff.um+Voidflaeche.px |df$id,data = df)
>lme4.list.lm[[1]]
Call: lm(formula = formula, data = data)
Coefficients:
(Intercept)          Gap.um     Standoff.um  Voidflaeche.px  
  62.306133       -0.009878        0.026317       -0.015048  

>stepAIC(lme4.list.lm[[1]], direction="backward") 
#stepAIC on first element on the list of linear models
Start:  AIC=295.12
Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px
                 Df Sum of Sq    RSS    AIC
- Standoff.um     1      2.81 7187.3 293.14
- Gap.um          1     29.55 7214.0 293.37
<none>                        7184.4 295.12
- Voidflaeche.px  1    604.38 7788.8 297.97  

Error in terms.formula(formula, data = data) : 
'data' argument is of the wrong type

Obviously something does not work with the list. But I have not an idea what it might be.
Since I tried to do the same with the base package which creates the same model (at least the same coefficients). Results are below:

>lin.model<-lm(Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px,df[which(df$id==1),]) 
# id is in order, so should be the same subgroup as for the first list element in lmList

Coefficients:  
(Intercept)    Gap.um  Standoff.um  Voidflaeche.px  
  62.306133 -0.009878     0.026317       -0.015048  

Well, this is what I get returned using stepAIC on my linear.model .
As far as I know the akaike information criterion can be used to estimate which model better balances between fit and generalization given some data.

>stepAIC(lin.model,direction="backward")
Start:  AIC=295.12
Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px
                 Df Sum of Sq    RSS    AIC
- Standoff.um     1      2.81 7187.3 293.14  
- Gap.um          1     29.55 7214.0 293.37
<none>                        7184.4 295.12
- Voidflaeche.px  1    604.38 7788.8 297.97  

Step:  AIC=293.14
Scherkraft.N ~ Gap.um + Voidflaeche.px
                 Df Sum of Sq    RSS    AIC
- Gap.um          1     28.51 7215.8 291.38
 <none>                        7187.3 293.14
- Voidflaeche.px  1    717.63 7904.9 296.85

Step:  AIC=291.38
Scherkraft.N ~ Voidflaeche.px
                 Df Sum of Sq    RSS    AIC
<none>                        7215.8 291.38
- Voidflaeche.px  1    795.46 8011.2 295.65
Call: lm(formula = Scherkraft.N ~ Voidflaeche.px, data = df[which(df$id == 1), ])

Coefficients:
(Intercept)  Voidflaeche.px  
   71.7183         -0.0151  

I read from the output I should use the model: Scherkraft.N ~ Voidflaeche.px because this is the minimal AIC. Well, it would be nice if someone could shortly describe the output. My understanding of the stepwise regression (assuming backwards elimination) is all regressors are included in the initial model. Then the least important one is eliminated. The criterion to decide is the AIC. and so forth… Somehow I have problems to get the tables interpreted right. It would be nice if someone could confirm my interpretation. The "-"(minus) stands for the eliminated regressor. On top is the "start" model and in the table table below the RSS and AIC are calculated for possible eliminations. So the first row in the first table says a model Scherkraft.N~Gap.um+Standoff.um+Voidflaeche.px – Standoff.um would result in an AIC 293.14. Choose the one without Standoff.um: Scherkraft.N~Gap.um+Voidflaeche.px

EDIT:
I replaced the lmList{lme4} with dlply() to create the list of models.
Still stepAIC is not coping with the list. It throws another error. Actually, I believe it is a problem with the data stepAIC needs to run through. I was wondering how it calculates the AIC-value for each step just from the model data. I would take the original data to construct the models leaving one regressor out each time. Thereof I would calculate the AIC and compare. So how stepAIC is working if it has not access to the original data. (I cant see a parameter where I pass the original data to stepAIC). Still, I have no clue why it works with a plain model but not with the model wrapped in a list.

>model.list.all <- dlply(df, .id, function(x) 
  {return(lm(Scherkraft.N~Gap.um+Standoff.um+Voidflaeche.px,data=x)) })
>stepAIC(model.list.all[[1]])
Start:  AIC=295.12
Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px
                 Df Sum of Sq    RSS    AIC
- Standoff.um     1      2.81 7187.3 293.14
- Gap.um          1     29.55 7214.0 293.37
<none>                        7184.4 295.12
- Voidflaeche.px  1    604.38 7788.8 297.97
Error in is.data.frame(data) : object 'x' not found
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-02T08:05:43+00:00Added an answer on June 2, 2026 at 8:05 am

    I’m not sure what may have changed in the versioning to make the debugging so difficult, but one solution would be to use do.call, which evaluates the expressions in the call before executing it. This means that instead of storing just d in the call, so that update and stepAIC need to go find d in order to do their work, it stores a full representation of the data frame itself.

    That is, do

    do.call("lm", list(y~x1+x2+x3, data=d))
    

    instead of

    lm(y~x1+x2+x3, data=d)
    

    You can see what it’s trying to do by looking at the call element of the model, perhaps like this:

    dat.lin.model.lst <- lapply(split(dat, dat$id), function(d)
                                do.call("lm", list(y~x1+x2+x3, data=d)) )
    dat.lin.model.lst[[1]]$call
    

    It’s also possible to make your list of data frames in the global environment and then construct the call so that update and stepAIC look for each data frame in turn, because their environment chains always lead back to the global environment; like this:

    dats <- split(dat, dat$id)
    dat.lin.model.list <- lapply(seq_along(dats), function(d)
                do.call("lm", list(y~x1+x2+x3, data=call("[[", quote(dats),i))) )
    

    To see what’s changed, run dat.lin.model.lst[[1]]$call again.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I want to perform a stepwise linear Regression using p-values as a selection criterion,
want to know why String behaves like value type while using ==. String s1
Want the function to sort the table by HP but if duplicate HPs then
Want to attach an Event Handler/Listener to some links using the delegate() method, but
Want to populate dynamically combobox-listbox-drop-down using javascript. var table = document.createElement(table); var select =
Want to use signal when follow() function called in UserProfile. I've written signal which
Want to detect the '\n' without going through each and every character of a
want to drag element to specific div and if not dragged to this div
Want to update page after jeditable (using gem on_the_spot) has been updated. Below works
want to access select list where id starts with sentidd $('.select[id^=sentidd]').change(function() {} It fails.

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.