The Archive Base

Editorial Team
Asked: May 25, 2026

I created several ctree models (about 40 to 80) which I want to evaluate rather often.

One issue is that the model objects are very big (40 models require more than 2.8G of memory). It appears to me that they store the training data, maybe as modelname@data and modelname@responses, and not just the information relevant for predicting new data.

Most other R learning packages have configurable options for whether to include the data in the model object, but I couldn’t find any hints in the documentation. I also tried to assign empty ModelEnv objects with

modelname@data <- new("ModelEnv")

but there was no effect on the size of the respective RData file.

Does anyone know whether ctree really stores the training data, and how to remove all data that is irrelevant for new predictions from ctree models so that I can fit many of them in memory?

Thanks a lot,

Stefan


Thank you for your feedback; it was already very helpful.

I used dput and str to take a deeper look at the object and found that no training data is included in the model, but there is a responses slot, which seems to hold the training labels and row names. Anyway, I noticed that each node has a weight vector for each training sample. After a while of inspecting the code, I ended up googling a bit and found the following comment in the party NEWS log:

         CHANGES IN party VERSION 0.9-13 (2007-07-23)

o   update `mvt.f'

o   improve the memory footprint of RandomForest objects
    substancially (by removing the weights slots from each node).

It turns out there is a C function in the party package that removes these weights, called R_remove_weights, with the following definition:

SEXP R_remove_weights(SEXP subtree, SEXP removestats) {
    C_remove_weights(subtree, LOGICAL(removestats)[0]);
    return(R_NilValue);
}

It also works fine:

# cc is my model object

sum(unlist(lapply(slotNames(cc), function (x)  object.size(slot(cc, x)))))
# returns: [1] 2521256
save(cc, file="cc_before.RData")

.Call("R_remove_weights", cc@tree, TRUE, PACKAGE="party")
# returns NULL and removes weights and node statistics

sum(unlist(lapply(slotNames(cc), function (x)  object.size(slot(cc, x)))))
# returns: [1] 1521392
save(cc, file="cc_after.RData")

As you can see, it reduces the object size substantially, from roughly 2.5MB to 1.5MB.

What is strange, though, is that the corresponding RData files are insanely huge, and there is no impact on them:

$ ls -lh cc*
-rw-r--r-- 1 user user 9.6M Aug 24 15:44 cc_after.RData
-rw-r--r-- 1 user user 9.6M Aug 24 15:43 cc_before.RData

Unzipping the file shows the 2.5MB object to occupy nearly 100MB of space:

$ cp cc_before.RData cc_before.gz
$ gunzip cc_before.gz 
$ ls -lh cc_before*
-rw-r--r-- 1 user user  98M Aug 24 15:45 cc_before

Any ideas what could cause this?

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 1:29 am

    I found a solution to the problem at hand, so I am writing this answer in case anyone runs into the same issue. I’ll describe my process, so it may be a bit rambling; bear with me.

    With no clue where to start, I thought about nuking slots and removing weights to get the objects as small as possible and at least save some memory in case no fix could be found. So I removed @data and @responses as a start. Prediction still worked fine without them, but there was no effect on the .RData file size.

    I then went the other way round and created an empty ctree model, just plugging the tree into it:

    > library(party)
    
    ## create reference predictions for the dataset
    > predictions.org <- treeresponse(c1, d)
    
    ## save tree object for reference
    save(c1, file="testSize_c1.RData")
    

    Checking the size of the original object:

    $ ls -lh testSize_c1.RData 
    -rw-r--r-- 1 user user 9.6M 2011-08-25 14:35 testSize_c1.RData
    

    Now, let’s create an empty CTree and copy the tree only:

    ## extract the tree only 
    > c1Tree <- c1@tree
    
    ## create empty tree and plug in the extracted one 
    > newCTree <- new("BinaryTree")
    > newCTree@tree <- c1Tree
    
    ## save tree for reference 
    save(newCTree, file="testSize_newCTree.RData")
    

    This new tree object is now much smaller:

    $ ls -lh testSize_newCTree.RData 
    -rw-r--r-- 1 user user 108K 2011-08-25 14:35 testSize_newCTree.RData
    

    However, it can’t be used to predict:

    ## predict with the new tree
    > predictions.new <- treeresponse(newCTree, d)
    Error in object@cond_distr_response(newdata = newdata, ...) : 
      unused argument(s) (newdata = newdata)
    

    We did not set the @cond_distr_response slot, which might cause the error, so let’s copy the original one as well and try to predict again:

    ## extract cond_distr_response from original tree
    > cdr <- c1@cond_distr_response
    > newCTree@cond_distr_response <- cdr
    
    ## save tree for reference 
    save(newCTree, file="testSize_newCTree_with_cdr.RData")
    
    ## predict with the new tree
    > predictions.new <- treeresponse(newCTree, d)
    
    ## check correctness
    > identical(predictions.org, predictions.new)
    [1] TRUE
    

    This works perfectly, but now the size of the RData file is back at its original value:

    $ ls -lh testSize_newCTree_with_cdr.RData 
    -rw-r--r-- 1 user user 9.6M 2011-08-25 14:37 testSize_newCTree_with_cdr.RData
    

    Simply printing the slot shows it to be a function bound to an environment:

    > c1@cond_distr_response
    function (newdata = NULL, mincriterion = 0, ...) 
    {
        wh <- RET@get_where(newdata = newdata, mincriterion = mincriterion)
        response <- object@responses
        if (any(response@is_censored)) {
            swh <- sort(unique(wh))
            RET <- vector(mode = "list", length = length(wh))
            resp <- response@variables[[1]]
            for (i in 1:length(swh)) {
                w <- weights * (where == swh[i])
                RET[wh == swh[i]] <- list(mysurvfit(resp, weights = w))
            }
            return(RET)
        }
        RET <- .Call("R_getpredictions", tree, wh, PACKAGE = "party")
        return(RET)
    }
    <environment: 0x44e8090>
    

    So the answer to the initial question appears to be that the methods of the object bind an environment to it, which is then saved with the object in the corresponding RData file. This might also explain why several packages are loaded when the RData file is read.
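The effect can be reproduced without party. What follows is a minimal base-R sketch, not the package's code: `make_predictor` and `big` are hypothetical names standing in for a model method and the data captured in its environment. It may also explain why the sizes reported in the question were so much smaller than the files: according to the base R documentation, `object.size()` does not include a function's environment, whereas serialization (which `save()` relies on) follows it.

```r
# Hypothetical stand-in for a model method: a closure whose enclosing
# environment holds a large object that the function itself never uses.
make_predictor <- function() {
  big <- numeric(1e6)            # about 8 MB of doubles, plays the role of training data
  function(x) x + 1
}

f <- make_predictor()

# object.size() does not count the closure's environment ...
size_reported <- as.numeric(object.size(f))

# ... but serialization writes the environment out along with the function.
size_serialized <- length(serialize(f, NULL))

# Re-pointing a copy of the closure at the global environment drops the captured data.
g <- f
environment(g) <- globalenv()
size_detached <- length(serialize(g, NULL))

c(reported = size_reported, serialized = size_serialized, detached = size_detached)
```

Detaching the environment only works here because the toy function does not need anything from it. The real cond_distr_response refers to objects in its environment, which is why the approach below bypasses the method instead.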

    Thus, to get rid of the environment, we can’t copy the methods, but we can’t predict without them either. The rather “dirty” solution is to emulate the functionality of the original methods and call the underlying C code directly. After some digging through the source code, this is indeed possible. As the code copied above suggests, we need to call get_where, which determines the terminal node of the tree reached by the input. We then need to call R_getpredictions to determine the response from that terminal node for each input sample. The tricky part is that we need to get the data in the right input format and thus have to call the data preprocessing included in ctree:

    ## create a character string of the formula which was used to fit the tree
    ## (there might be a neater way to do this)
    > library(stringr)
    > org.formula <- str_c(
                       do.call(str_c, as.list(deparse(c1@data@formula$response[[2]]))),
                       "~", 
                       do.call(str_c, as.list(deparse(c1@data@formula$input[[2]]))))
    
    ## call the internal ctree preprocessing 
    > data.dpp <- party:::ctreedpp(as.formula(org.formula), d)
    
    ## create the data object necessary for the ctree C code
    > data.ivf <- party:::initVariableFrame.df(data.dpp@menv@get("input"), 
                                               trafo = ptrafo)
    
    ## now call the tree traversal routine, note that it only requires the tree
    ## extracted from the @tree slot, not the whole object
    > nodeID <- .Call("R_get_nodeID", c1Tree, data.ivf, 0, PACKAGE = "party")
    
    ## now determine the respective responses
    > predictions.syn <- .Call("R_getpredictions", c1Tree, nodeID, PACKAGE = "party")
    
    ## check correctness
    > identical(predictions.org, predictions.syn)
    [1] TRUE
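The formula string built with stringr above can probably be assembled with base R as well. A minimal sketch, assuming the `response` and `input` entries are one-sided formulas, as the `[[2]]` indexing suggests; `resp`, `inp`, `side`, and the variable names `y`, `x1`, `x2` are made up for illustration:

```r
# Hypothetical one-sided formulas standing in for
# c1@data@formula$response and c1@data@formula$input
resp <- ~ y
inp  <- ~ x1 + x2

# deparse() may split a long expression over several strings, so collapse them
side <- function(f) paste(deparse(f[[2]]), collapse = "")

# Join both sides into one formula string
org.formula <- paste(side(resp), "~", side(inp))
org.formula

# The string converts back to a formula object when needed
as.formula(org.formula)
```

The spacing around `~` differs from the stringr version, but both strings should parse to the same formula.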
    

    We now only need to save the extracted tree and the formula string to be able to predict new data:

    > save(c1Tree, org.formula, file="testSize_extractedObjects.RData")
    

    We can further remove the unnecessary weights as described in the updated question above:

    > .Call("R_remove_weights", c1Tree, TRUE, PACKAGE="party")
    > save(c1Tree, org.formula, file="testSize_extractedObjects__removedWeights.RData")
    

    Now let’s have a look at the file sizes again:

    $ ls -lh testSize_extractedObjects*
    -rw-r--r-- 1 user user 109K 2011-08-25 15:31 testSize_extractedObjects.RData
    -rw-r--r-- 1 user user  43K 2011-08-25 15:31 testSize_extractedObjects__removedWeights.RData
    

    Finally, instead of (compressed) 9.6M, only 43K are required to use the model. I should now be able to fit as many as I want in my 3G heap space. Hooray!
