I am using the blackboost function from the mboost package to estimate a model


Asked: June 15, 2026 (2026-06-15T08:58:21+00:00)


I am using the blackboost function from the mboost package to estimate a model on a dataset of approximately 500 MB, on a Windows 7 64-bit machine with 8 GB of RAM. During execution, R uses virtually all available memory. After the calculation is done, over 4.5 GB remains allocated to R, even after calling the garbage collector with gc() or saving the workspace and reloading it in a new R session. Using .ls.objects (1358003), I found that the size of all visible objects is about 550 MB.

The output of gc() tells me that the bulk of data is in vector cells, although I’m not sure what that means:

            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   2856967  152.6    4418719  236.0   3933533  210.1
Vcells 526859527 4019.7  610311178 4656.4 558577920 4261.7
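
The Mb columns in this gc() output follow directly from the cell counts. As a rough check, assuming a 64-bit build of R where each Vcell occupies 8 bytes and each Ncell (cons cell) occupies 56 bytes, the arithmetic can be reproduced as follows; the variable names are illustrative only:

```r
# Cell counts copied from the "used" column of the gc() output above
vcells_used <- 526859527
ncells_used <- 2856967

# Assumed cell sizes for a 64-bit build of R
bytes_per_vcell <- 8
bytes_per_ncell <- 56

# Convert cell counts to megabytes (1 Mb = 1024^2 bytes)
vcell_mb <- vcells_used * bytes_per_vcell / 1024^2
ncell_mb <- ncells_used * bytes_per_ncell / 1024^2

cat(sprintf("Vcells: %.1f Mb, Ncells: %.1f Mb\n", vcell_mb, ncell_mb))
```

Under these assumptions, Vcells (the heap that holds vector contents such as numeric, integer, and character data) account for roughly 4 GB here, while Ncells (the fixed-size nodes used for language objects and pairlists) account for only about 150 MB.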

This is what I’m doing:

> memory.size()
[1] 1443.99
> model <- blackboost(formula, data = mydata[mydata$var == 1,c(dv,ivs)],tree_control=ctree_control(maxdepth = 4))

…a bunch of packages are loaded…

> memory.size()
[1] 4431.85
> print(object.size(model),units="Mb")
25.7 Mb
> memory.profile()
     NULL      symbol    pairlist     closure environment     promise    language 
        1       15895      826659       20395        4234       13694      248423 
  special     builtin        char     logical     integer      double     complex 
      174        1572     1197774       34286       84631       42071          28 
character         ...         any        list  expression    bytecode externalptr 
   228592           1           0       79877           1       51276        2182 
  weakref         raw          S4 
      413         417        4385

mydata[mydata$var == 1,c(dv,ivs)] has 139593 rows and 75 columns, mostly factor variables plus some logical and numeric ones. formula is a formula object of the form “dv ~ var2 + var3 + … + var73”. dv is a string holding the name of the dependent variable, and ivs is a character vector naming all independent variables var2 … var74.

Why is so much memory being allocated to R? How can I make R free up the extra memory? Any thoughts appreciated!


1 Answer

Editorial Team · Answer 1 · 2026-06-15T08:58:22+00:00

I talked to one of the package authors, who told me that much of the data associated with the model object is stored in environments. This explains why object.size does not reflect the full amount of memory that R uses as a result of the blackboost call. He also told me that the mboost package is not optimized for speed or memory efficiency but is aimed at flexibility, and that all trees are saved, and with them the data, which explains the large amount of data generated (I still find the dimensions remarkable). He recommended using the gbm package (with which I have not yet been able to replicate my results) or serializing the fit, that is, fitting in stages and keeping only the fitted values between stages, with something like this:

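### first M_1 iterations
mod <- blackboost(...)[M_1]
### keep only the fitted values, then drop the model (and the data stored with it)
f1 <- fitted(mod)
rm(mod)
### then M_2 additional iterations, starting from the stored fit as offset ...
mod <- blackboost(..., offset = f1)[M_2]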
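
The point about environments can be illustrated without mboost. The following is a minimal base-R sketch, not the package's actual internals: make_model and big are hypothetical names, and the "model" is just a list holding a closure. It shows that object.size does not count data reachable only through a function's environment:

```r
# Hypothetical stand-in for a fitting function that keeps its data
# alive inside a closure environment (not mboost code)
make_model <- function() {
  big <- numeric(1e6)                  # roughly 8 MB of doubles
  list(predict = function() mean(big)) # closure captures 'big'
}

m <- make_model()

# Size reported for the visible object: environments are not measured
visible_bytes <- as.numeric(object.size(m))

# Size of the data hidden in the closure's environment
hidden_bytes <- as.numeric(object.size(environment(m$predict)$big))

cat("visible:", visible_bytes, "bytes; hidden:", hidden_bytes, "bytes\n")

# The hidden data only becomes collectable once the object that
# references the environment is removed
rm(m)
invisible(gc())
```

In this sketch the reported size of m is tiny compared with the roughly 8 MB it keeps alive, which mirrors the gap between the 25.7 Mb reported for the model and the memory R actually holds.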
