
The Archive Base


Editorial Team
Asked: June 13, 2026

Can anyone recommend a strategy for making predictions using a gradient boosting model in the <10-15ms range (the faster the better)?

I have been using R's gbm package, but the first prediction takes ~50ms (subsequent vectorized predictions average 1ms each, so there appears to be overhead, perhaps in the call to the C++ library). As a guideline, there will be ~10-50 inputs and ~50-500 trees. The task is classification and I need access to predicted probabilities.

I know there are a lot of libraries out there, but I’ve had little luck finding information even on rough prediction times for them. The training will happen offline, so only predictions need to be fast — also, predictions may come from a piece of code / library that is completely separate from whatever does the training (as long as there is a common format for representing the trees).

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 1:59 pm

    I’m the author of the scikit-learn gradient boosting module, a Gradient Boosted Regression Trees implementation in Python. I put some effort into optimizing prediction time, since the method was targeted at low-latency environments (in particular ranking problems). The prediction routine is written in C, although there is still some overhead due to Python function calls. Having said that, prediction time for single data points with ~50 features and about 250 trees should be << 1ms.

    In my use cases, prediction time is often dominated by the cost of feature extraction. I strongly recommend profiling to pinpoint the source of the overhead (if you use Python, I can recommend line_profiler).
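A minimal sketch of that profiling step, using the standard library's `time.perf_counter` instead of line_profiler so that it runs without extra packages. The functions `extract_features` and `predict_one` are hypothetical stand-ins for the real pipeline stages, not part of any library; the point is only to time the two stages separately.

```python
import math
import time


def extract_features(raw):
    # Hypothetical stand-in for feature extraction.
    return [float(len(raw)), float(sum(map(ord, raw)) % 97)]


def predict_one(features):
    # Hypothetical stand-in for a single-row model prediction.
    score = 0.01 * features[0] - 0.02 * features[1]
    return 1.0 / (1.0 + math.exp(-score))


def time_stages(raw_inputs, repeats=100):
    """Return the average number of seconds per input spent in each stage."""
    totals = {"extract": 0.0, "predict": 0.0}
    for _ in range(repeats):
        for raw in raw_inputs:
            t0 = time.perf_counter()
            feats = extract_features(raw)
            t1 = time.perf_counter()
            predict_one(feats)
            t2 = time.perf_counter()
            totals["extract"] += t1 - t0
            totals["predict"] += t2 - t1
    n = repeats * len(raw_inputs)
    return {stage: total / n for stage, total in totals.items()}


timings = time_stages(["some query", "another, longer query string"])
for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1e6:.2f} us per input")
```

If the "extract" figure dominates, speeding up the model will not help much. It can also be worth timing the very first call on its own, since the question reports a slow first prediction.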

    If the source of the overhead is prediction rather than feature extraction, you might check whether it's possible to do batch predictions instead of predicting single data points, thus limiting the overhead due to the Python function call (e.g. in ranking you often need to score the top-K documents, so you can do the feature extraction first and then run predict on the K x n_features matrix).
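The batching pattern can be sketched as follows. `ToyModel` and `extract_features` are hypothetical stand-ins (a real model would be a trained classifier), and the call counter is there only to show how many times the per-call overhead is paid.

```python
def extract_features(doc):
    # Hypothetical feature extractor: two toy features per document.
    return [float(len(doc)), float(doc.count(" "))]


class ToyModel:
    """Hypothetical stand-in for a trained model with a batch scoring method."""

    def __init__(self):
        self.calls = 0

    def predict_batch(self, rows):
        # One call scores every row, so the fixed per-call overhead is paid once.
        self.calls += 1
        return [0.1 * r[0] - 0.5 * r[1] for r in rows]


def score_one_by_one(model, docs):
    # One model call per document: K calls for K documents.
    return [model.predict_batch([extract_features(d)])[0] for d in docs]


def score_batched(model, docs):
    # Extract all features first, then make a single call on the
    # K x n_features matrix.
    matrix = [extract_features(d) for d in docs]
    return model.predict_batch(matrix)


docs = ["gradient boosting", "fast prediction in production", "trees"]
single_model, batch_model = ToyModel(), ToyModel()
single_scores = score_one_by_one(single_model, docs)
batch_scores = score_batched(batch_model, docs)
print(single_model.calls, batch_model.calls)
```

With scikit-learn, the batched variant would correspond to a single `predict_proba` call on a K x n_features array rather than K calls on 1 x n_features arrays.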

    If this doesn’t help either, you should try to limit the number of trees, because the runtime cost of prediction is basically linear in the number of trees.
    There are a number of ways to limit the number of trees without affecting the model accuracy:

    1. Proper tuning of the learning rate; the smaller the learning rate, the more trees are needed and thus the slower prediction is.

    2. Post-process the GBM with L1 regularization (Lasso); see Elements of Statistical Learning, Section 16.3.1: use the predictions of each tree as new features, run that representation through an L1-regularized linear model, and remove the trees that don’t get any weight.

    3. Fully-corrective weight updates; instead of doing the line-search/weight update just for the most recent tree, update all trees (see [Warmuth2006] and [Johnson2012]). Better convergence means fewer trees.
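To make the cost argument concrete, here is a minimal pure-Python sketch of a tree ensemble and of the pruning step from option 2. The nested-dict tree format, the helper names, and the example weights are all assumptions made for illustration; in particular, the weights are taken as given (as if an L1-regularized linear model had already been fit on the per-tree predictions) and are not computed here.

```python
import math


def tree_value(node, x):
    # Walk from the root to a leaf; leaves are dicts holding only "value".
    while "value" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]


def predict_proba(trees, weights, x, bias=0.0):
    # One traversal per tree, so the cost grows linearly with len(trees).
    score = bias + sum(w * tree_value(t, x) for t, w in zip(trees, weights))
    return 1.0 / (1.0 + math.exp(-score))


def prune(trees, weights):
    # Keep only the trees that received a non-zero weight.
    kept = [(t, w) for t, w in zip(trees, weights) if w != 0.0]
    return [t for t, _ in kept], [w for _, w in kept]


def stump(feature, threshold, left, right):
    # Depth-1 tree in the assumed nested-dict format.
    return {"feature": feature, "threshold": threshold,
            "left": {"value": left}, "right": {"value": right}}


trees = [stump(0, 0.5, -1.0, 1.0), stump(1, 2.0, -0.5, 0.5),
         stump(0, 1.5, -0.2, 0.3), stump(1, 0.0, 0.1, -0.1)]
weights = [0.8, 0.0, 0.4, 0.0]  # assumed output of an L1-regularized fit
small_trees, small_weights = prune(trees, weights)

x = [1.0, 3.0]
print(len(trees), len(small_trees))
print(predict_proba(trees, weights, x), predict_proba(small_trees, small_weights, x))
```

A plain structure like this is also one way to decouple training from serving: the trees can be exported once offline and evaluated by a separate, small prediction routine.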

    If none of the above does the trick, you could investigate cascades or early-exit strategies (see [Chen2012]).
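As a rough illustration of the early-exit idea (a simple bound-based rule, not the cascade method of [Chen2012]), the sketch below stops evaluating trees once the remaining trees can no longer change the sign of the score. The tree format and all names are assumptions made for this example. It yields only the class decision, so it does not directly apply when exact probabilities are required.

```python
def tree_value(node, x):
    # Walk from the root to a leaf; leaves are dicts holding only "value".
    while "value" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]


def max_abs_leaf(node):
    # Largest absolute leaf value: a bound on what this tree can contribute.
    if "value" in node:
        return abs(node["value"])
    return max(max_abs_leaf(node["left"]), max_abs_leaf(node["right"]))


def remaining_bounds(trees):
    # remaining[i] bounds the total contribution of trees i, i+1, ..., end.
    # This depends only on the model, so it can be precomputed offline.
    remaining = [0.0] * (len(trees) + 1)
    for i in range(len(trees) - 1, -1, -1):
        remaining[i] = remaining[i + 1] + max_abs_leaf(trees[i])
    return remaining


def classify_early_exit(trees, remaining, x):
    """Return (is_positive, number_of_trees_evaluated)."""
    score = 0.0
    for i, tree in enumerate(trees):
        score += tree_value(tree, x)
        # Exit as soon as the trees not yet evaluated cannot flip the sign.
        if abs(score) > remaining[i + 1]:
            return score > 0.0, i + 1
    return score > 0.0, len(trees)


def stump(feature, threshold, left, right):
    return {"feature": feature, "threshold": threshold,
            "left": {"value": left}, "right": {"value": right}}


trees = [stump(0, 0.5, -2.0, 2.0), stump(1, 2.0, -0.5, 0.5),
         stump(0, 1.5, -0.2, 0.3)]
remaining = remaining_bounds(trees)
label, used = classify_early_exit(trees, remaining, [1.0, 3.0])
print(label, used)
```

Ordering the trees so that the most influential ones come first makes an early exit more likely; whether that pays off depends on the model and the data.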

    References:

    [Warmuth2006] M. Warmuth, J. Liao, and G. Rätsch. Totally corrective boosting algorithms that maximize the margin. In Proceedings of the 23rd International Conference on Machine Learning, 2006.

    [Johnson2012] R. Johnson and T. Zhang. Learning Nonlinear Functions Using Regularized Greedy Forest. arXiv, 2012.

    [Chen2012] M. Chen, Z. Xu, K. Weinberger, O. Chapelle, and D. Kedem. Classifier Cascade for Minimizing Feature Evaluation Cost. JMLR W&CP 22: 218-226, 2012.

© 2021 The Archive Base. All Rights Reserved