Sorry if this question is somewhat specific to the python Scikit-learn library. I am

Question

Asked: June 14, 2026 (2026-06-14T00:40:06+00:00)

Sorry if this question is somewhat specific to the python Scikit-learn library.

I am trying to perform a grid search to find optimal parameters for scikit-learn’s GradientBoostingRegressor. The problem is, I don’t know where to start. In the past I have used an R and RStudio setup, but I am currently trying to migrate to Python for data mining, and scikit-learn seems very promising.

Can anyone share some simple setup code they have used to run this on an Amazon EC2 cluster, or point to a useful reference example for that library with another machine learning algorithm?

Thank you.

1 Answer

Editorial Team · Answer 1 · 2026-06-14T00:40:07+00:00

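As far as I know, GBRT (gradient boosted regression trees) is a fairly sequential algorithm, since each tree is fit to the errors of the ensemble built so far; hence there is no trivial way to run it in parallel.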

Random forests / ExtraTrees models are embarrassingly parallel, hence would be better candidates for training models on a cluster.
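To illustrate the point about forests, here is a minimal single-machine sketch. It assumes a reasonably recent scikit-learn release; the helper name `fit_forest`, the synthetic data, and the tree count are made up for illustration. Because the trees are independent of each other, the `n_jobs` argument can spread their construction across cores.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Synthetic data standing in for a real dataset (illustrative only).
X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=0)


def fit_forest(estimator_cls, X, y, n_jobs=-1):
    """Fit a tree ensemble; hypothetical helper, not part of scikit-learn.

    n_jobs=-1 asks scikit-learn to use all available cores. Each tree is
    built independently, which is what makes these models easy to parallelize.
    """
    model = estimator_cls(n_estimators=100, n_jobs=n_jobs, random_state=0)
    return model.fit(X, y)
```

Calling `fit_forest(RandomForestRegressor, X, y)` or `fit_forest(ExtraTreesRegressor, X, y)` would train the corresponding model on all local cores; this does not by itself distribute work across a cluster.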

scikit-learn has some built-in support for single-machine multiprocessing using joblib (check the docstring of models that accept an n_jobs argument). We plan to implement a task dispatch framework in joblib at some point instead. That would let us, for instance, leverage IPython parallel as a backend to run on a cluster. However, there is nothing ready out of the box for this currently.
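As a starting point for the grid search asked about in the question, here is a minimal single-machine sketch. It assumes a recent scikit-learn release, where `GridSearchCV` lives in `sklearn.model_selection`; the helper name `grid_search_gbr`, the synthetic data, and the parameter values are placeholders, not tuning recommendations. Although each boosted model is trained sequentially, the separate (parameter combination, fold) fits are independent, so `n_jobs` can run them in parallel through joblib.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for a real dataset (illustrative only).
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)


def grid_search_gbr(X, y, n_jobs=-1):
    """Cross-validated grid search; hypothetical helper, not a scikit-learn API."""
    # Placeholder grid: 2 x 2 x 2 = 8 parameter combinations.
    param_grid = {
        "n_estimators": [50, 100],
        "learning_rate": [0.05, 0.1],
        "max_depth": [2, 3],
    }
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        param_grid,
        cv=3,                              # 3-fold cross-validation
        scoring="neg_mean_squared_error",  # higher (closer to 0) is better
        n_jobs=n_jobs,                     # independent fits dispatched via joblib
    )
    search.fit(X, y)
    return search
```

After `search = grid_search_gbr(X, y)`, the selected combination is available as `search.best_params_` and the refitted model as `search.best_estimator_`. This only uses the cores of one machine; running on an EC2 cluster would still need the extra tooling discussed in this answer.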

If you are ready to invest some time doing it yourself I would advise you to have a look at StarCluster and its IPython plugin:
