class dask_ml.model_selection.IncrementalSearchCV(estimator, param_distributions, n_initial_parameters=10, decay_rate=1.0, test_size=None, patience=False, tol=0.001, scores_per_fit=1, max_iter=100, random_state=None, scoring=None)

Incrementally search for hyper-parameters on models that support partial_fit


This class depends on the optional distributed library.

This incremental hyper-parameter optimization class starts training the model on many hyper-parameters on a small amount of data, and then only continues training those models that seem to be performing well.

The number of actively trained hyper-parameter combinations follows an inverse-decay schedule determined by the initial number of parameters and the decay rate:

n_models = n_initial_parameters * (n_batches ** -decay_rate)
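The schedule above can be sketched as a small helper (the floor at one model is an assumption for illustration; the library's exact rounding may differ):

```python
def n_models(n_initial_parameters, n_batches, decay_rate=1.0):
    # Inverse decay: the number of actively trained models shrinks
    # as more batches are processed, never dropping below one.
    return max(1, int(n_initial_parameters * n_batches ** -decay_rate))

# With 10 initial parameter settings and decay_rate=1.0:
print([n_models(10, b) for b in range(1, 6)])  # [10, 5, 3, 2, 2]
```

A higher decay_rate shrinks the active set faster, trading model quality for training time.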

See the User Guide for more.

estimator : estimator object.

An object of that type is instantiated for each initial hyperparameter combination. This is assumed to implement the scikit-learn estimator interface. Either the estimator needs to provide a score function, or scoring must be passed. The estimator must implement partial_fit, set_params, and work well with clone.

param_distributions : dict

Dictionary with parameters names (string) as keys and distributions or lists of parameters to try. Distributions must provide a rvs method for sampling (such as those from scipy.stats.distributions). If a list is given, it is sampled uniformly.
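As a sketch of the sampling rule (the parameter names here are illustrative, not tied to any particular estimator): entries with an rvs method are sampled from the distribution, while lists are drawn from uniformly.

```python
import numpy as np
from scipy.stats import loguniform

rng = np.random.RandomState(0)
param_distributions = {
    "alpha": loguniform(1e-2, 1e1),  # distribution: sampled via .rvs()
    "average": [True, False],        # list: sampled uniformly
}
sample = {
    key: dist.rvs(random_state=rng) if hasattr(dist, "rvs")
    else dist[rng.randint(len(dist))]
    for key, dist in param_distributions.items()
}
```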

n_initial_parameters : int, default=10

Number of parameter settings that are sampled. This trades off runtime vs quality of the solution.

Alternatively, you can set this to "grid" to do a full grid search.

decay_rate : float, default 1.0

How quickly to decrease the number of future partial_fit calls. A higher decay_rate will result in lower training times, at the cost of worse models.

patience : int, default False

Maximum number of non-improving scores before we stop training a model. Off by default.

scores_per_fit : int, default 1

If patience is used, this is the maximum number of partial_fit calls between score calls.

tol : float, default 0.001

The required level of improvement to consider stopping training on that model. The most recent score must be at most tol better than all of the previous patience scores for that model. Increasing tol will tend to reduce training time, at the cost of worse models.

max_iter : int, default 100

Maximum number of partial fit calls per model.

test_size : float

Fraction of the dataset to hold out for computing test scores. Defaults to the size of a single partition of the input training set.


The training dataset should fit in memory on a single machine. Adjust the test_size parameter as necessary to achieve this.

random_state : int, RandomState instance or None, optional, default: None

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

scoring : string, callable, list/tuple, dict or None, default: None

A single string (see The scoring parameter: defining model evaluation rules) or a callable (see Defining your scoring strategy from metric functions) to evaluate the predictions on the test set.

For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values.

NOTE that when using custom scorers, each scorer should return a single value. Metric functions returning a list/array of values can be wrapped into multiple scorers that return one value each.

See Specifying multiple metrics for evaluation for an example.

If None, the estimator’s default scorer (if available) is used.

cv_results_ : dict of np.ndarrays

This dictionary has keys

  • mean_partial_fit_time
  • mean_score_time
  • std_partial_fit_time
  • std_score_time
  • test_score
  • rank_test_score
  • model_id
  • partial_fit_calls
  • params
  • param_{key}, where key is every key in params.

The values in the test_score key correspond to the last score a model received on the hold out dataset. The key model_id corresponds with history_. This dictionary can be imported into Pandas.
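For instance, importing into Pandas is a one-liner (the dict below is a hypothetical, hand-built stand-in for a fitted search's cv_results_, showing a subset of the keys listed above):

```python
import pandas as pd

cv_results = {
    "model_id": [0, 1],
    "test_score": [0.81, 0.93],
    "rank_test_score": [2, 1],
    "partial_fit_calls": [5, 12],
    "params": [{"alpha": 0.1}, {"alpha": 0.01}],
}
df = pd.DataFrame(cv_results)
# Rank 1 is the best-scoring model.
best = df.loc[df["rank_test_score"].idxmin()]
```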

model_history_ : dict of lists of dict

A dictionary of each model's history. This is a reorganization of history_: the same information is present but organized per model.

This data has the structure {model_id: hist} where hist is a subset of history_ and model_id are model identifiers.

history_ : list of dicts

Information about each model after each partial_fit call. Each dict has the keys

  • partial_fit_time
  • score_time
  • score
  • model_id
  • params
  • partial_fit_calls

The key model_id corresponds to the model_id in cv_results_. This list of dicts can be imported into Pandas.
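The reorganization from history_ into model_history_ can be sketched as follows (the records below are hypothetical, with a subset of the keys listed above):

```python
from collections import defaultdict

history = [
    {"model_id": 0, "partial_fit_calls": 1, "score": 0.60},
    {"model_id": 1, "partial_fit_calls": 1, "score": 0.70},
    {"model_id": 0, "partial_fit_calls": 2, "score": 0.65},
]

# Group the flat history by model_id to get the per-model view.
model_history = defaultdict(list)
for record in history:
    model_history[record["model_id"]].append(record)
```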

best_estimator_ : BaseEstimator

The model with the highest validation score among all the models retained by the “inverse decay” algorithm.

best_score_ : float

Score achieved by best_estimator_ on the validation set after the final call to partial_fit.

best_index_ : int

Index indicating which estimator in cv_results_ corresponds to the highest score.

best_params_ : dict

Dictionary of best parameters found on the hold-out data.

scorer_ :

The function used to score models, which has a call signature of scorer_(estimator, X, y).

n_splits_ : int

Number of cross validation splits.

multimetric_ : bool

Whether this cross validation search uses multiple metrics.


Connect to the client and create the data

>>> from dask.distributed import Client
>>> client = Client()
>>> import numpy as np
>>> from dask_ml.datasets import make_classification
>>> X, y = make_classification(n_samples=5000000, n_features=20,
...                            chunks=100000, random_state=0)

Our underlying estimator is an SGDClassifier. We specify a few parameters common to each clone of the estimator.

>>> from sklearn.linear_model import SGDClassifier
>>> model = SGDClassifier(tol=1e-3, penalty='elasticnet', random_state=0)

The distribution of parameters we’ll sample from.

>>> params = {'alpha': np.logspace(-2, 1, num=1000),
...           'l1_ratio': np.linspace(0, 1, num=1000),
...           'average': [True, False]}
>>> search = IncrementalSearchCV(model, params, random_state=0)
>>> search.fit(X, y, classes=[0, 1])

Alternatively you can provide keywords to start with more hyper-parameters, but stop those that don’t seem to improve with more data.

>>> search = IncrementalSearchCV(model, params, random_state=0,
...                              n_initial_parameters=1000,
...                              patience=20, max_iter=100)

Often, additional training leads to little or no gain in scores at the end of training. In these cases, stopping training is beneficial because there’s no gain from more training and less computation is required. Two parameters control detecting “little or no gain”: patience and tol. Training continues if at least one score is more than tol above the other scores in the most recent patience calls to model.partial_fit.

For example, setting tol=0 and patience=2 means training will stop after two consecutive calls to model.partial_fit without improvement, or when max_iter total calls to model.partial_fit are reached.
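A sketch of this stopping rule (this is one reading of the patience/tol semantics described above, not dask-ml's actual implementation):

```python
def should_stop(scores, patience, tol):
    # Stop when the most recent score is at most `tol` better than
    # every score in the preceding `patience`-length window.
    if len(scores) <= patience:
        return False
    recent, window = scores[-1], scores[-1 - patience:-1]
    return all(recent - s <= tol for s in window)

print(should_stop([0.5, 0.5, 0.5], patience=2, tol=0))  # True: plateaued
print(should_stop([0.1, 0.4, 0.8], patience=2, tol=0))  # False: still improving
```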


fit(X, y, **fit_params) Find the best parameters for a particular model.
get_params([deep]) Get parameters for this estimator.
predict(X[, y])
set_params(**params) Set the parameters of this estimator.
__init__(estimator, param_distributions, n_initial_parameters=10, decay_rate=1.0, test_size=None, patience=False, tol=0.001, scores_per_fit=1, max_iter=100, random_state=None, scoring=None)

Initialize self. See help(type(self)) for accurate signature.

fit(X, y, **fit_params)

Find the best parameters for a particular model.

X, y : array-like

**fit_params

Additional partial_fit keyword arguments for the estimator.


Get parameters for this estimator.

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

params : mapping of string to any

Parameter names mapped to their values.


Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
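For example, with a two-step pipeline (a sketch; any nested estimator follows the same convention):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier

pipe = Pipeline([("scale", StandardScaler()), ("clf", SGDClassifier())])
# Nested parameters use the <component>__<parameter> form.
pipe.set_params(clf__alpha=0.01)
```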