class dask_ml.model_selection.HyperbandSearchCV(estimator, parameters, max_iter=81, aggressiveness=3, patience=False, tol=0.001, test_size=None, random_state=None, scoring=None, verbose=False, prefix='')

Find the best parameters for a particular model with an adaptive cross-validation algorithm.

Hyperband will find close to the best possible parameters with the given computational budget [*] by spending more time training high-performing estimators [1]. This means that Hyperband stops training estimators that perform poorly; at its core, Hyperband is an early stopping scheme for RandomizedSearchCV.

Unlike RandomizedSearchCV, Hyperband does not require choosing between “evaluate many parameters for a short time” and “train a few parameters for a long time”.
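The early-stopping idea can be illustrated with a toy version of successive halving, the loop the Hyperband paper [1] runs inside each of its brackets. This is a simplified sketch, not dask-ml's implementation: `score` is a hypothetical stand-in for "call partial_fit this many times, then score on validation data", and `eta` loosely plays the role of aggressiveness.

```python
from typing import Callable, Dict, List, Tuple


def successive_halving(
    configs: List[float],
    score: Callable[[float, int], float],
    eta: int = 3,
) -> Tuple[float, Dict[float, int]]:
    """Toy successive halving over a list of hyper-parameter values.

    Each surviving config is "trained" for `calls` partial_fit calls and
    scored; only the best 1/eta survive, and the next round gives the
    survivors eta times as many calls. Returns the winning config and the
    number of calls each config received before it was stopped.
    """
    survivors = list(configs)
    calls = 1
    calls_used: Dict[float, int] = {}
    while len(survivors) > 1:
        # Score each survivor after `calls` partial_fit calls (simulated here).
        ranked = sorted(survivors, key=lambda c: score(c, calls), reverse=True)
        for c in survivors:
            calls_used[c] = calls
        # Early stopping: drop all but the top 1/eta (always keep at least one).
        survivors = ranked[: max(1, len(ranked) // eta)]
        calls *= eta
    return survivors[0], calls_used


def toy_score(alpha: float, n_calls: int) -> float:
    # Hypothetical score: improves with training and peaks at alpha = 0.3.
    return -abs(alpha - 0.3) - 1.0 / (n_calls + 1)


best, calls_used = successive_halving([i / 8 for i in range(9)], toy_score)
```

Here the six weakest of nine configurations are stopped after one call, so the search spends 15 calls in total instead of the 27 needed to train all nine configurations for three calls each.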

Hyperband requires one input, max_iter, which requires knowing how long to train the best-performing estimator. The other, implicit input (the Dask array chunk size) requires a rough estimate of how many parameters to sample. Specification details are in Notes.

[*] After N partial_fit calls, the estimator Hyperband produces will, with high probability, be close to the best possible estimator that N partial_fit calls could ever produce (where “close” means “within log terms of the expected best possible score”).

Notes

To set max_iter and the chunk size for X and y, estimate

• the number of examples at least one model will see (n_examples). If 10 passes through the data are needed for the longest-trained model, n_examples = 10 * len(X).
• how many hyper-parameter combinations to sample (n_params)

These can be rough guesses. To determine the chunk size and max_iter,

1. Let the chunk size be chunk_size = n_examples / n_params
2. Let max_iter = n_params

Then every estimator sees no more than max_iter * chunk_size = n_examples examples. Hyperband will actually sample somewhat more hyper-parameter combinations than n_params (which is why rough guesses are adequate). For example, let’s say

• about 200 or 300 hyper-parameter combinations need to be tested to search the space effectively
• models need more than 50 * len(X) examples but fewer than 100 * len(X) examples.

Let’s decide to provide 81 * len(X) examples and to sample 243 parameters. Then each chunk will be 1/3 of the dataset (81 * len(X) / 243 = len(X) / 3) and max_iter=243.
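To see why Hyperband samples more combinations than n_params (which equals max_iter), here is a sketch of the bracket schedule from the Hyperband paper [1]. It assumes max_iter plays the role of the paper's R and aggressiveness the role of its η. It follows the paper's formulas and may not match dask-ml's internal bookkeeping exactly.

```python
from typing import List, Tuple


def hyperband_brackets(max_iter: int, eta: int = 3) -> List[Tuple[int, int, float]]:
    """Bracket schedule from the Hyperband paper (illustrative sketch).

    Returns (bracket, n_models, initial_calls) for each bracket, where
    n_models is how many hyper-parameter combinations the bracket samples
    and initial_calls is the partial_fit budget each one starts with.
    """
    # s_max = floor(log_eta(max_iter)), computed with integers to avoid
    # floating-point error from math.log.
    s_max = 0
    while eta ** (s_max + 1) <= max_iter:
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        # n = ceil((s_max + 1) * eta**s / (s + 1)) via integer ceiling division.
        n_models = -(-(s_max + 1) * eta ** s // (s + 1))
        # r = max_iter * eta**(-s): calls each model gets in the first round.
        initial_calls = max_iter / eta ** s
        brackets.append((s, n_models, initial_calls))
    return brackets


schedule = hyperband_brackets(81, eta=3)
total_models = sum(n for _, n, _ in schedule)
```

By this sketch's arithmetic, max_iter=81 with aggressiveness=3 gives five brackets that sample 81 + 34 + 15 + 8 + 5 = 143 combinations, noticeably more than max_iter.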

Examples

>>> import numpy as np
>>> from dask_ml.model_selection import HyperbandSearchCV
>>> from dask_ml.datasets import make_classification
>>> from sklearn.linear_model import SGDClassifier
>>>
>>> X, y = make_classification(chunks=20)
>>> est = SGDClassifier(tol=1e-3)
>>> param_dist = {'alpha': np.logspace(-4, 0, num=1000),
...               'loss': ['hinge', 'log', 'modified_huber', 'squared_hinge'],
...               'average': [True, False]}
>>>
>>> search = HyperbandSearchCV(est, param_dist)
>>> search.fit(X, y, classes=np.unique(y))
>>> search.best_params_
{'loss': 'log', 'average': False, 'alpha': 0.0080502}

Methods

__init__(estimator, parameters, max_iter=81, aggressiveness=3, patience=False, tol=0.001, test_size=None, random_state=None, scoring=None, verbose=False, prefix='')

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None, **fit_params)

Find the best parameters for a particular model.

Parameters:
    X, y : array-like
    **fit_params
        Additional keyword arguments passed to the estimator's partial_fit (for example, classes).
get_params(deep=True)

Get parameters for this estimator.

Parameters:
    deep : bool, default=True
        If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
    params : dict
        Parameter names mapped to their values.
predict(X)

Predict for X.

For dask inputs, a dask array or dataframe is returned. For other inputs (NumPy array, pandas dataframe, scipy sparse matrix), the regular return value is returned.

Parameters:
    X : array-like
Returns:
    y : array-like
predict_log_proba(X)

Log of probability estimates.

For dask inputs, a dask array or dataframe is returned. For other inputs (NumPy array, pandas dataframe, scipy sparse matrix), the regular return value is returned.

If the underlying estimator does not have a predict_proba method, then an AttributeError is raised.

Parameters:
    X : array or dataframe
Returns:
    y : array-like
predict_proba(X)

Probability estimates.

For dask inputs, a dask array or dataframe is returned. For other inputs (NumPy array, pandas dataframe, scipy sparse matrix), the regular return value is returned.

If the underlying estimator does not have a predict_proba method, then an AttributeError is raised.

Parameters:
    X : array or dataframe
Returns:
    y : array-like
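The AttributeError behavior of predict_proba and predict_log_proba can be pictured with a minimal delegation sketch. The classes below are hypothetical stand-ins, not dask-ml code. Exposing the method as a property means the lookup itself fails when the wrapped estimator lacks it, so hasattr also reports False.

```python
class DelegatingSearch:
    """Hypothetical stand-in for a fitted search object (not dask-ml code)."""

    def __init__(self, best_estimator):
        self.best_estimator_ = best_estimator

    @property
    def predict_proba(self):
        # Looking up the method on the wrapped estimator raises AttributeError
        # if it is missing, so hasattr(search, "predict_proba") is False too.
        return self.best_estimator_.predict_proba


class WithProba:
    def predict_proba(self, X):
        # Toy two-class estimator that always predicts 50/50.
        return [[0.5, 0.5] for _ in X]


class WithoutProba:
    def predict(self, X):
        # Toy estimator with predict but no predict_proba.
        return [0 for _ in X]


search = DelegatingSearch(WithProba())
proba = search.predict_proba([1, 2])
```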
score(X, y=None) → float

Returns the score on the given data.

Parameters:
    X : array-like, shape = [n_samples, n_features]
        Input data, where n_samples is the number of samples and n_features is the number of features.
    y : array-like, shape = [n_samples] or [n_samples, n_output], optional
        Target relative to X for classification or regression; None for unsupervised learning.
Returns:
    score : float
        return self.estimator.score(X, y)
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
    **params : dict
        Estimator parameters.
Returns:
    self : estimator instance
        Estimator instance.
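The <component>__<parameter> naming convention used by set_params can be illustrated with a small hypothetical helper that groups keys by component. It is only a sketch of the naming scheme, not how scikit-learn or dask-ml implement set_params. Splitting at the first "__" leaves any deeper nesting (for example estimator__clf__alpha) for the component to resolve itself.

```python
from typing import Dict


def split_nested_params(params: Dict[str, object]) -> Dict[str, Dict[str, object]]:
    """Group '<component>__<parameter>' keys by component (hypothetical helper).

    Keys without '__' are collected under the empty-string component,
    i.e. they belong to the outer estimator itself.
    """
    grouped: Dict[str, Dict[str, object]] = {}
    for key, value in params.items():
        # partition splits at the first '__' only, so nested names survive.
        component, sep, name = key.partition("__")
        if not sep:
            # No delimiter: a plain parameter of the outer object.
            component, name = "", key
        grouped.setdefault(component, {})[name] = value
    return grouped


grouped = split_nested_params(
    {"estimator__alpha": 1e-3, "max_iter": 243, "estimator__loss": "hinge"}
)
```

In this example, max_iter stays with the outer search object, while alpha and loss are routed to the component named estimator.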