dask_ml.model_selection.HyperbandSearchCV

class dask_ml.model_selection.HyperbandSearchCV(estimator, parameters, max_iter=81, aggressiveness=3, patience=False, tol=0.001, test_size=None, random_state=None, scoring=None, verbose=False, prefix='')

Find the best parameters for a particular model with an adaptive cross-validation algorithm.
Hyperband will find close to the best possible parameters with the given computational budget [*] by spending more time training high-performing estimators [1]. This means that Hyperband stops training estimators that perform poorly. At its core, Hyperband is an early stopping scheme for RandomizedSearchCV.
Unlike RandomizedSearchCV, Hyperband does not require choosing between “evaluate many parameters for a short time” and “train a few parameters for a long time”.
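To make this concrete, here is a minimal sketch of the bracket schedule described in the Hyperband paper [1]: each bracket starts a different number of models and gives them a different initial number of partial_fit calls. The helper name hyperband_brackets is hypothetical, and mapping max_iter to the paper's R and aggressiveness to its eta is an assumption; dask-ml's internal schedule may differ in detail.

```python
def hyperband_brackets(max_iter=81, aggressiveness=3):
    """Return {bracket: (n_models, initial_partial_fit_calls)}.

    Assumes the schedule of Li et al. [1] with R = max_iter and
    eta = aggressiveness, using integer arithmetic throughout.
    """
    eta = aggressiveness
    # s_max is the largest s such that eta**s <= max_iter.
    s_max = 0
    while eta ** (s_max + 1) <= max_iter:
        s_max += 1

    brackets = {}
    for s in range(s_max, -1, -1):
        # n = ceil((s_max + 1) * eta**s / (s + 1)) models start in bracket s.
        numerator = (s_max + 1) * eta ** s
        n_models = -(-numerator // (s + 1))
        # Each of those models first receives max_iter / eta**s partial_fit calls.
        initial_calls = max_iter // eta ** s
        brackets[s] = (n_models, initial_calls)
    return brackets


for bracket, (n_models, calls) in hyperband_brackets().items():
    print(f"bracket {bracket}: {n_models} models, {calls} initial partial_fit calls")
```

Under these assumptions and the default arguments, the highest bracket starts 81 models with 1 call each, while bracket 0 starts 5 models with 81 calls each, so both ends of the tradeoff are covered in a single search.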
Hyperband requires one explicit input, max_iter, which requires knowing how long to train the best performing estimator. The other, implicit input (the Dask array chunk size) requires a rough estimate of how many parameters to sample. Specification details are in Notes.

[*] After \(N\) partial_fit calls, the estimator Hyperband produces will, with high probability, be close to the best possible estimator that \(N\) partial_fit calls could ever produce (where “close” means “within log terms of the expected best possible score”).

Parameters:
estimator : estimator object.
An object of that type is instantiated for each hyperparameter combination. This is assumed to implement the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed. The estimator must implement partial_fit, set_params, and work well with clone.
parameters : dict
Dictionary with parameter names (strings) as keys and distributions or lists of parameters to try as values. Distributions must provide an rvs method for sampling (such as those from scipy.stats.distributions). If a list is given, it is sampled uniformly.
max_iter : int
The maximum number of partial_fit calls to any one model. This should be the number of partial_fit calls required for the model to converge. See Notes for details on setting this parameter.
aggressiveness : int, default=3
How aggressive to be in culling off the different estimators. Higher values imply higher confidence in scoring (or that the hyperparameters influence the estimator.score more than the data). Theory suggests aggressiveness=3 is close to optimal. aggressiveness=4 implies higher confidence and is likely suitable for initial exploration.
patience : int, default False
If specified, training stops when the score does not increase by tol after patience calls to partial_fit. Off by default. If patience=True, a patience value is automatically selected to work well with the Hyperband model selection algorithm.
tol : float, default 0.001
The required level of improvement to consider stopping training on that model when patience is specified. Increasing tol will tend to reduce training time at the cost of (potentially) worse estimators.
test_size : float
Fraction of the dataset to hold out for computing test/validation scores. Defaults to the size of a single partition of the input training set.
Note: The testing dataset should fit in memory on a single machine. Adjust the test_size parameter as necessary to achieve this.
random_state : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 scoring : string, callable, list/tuple, dict or None, default: None
A single string (see The scoring parameter: defining model evaluation rules) or a callable (see Defining your scoring strategy from metric functions) to evaluate the predictions on the test set.
If None, the estimator’s default scorer (if available) is used.
 verbose : bool, float, int, optional, default: False
If False (default), don’t print logs (or pipe them to stdout). However, standard logging will still be used.
If True, print logs and use standard logging.
If float, print/log approximately verbose fraction of the time.
prefix : str, optional, default=""
While logging, add prefix to each message.
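As a rough illustration of how patience and tol interact, the sketch below implements one plausible reading of the stopping rule: stop once the most recent patience scores have failed to beat the earlier best score by at least tol. The helper name should_stop and the exact comparison are assumptions made for illustration, not dask-ml's implementation.

```python
def should_stop(scores, patience, tol=0.001):
    """Return True if the newest `patience` scores never beat the earlier best by `tol`.

    `scores` holds one validation score per partial_fit call, oldest first.
    Hypothetical helper: dask-ml's internal rule may differ in detail.
    """
    if not patience or len(scores) <= patience:
        # Patience is disabled, or there is not enough history to judge a plateau.
        return False
    best_before = max(scores[:-patience])
    recent_best = max(scores[-patience:])
    return recent_best < best_before + tol


improving = [0.50, 0.60, 0.70, 0.80]
plateaued = [0.50, 0.70, 0.7002, 0.7001, 0.7003]

print(should_stop(improving, patience=2))
print(should_stop(plateaued, patience=3))
```

In this sketch a larger tol makes the plateau test easier to satisfy, which matches the documented effect of trading shorter training for potentially worse estimators.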
Attributes:
metadata and metadata_ : dict[str, Union(int, dict)]
These dictionaries describe the computation performed, either before computation happens with metadata or after computation happens with metadata_. These dictionaries both have keys
* n_models, an int representing how many models will be/is created.
* partial_fit_calls, an int representing how many times partial_fit will be/is called.
* brackets, a list of the brackets that Hyperband runs. Each bracket has different values for training time importance and hyperparameter importance. In addition to n_models and partial_fit_calls, each element in this list has keys
  * bracket, an int, the bracket ID. Each bracket corresponds to a different level of training time importance. For bracket 0, training time is important. For the highest bracket, training time is not important and models are killed aggressively.
  * SuccessiveHalvingSearchCV params, a dictionary used to create the different brackets. It does not include the estimator or parameters parameters.
  * decisions, the number of partial_fit calls Hyperband makes before making decisions.
These dictionaries are the same if patience is not specified. If patience is specified, it's possible that less training is performed, and metadata_ will reflect that (though metadata won't).
cv_results_ : Dict[str, np.ndarray]
A dictionary that describes how well each model has performed. It contains information about every model regardless of whether it reached max_iter. It has keys
* mean_partial_fit_time
* mean_score_time
* std_partial_fit_time
* std_score_time
* test_score
* rank_test_score
* model_id
* partial_fit_calls
* params
* param_{key}, where {key} is every key in params.
* bracket
The values in the test_score key correspond to the last score a model received on the hold out dataset. The key model_id corresponds with history_. This dictionary can be imported into a Pandas DataFrame.
In the model_id, the bracket ID prefix corresponds to the bracket in metadata. Bracket 0 doesn't adapt to previous training at all; higher values correspond to more adaptation.
history_ : list of dicts
Information about each model after each partial_fit call. Each dict has the keys
* partial_fit_time
* score_time
* score
* model_id
* params
* partial_fit_calls
* elapsed_wall_time
The key model_id corresponds to the model_id in cv_results_. This list of dicts can be imported into Pandas.
model_history_ : dict of lists of dict
A dictionary of each model's history. This is a reorganization of history_: the same information is present but organized per model.
This data has the structure {model_id: [h1, h2, h3, ...]} where h1, h2 and h3 are elements of history_ and model_id is the model ID as in cv_results_.
best_estimator_ : BaseEstimator
The model with the highest validation score as selected by the Hyperband model selection algorithm.
 best_score_ : float
Score achieved by best_estimator_ on the validation set after the final call to partial_fit.
best_index_ : int
Index indicating which estimator in cv_results_ corresponds to the highest score.
best_params_ : dict
Dictionary of best parameters found on the holdout data.
 scorer_ :
The function used to score models, which has a call signature of scorer_(estimator, X, y).
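The relationship between history_, model_history_ and the final per-model score can be sketched in plain Python. The records below are made-up values that only mimic a few of the documented keys, and the model_id strings are hypothetical placeholders rather than output from a real search.

```python
from collections import defaultdict

# Hypothetical stand-in for search.history_: one dict per partial_fit call.
history = [
    {"model_id": "bracket=1-0", "partial_fit_calls": 1, "score": 0.61},
    {"model_id": "bracket=1-1", "partial_fit_calls": 1, "score": 0.55},
    {"model_id": "bracket=1-0", "partial_fit_calls": 3, "score": 0.72},
    {"model_id": "bracket=0-0", "partial_fit_calls": 3, "score": 0.68},
]

# Reorganize per model, giving the {model_id: [h1, h2, ...]} structure.
model_history = defaultdict(list)
for record in history:
    model_history[record["model_id"]].append(record)

# The last record of each model carries its most recent hold-out score,
# which is what the test_score key of cv_results_ is documented to report.
final_scores = {mid: recs[-1]["score"] for mid, recs in model_history.items()}
best_model_id = max(final_scores, key=final_scores.get)

print(final_scores)
print(best_model_id)
```

Because the same model_id appears in cv_results_, history_ and model_history_, it can serve as the join key when these structures are loaded into Pandas.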
Notes
To set max_iter and the chunk size for X and y, it is required to estimate
* the number of examples at least one model will see (n_examples). If 10 passes through the data are needed for the longest trained model, n_examples = 10 * len(X).
* how many hyperparameter combinations to sample (n_params)
These can be rough guesses. To determine the chunk size and max_iter,
1. Let the chunk size be chunk_size = n_examples / n_params
2. Let max_iter = n_params
Then, every estimator sees no more than max_iter * chunk_size = n_examples examples. Hyperband will actually sample some more hyperparameter combinations than n_params (which is why rough guesses are adequate). For example, let's say
* about 200 or 300 hyperparameters need to be tested to effectively search the possible hyperparameters
* models need more than 50 * len(X) examples but less than 100 * len(X) examples.
Let's decide to provide 81 * len(X) examples and to sample 243 parameters. Then each chunk will be 1/3rd the dataset and max_iter=243.
If you use HyperbandSearchCV, please use the citation for [2]:
@InProceedings{sievert2019better,
  author = {Scott Sievert and Tom Augspurger and Matthew Rocklin},
  title = {{B}etter and faster hyperparameter optimization with {D}ask},
  booktitle = {{P}roceedings of the 18th {P}ython in {S}cience {C}onference},
  pages = {118 - 125},
  year = {2019},
  editor = {Chris Calloway and David Lippa and Dillon Niederhut and David Shupe},
  doi = {10.25080/Majora-7ddc1dd1-011}
}
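The sizing rule from the Notes can be written out as a short calculation. The helper name hyperband_sizing is hypothetical; it only restates chunk_size = n_examples / n_params and max_iter = n_params, and the dataset length used below is an arbitrary stand-in for len(X).

```python
def hyperband_sizing(n_examples, n_params):
    """Return (chunk_size, max_iter) from the two rough estimates in the Notes."""
    chunk_size = n_examples // n_params  # examples per Dask array chunk
    max_iter = n_params                  # partial_fit calls for the longest-trained model
    return chunk_size, max_iter


n_rows = 30_000                   # arbitrary stand-in for len(X)
n_examples = 81 * n_rows          # examples the longest-trained model should see
chunk_size, max_iter = hyperband_sizing(n_examples, n_params=243)

print(chunk_size, max_iter)
print(chunk_size * 3 == n_rows)   # each chunk is one third of the dataset
```

X and y would then be chunked to roughly chunk_size rows per chunk before calling fit, and max_iter passed to HyperbandSearchCV.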
References
[1] “Hyperband: A novel bandit-based approach to hyperparameter optimization”, 2016 by L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. https://arxiv.org/abs/1603.06560
[2] “Better and faster hyperparameter optimization with Dask”, 2018 by S. Sievert, T. Augspurger, M. Rocklin. https://doi.org/10.25080/Majora-7ddc1dd1-011
Examples
>>> import numpy as np
>>> from dask_ml.model_selection import HyperbandSearchCV
>>> from dask_ml.datasets import make_classification
>>> from sklearn.linear_model import SGDClassifier
>>>
>>> X, y = make_classification(chunks=20)
>>> est = SGDClassifier(tol=1e-3)
>>> # Lists are sampled uniformly; alpha is searched on a log scale.
>>> param_dist = {'alpha': np.logspace(-4, 0, num=1000),
...               'loss': ['hinge', 'log', 'modified_huber', 'squared_hinge'],
...               'average': [True, False]}
>>>
>>> search = HyperbandSearchCV(est, param_dist)
>>> # Extra keyword arguments such as classes are passed on to partial_fit.
>>> search.fit(X, y, classes=np.unique(y))
>>> search.best_params_
{'loss': 'log', 'average': False, 'alpha': 0.0080502}
Methods
decision_function(X)
fit(X[, y])             Find the best parameters for a particular model.
get_params([deep])      Get parameters for this estimator.
inverse_transform(Xt)
predict(X)              Predict for X.
predict_log_proba(X)    Log of probability estimates.
predict_proba(X)        Probability estimates.
score(X[, y])           Returns the score on the given data.
set_params(**params)    Set the parameters of this estimator.
transform(X)
partial_fit
__init__(estimator, parameters, max_iter=81, aggressiveness=3, patience=False, tol=0.001, test_size=None, random_state=None, scoring=None, verbose=False, prefix='')
Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None, **fit_params)
Find the best parameters for a particular model.
Parameters:
X, y : array-like
**fit_params
Additional partial fit keyword arguments for the estimator.

get_params(deep=True)
Get parameters for this estimator.
Parameters:
deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
params : mapping of string to any
Parameter names mapped to their values.

predict(X)
Predict for X.
For dask inputs, a dask array or dataframe is returned. For other inputs (NumPy array, pandas dataframe, scipy sparse matrix), the regular return value is returned.
Parameters:
X : array-like
Returns:
y : array-like

predict_log_proba(X)
Log of probability estimates.
For dask inputs, a dask array or dataframe is returned. For other inputs (NumPy array, pandas dataframe, scipy sparse matrix), the regular return value is returned.
If the underlying estimator does not have a predict_proba method, then an AttributeError is raised.
Parameters:
X : array or dataframe
Returns:
y : array-like

predict_proba(X)
Probability estimates.
For dask inputs, a dask array or dataframe is returned. For other inputs (NumPy array, pandas dataframe, scipy sparse matrix), the regular return value is returned.
If the underlying estimator does not have a predict_proba method, then an AttributeError is raised.
Parameters:
X : array or dataframe
Returns:
y : array-like

score(X, y=None) → float
Returns the score on the given data.
Parameters:
X : array-like, shape = [n_samples, n_features]
Input data, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] or [n_samples, n_output], optional
Target relative to X for classification or regression; None for unsupervised learning.
Returns:
score : float
return self.estimator.score(X, y)

set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters:
**params : dict
Estimator parameters.
Returns:
self : object
Estimator instance.
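The <component>__<parameter> naming convention can be illustrated with a toy router. This is a simplified sketch using hypothetical classes (Component and Wrapper), not scikit-learn's or dask-ml's implementation; it only shows how a double-underscore key is split and forwarded to the nested object.

```python
class Component:
    """Hypothetical leaf object whose parameters are plain attributes."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class Wrapper:
    """Hypothetical container holding one nested component named `estimator`."""

    def __init__(self, estimator, max_iter=81):
        self.estimator = estimator
        self.max_iter = max_iter

    def set_params(self, **params):
        for key, value in params.items():
            # Split on the first "__" only; the remainder goes to the component.
            name, sep, sub_key = key.partition("__")
            if sep:
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


search = Wrapper(Component()).set_params(max_iter=243, estimator__alpha=0.01)
print(search.max_iter, search.estimator.alpha)
```

Returning self from set_params, as documented above, is what allows the call to be chained directly onto the constructor.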