dask_ml.model_selection.HyperbandSearchCV
- class dask_ml.model_selection.HyperbandSearchCV(estimator, parameters, max_iter=81, aggressiveness=3, patience=False, tol=0.001, test_size=None, random_state=None, scoring=None, verbose=False, prefix='', predict_meta=None, predict_proba_meta=None, transform_meta=None)
Find the best parameters for a particular model with an adaptive cross-validation algorithm.
Hyperband will find close to the best possible parameters with the given computational budget* by spending more time training high-performing estimators [1]. This means that Hyperband stops training estimators that perform poorly; at its core, Hyperband is an early stopping scheme for RandomizedSearchCV.
Hyperband does not require a trade-off between “evaluate many parameters for a short time” and “train a few parameters for a long time” as RandomizedSearchCV does.
Hyperband requires one input, max_iter, which requires knowing how long to train the best-performing estimator. The other implicit input (the Dask array chunk size) requires a rough estimate of how many parameters to sample. Specification details are in Notes.

* After \(N\) partial_fit calls, the estimator Hyperband produces will be close to the best possible estimator that \(N\) partial_fit calls could ever produce with high probability (where “close” means “within log terms of the expected best possible score”).
- Parameters
- estimator : estimator object
  An object of that type is instantiated for each hyperparameter combination. This is assumed to implement the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed. The estimator must implement partial_fit, set_params, and work well with clone.
- parameters : dict
  Dictionary with parameter names (string) as keys and distributions or lists of parameters to try. Distributions must provide an rvs method for sampling (such as those from scipy.stats.distributions). If a list is given, it is sampled uniformly. See the sketch after this list for an example.
- max_iter : int
  The maximum number of partial_fit calls to any one model. This should be the number of partial_fit calls required for the model to converge. See Notes for details on setting this parameter.
- aggressiveness : int, default=3
  How aggressive to be in culling off the different estimators. Higher values imply higher confidence in scoring (or that the hyperparameters influence estimator.score more than the data). Theory suggests aggressiveness=3 is close to optimal. aggressiveness=4 has higher confidence and is likely suitable for initial exploration.
- patience : int, default False
  If specified, training stops when the score does not increase by tol after patience calls to partial_fit. Off by default. A patience value is automatically selected if patience=True to work well with the Hyperband model selection algorithm.
- tol : float, default 0.001
  The required level of improvement to consider stopping training on that model when patience is specified. Increasing tol will tend to reduce training time at the cost of (potentially) worse estimators.
- test_size : float
  Fraction of the dataset to hold out for computing test/validation scores. Defaults to the size of a single partition of the input training set.

  Note: the testing dataset should fit in memory on a single machine. Adjust the test_size parameter as necessary to achieve this.
- random_state : int, RandomState instance or None, optional, default: None
  If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- scoring : string, callable, list/tuple, dict or None, default: None
  A single string (see The scoring parameter: defining model evaluation rules) or a callable (see scoring) to evaluate the predictions on the test set. If None, the estimator’s default scorer (if available) is used.
- verbose : bool, float, int, optional, default: False
  If False (default), don’t print logs (or pipe them to stdout). However, standard logging will still be used.

  If True, print logs and use standard logging.

  If float, print/log approximately verbose fraction of the time.
- prefix : str, optional, default=''
  While logging, add prefix to each message.
- predict_meta : pd.Series, pd.DataFrame, np.array, default: None (infer)
  An empty pd.Series, pd.DataFrame, or np.array that matches the output type of the estimator’s predict call. This meta is necessary for some estimators to work with dask.dataframe and dask.array.
- predict_proba_meta : pd.Series, pd.DataFrame, np.array, default: None (infer)
  An empty pd.Series, pd.DataFrame, or np.array that matches the output type of the estimator’s predict_proba call. This meta is necessary for some estimators to work with dask.dataframe and dask.array.
- transform_meta : pd.Series, pd.DataFrame, np.array, default: None (infer)
  An empty pd.Series, pd.DataFrame, or np.array that matches the output type of the estimator’s transform call. This meta is necessary for some estimators to work with dask.dataframe and dask.array.
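For example, a parameters dictionary can mix distributions (anything with an rvs method) with plain lists, and a meta argument can be passed explicitly when inference fails. A minimal sketch, assuming an SGDClassifier and scipy >= 1.4 (for scipy.stats.loguniform); the specific ranges are illustrative only:

import numpy as np
from scipy.stats import loguniform, uniform
from sklearn.linear_model import SGDClassifier
from dask_ml.model_selection import HyperbandSearchCV

# Distributions are sampled through their ``rvs`` method;
# lists are sampled uniformly.
params = {
    "alpha": loguniform(1e-4, 1e0),  # continuous, log-spaced
    "l1_ratio": uniform(0, 1),       # continuous, uniform on [0, 1]
    "average": [True, False],        # discrete list
}

search = HyperbandSearchCV(
    SGDClassifier(tol=1e-3),
    params,
    max_iter=81,
    # predict returns integer class labels here, so an empty integer
    # array is a plausible predict_meta (normally inferred automatically).
    predict_meta=np.empty((0,), dtype="int64"),
)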
- Attributes
- metadata and metadata_ : dict[str, Union(int, dict)]
  These dictionaries describe the computation performed, either before computation happens with metadata or after computation happens with metadata_. These dictionaries both have keys

  - n_models, an int representing how many models will be/is created.
  - partial_fit_calls, an int representing how many times partial_fit will be/is called.
  - brackets, a list of the brackets that Hyperband runs. Each bracket has different values for training time importance and hyperparameter importance. In addition to n_models and partial_fit_calls, each element in this list has keys

    - bracket, an int, the bracket ID. Each bracket corresponds to a different level of training time importance. For bracket 0, training time is important. For the highest bracket, training time is not important and models are killed aggressively.
    - SuccessiveHalvingSearchCV params, a dictionary used to create the different brackets. It does not include the estimator or parameters parameters.
    - decisions, the number of partial_fit calls Hyperband makes before making decisions.

  These dictionaries are the same if patience is not specified. If patience is specified, it’s possible that less training is performed, and metadata_ will reflect that (though metadata won’t).
- cv_results_ : Dict[str, np.ndarray]
  A dictionary that describes how well each model has performed. It contains information about every model regardless of whether it reached max_iter. It has keys

  - mean_partial_fit_time
  - mean_score_time
  - std_partial_fit_time
  - std_score_time
  - test_score
  - rank_test_score
  - model_id
  - partial_fit_calls
  - params
  - param_{key}, where {key} is every key in params.
  - bracket

  The values in the test_score key correspond to the last score a model received on the hold out dataset. The key model_id corresponds with history_. This dictionary can be imported into a Pandas DataFrame, as shown in the sketch after this list.

  In the model_id, the bracket ID prefix corresponds to the bracket in metadata. Bracket 0 doesn’t adapt to previous training at all; higher values correspond to more adaptation.
- history_ : list of dicts
  Information about each model after each partial_fit call. Each dict has the keys

  - partial_fit_time
  - score_time
  - score
  - model_id
  - params
  - partial_fit_calls
  - elapsed_wall_time

  The key model_id corresponds to the model_id in cv_results_. This list of dicts can be imported into Pandas.
- model_history_ : dict of lists of dict
  A dictionary of each model’s history. This is a reorganization of history_: the same information is present but organized per model. This data has the structure {model_id: [h1, h2, h3, ...]} where h1, h2 and h3 are elements of history_ and model_id is the model ID as in cv_results_.
- best_estimator_ : BaseEstimator
  The model with the highest validation score as selected by the Hyperband model selection algorithm.
- best_score_ : float
  Score achieved by best_estimator_ on the validation set after the final call to partial_fit.
- best_index_ : int
  Index indicating which estimator in cv_results_ corresponds to the highest score.
- best_params_ : dict
  Dictionary of best parameters found on the hold-out data.
- scorer_
  The function used to score models, which has a call signature of scorer_(estimator, X, y).
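As a brief, hedged illustration of inspecting these attributes after fitting (the fitted search object and the column choices below are assumptions; the keys themselves come from the lists above):

import pandas as pd

# cv_results_ is a flat dict of equal-length arrays: one row per
# sampled model once loaded into a DataFrame.
cv_results = pd.DataFrame(search.cv_results_)
best_rows = cv_results.sort_values("rank_test_score").head()

# metadata_ reflects the computation actually performed
# (metadata describes it before fit).
print(search.metadata_["n_models"], search.metadata_["partial_fit_calls"])

# history_ has one dict per partial_fit call; per-model learning
# curves come from model_history_ or a groupby on model_id.
history = pd.DataFrame(search.history_)
curves = history.groupby("model_id")["score"]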
Notes
To set max_iter and the chunk size for X and y, it is required to estimate

- the number of examples at least one model will see (n_examples). If 10 passes through the data are needed for the longest trained model, n_examples = 10 * len(X).
- how many hyper-parameter combinations to sample (n_params).

These can be rough guesses. To determine the chunk size and max_iter,

- Let the chunk size be chunk_size = n_examples / n_params
- Let max_iter = n_params

Then, every estimator sees no more than max_iter * chunk_size = n_examples examples. Hyperband will actually sample some more hyper-parameter combinations than n_params (which is why rough guesses are adequate). For example, let’s say

- about 200 or 300 hyper-parameters need to be tested to effectively search the possible hyper-parameters
- models need more than 50 * len(X) examples but less than 100 * len(X) examples.

Let’s decide to provide 81 * len(X) examples and to sample 243 parameters. Then each chunk will be 1/3rd the dataset and max_iter=243. A sketch of this rule appears below.
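The arithmetic above translates directly into code. A minimal sketch, assuming dask arrays from dask_ml.datasets.make_classification; the guesses n_examples = 81 * len(X) and n_params = 243 are the ones from the worked example, and the params dict is illustrative:

import numpy as np
from sklearn.linear_model import SGDClassifier
from dask_ml.datasets import make_classification
from dask_ml.model_selection import HyperbandSearchCV

# Rough guesses from the worked example above.
n_params = 243                       # hyper-parameter combinations to sample

X, y = make_classification(n_samples=9_000, n_features=20, chunks=9_000)

n_examples = 81 * X.shape[0]         # longest-trained model sees 81 passes
chunk_size = n_examples // n_params  # = len(X) / 3 = 3_000 here

# Rechunk so each partial_fit call sees one chunk of this size.
X = X.rechunk((chunk_size, -1))
y = y.rechunk(chunk_size)

params = {"alpha": np.logspace(-4, 0, num=1000)}  # illustrative
search = HyperbandSearchCV(SGDClassifier(tol=1e-3), params, max_iter=n_params)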
HyperbandSearchCV, please use the citation for [2]:

@InProceedings{sievert2019better,
    author    = {Scott Sievert and Tom Augspurger and Matthew Rocklin},
    title     = {{B}etter and faster hyperparameter optimization with {D}ask},
    booktitle = {{P}roceedings of the 18th {P}ython in {S}cience {C}onference},
    pages     = {118 - 125},
    year      = {2019},
    editor    = {Chris Calloway and David Lippa and Dillon Niederhut and David Shupe},
    doi       = {10.25080/Majora-7ddc1dd1-011}
}
References
- 1
“Hyperband: A novel bandit-based approach to hyperparameter optimization”, 2016 by L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. https://arxiv.org/abs/1603.06560
- 2
“Better and faster hyperparameter optimization with Dask”, 2019 by S. Sievert, T. Augspurger, M. Rocklin. https://doi.org/10.25080/Majora-7ddc1dd1-011
Examples
>>> import numpy as np
>>> from dask_ml.model_selection import HyperbandSearchCV
>>> from dask_ml.datasets import make_classification
>>> from sklearn.linear_model import SGDClassifier
>>>
>>> X, y = make_classification(chunks=20)
>>> est = SGDClassifier(tol=1e-3)
>>> param_dist = {'alpha': np.logspace(-4, 0, num=1000),
...               'loss': ['hinge', 'log', 'modified_huber', 'squared_hinge'],
...               'average': [True, False]}
>>>
>>> search = HyperbandSearchCV(est, param_dist)
>>> search.fit(X, y, classes=np.unique(y))
>>> search.best_params_
{'loss': 'log', 'average': False, 'alpha': 0.0080502}
Methods
- decision_function(X)
- fit(X[, y])
  Find the best parameters for a particular model.
- get_metadata_routing()
  Get metadata routing of this object.
- get_params([deep])
  Get parameters for this estimator.
- inverse_transform(Xt)
- predict(X)
  Predict for X.
- predict_log_proba(X)
  Log of probability estimates.
- predict_proba(X)
  Probability estimates.
- score(X[, y])
  Returns the score on the given data.
- set_params(**params)
  Set the parameters of this estimator.
- set_score_request(*[, compute])
  Configure whether metadata should be requested to be passed to the score method.
- transform(X)
  Transform block or partition-wise for dask inputs.
- partial_fit
- __init__(estimator, parameters, max_iter=81, aggressiveness=3, patience=False, tol=0.001, test_size=None, random_state=None, scoring=None, verbose=False, prefix='', predict_meta=None, predict_proba_meta=None, transform_meta=None)