dask_ml.wrappers.Incremental
- class dask_ml.wrappers.Incremental(estimator=None, scoring=None, shuffle_blocks=True, random_state=None, assume_equal_chunks=True, predict_meta=None, predict_proba_meta=None, transform_meta=None)
 Meta-estimator for feeding Dask Arrays to an estimator blockwise.
This wrapper provides a bridge between Dask objects and estimators implementing the partial_fit API. These incremental learners can train on batches of data, which fits well with Dask’s blocked data structures.
Note
This meta-estimator is not appropriate for hyperparameter optimization on larger-than-memory datasets. For that, see IncrementalSearchCV or HyperbandSearchCV.
See the list of incremental learners in the scikit-learn documentation for estimators that implement the partial_fit API. Note that Incremental is not limited to just these classes; it will work on any estimator implementing partial_fit, including those defined outside of scikit-learn itself.
Calling Incremental.fit() with a Dask Array will pass each block of the Dask array or arrays to estimator.partial_fit sequentially.
Like ParallelPostFit, the methods available after fitting (e.g. Incremental.predict()) are all parallel and delayed.
The estimator_ attribute is a clone of estimator that was actually used during the call to fit. All attributes learned during training are available on Incremental directly.
- Parameters
 - estimator : Estimator
 Any object supporting the scikit-learn partial_fit API.
 - scoring : string or callable, optional
 A single string (see The scoring parameter: defining model evaluation rules) or a callable (see scoring) to evaluate the predictions on the test set.
 For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values.
 Note that when using custom scorers, each scorer should return a single value. Metric functions returning a list/array of values can be wrapped into multiple scorers that return one value each.
 See Specifying multiple metrics for evaluation for an example.
 Warning
 If None, the estimator’s default scorer (if available) is used. Most scikit-learn estimators will convert large Dask arrays to a single NumPy array, which may exhaust the memory of your worker. You probably want to always specify scoring.
 - random_state : int or numpy.random.RandomState, optional
 Random object that determines how to shuffle blocks.
 - shuffle_blocks : bool, default True
 Determines whether to call partial_fit on a randomly selected chunk of the Dask arrays (default), or to fit on the blocks in sequential order. This controls only the order in which blocks are visited; rows within each block are not shuffled.
 - predict_meta : pd.Series, pd.DataFrame, or np.array, default: None (infer)
 An empty pd.Series, pd.DataFrame, or np.array that matches the output type of the estimator’s predict call. This meta is necessary for some estimators to work with dask.dataframe and dask.array.
 - predict_proba_meta : pd.Series, pd.DataFrame, or np.array, default: None (infer)
 An empty pd.Series, pd.DataFrame, or np.array that matches the output type of the estimator’s predict_proba call. This meta is necessary for some estimators to work with dask.dataframe and dask.array.
 - transform_meta : pd.Series, pd.DataFrame, or np.array, default: None (infer)
 An empty pd.Series, pd.DataFrame, or np.array that matches the output type of the estimator’s transform call. This meta is necessary for some estimators to work with dask.dataframe and dask.array.
- Attributes
 - estimator_ : Estimator
 A clone of estimator that was actually fit during the .fit call.
Examples
>>> from dask_ml.wrappers import Incremental
>>> from dask_ml.datasets import make_classification
>>> import sklearn.linear_model
>>> X, y = make_classification(chunks=25)
>>> est = sklearn.linear_model.SGDClassifier()
>>> clf = Incremental(est, scoring='accuracy')
>>> clf.fit(X, y, classes=[0, 1])
When used inside a grid search, prefix the underlying estimator’s parameter names with estimator__.
>>> from sklearn.model_selection import GridSearchCV
>>> param_grid = {"estimator__alpha": [0.1, 1.0, 10.0]}
>>> gs = GridSearchCV(clf, param_grid)
>>> gs.fit(X, y, classes=[0, 1])
Methods
fit(X[, y])  Fit the underlying estimator.
get_metadata_routing()  Get metadata routing of this object.
get_params([deep])  Get parameters for this estimator.
partial_fit(X[, y])  Fit the underlying estimator.
predict(X)  Predict for X.
predict_log_proba(X)  Log of probability estimates.
predict_proba(X)  Probability estimates.
score(X, y[, compute])  Returns the score on the given data.
set_params(**params)  Set the parameters of this estimator.
set_score_request(*[, compute])  Configure whether metadata should be requested to be passed to the score method.
transform(X)  Transform block or partition-wise for dask inputs.
- __init__(estimator=None, scoring=None, shuffle_blocks=True, random_state=None, assume_equal_chunks=True, predict_meta=None, predict_proba_meta=None, transform_meta=None)