dask_ml.ensemble.BlockwiseVotingRegressor

class dask_ml.ensemble.BlockwiseVotingRegressor(estimator)

Blockwise training and ensemble voting regressor.

This regressor trains on blocks / partitions of Dask Arrays or DataFrames. A cloned version of estimator will be fit independently on each block or partition of the Dask collection.

Prediction is done by the ensemble of learned models.
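The blockwise scheme described above can be sketched without Dask. This is a minimal illustration, not the library's implementation: the helpers `fit_blockwise` and `predict_ensemble` are hypothetical names, plain NumPy least squares stands in for the sub-estimator, and averaging the per-model predictions is assumed as the combination rule.

```python
import numpy as np

def fit_blockwise(X_blocks, y_blocks):
    """Fit one independent least-squares model per (X, y) block."""
    models = []
    for X_blk, y_blk in zip(X_blocks, y_blocks):
        A = np.column_stack([X_blk, np.ones(len(X_blk))])  # append intercept column
        coef, *_ = np.linalg.lstsq(A, y_blk, rcond=None)
        models.append(coef)
    return models

def predict_ensemble(models, X):
    """Average the predictions of all per-block models (assumed combination rule)."""
    A = np.column_stack([X, np.ones(len(X))])
    per_model = np.stack([A @ coef for coef in models], axis=1)  # (n_samples, n_models)
    return per_model.mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=1000)

# Ten row-wise blocks play the role of Dask chunks / partitions.
X_blocks = np.array_split(X, 10)
y_blocks = np.array_split(y, 10)

models = fit_blockwise(X_blocks, y_blocks)
y_pred = predict_ensemble(models, X)
```

Each block is fitted in isolation, so the fits are independent of one another and can run in parallel.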

Warning

Ensure that your data are sufficiently shuffled prior to training! If the values of the various blocks / partitions of your dataset are not distributed similarly, the regressor will give poor results.
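A small NumPy experiment illustrates why shuffling matters. It is a sketch under stated assumptions: `blockwise_mse` is a hypothetical helper, each block is fitted with a straight line, and the ensemble prediction is taken to be the average of the per-block lines. When the rows are sorted, each block sees a different slice of the input range and the averaged model is biased.

```python
import numpy as np

def blockwise_mse(x, y, n_blocks):
    """Fit a line per block, average the lines' predictions, return MSE on all data."""
    preds = []
    for x_blk, y_blk in zip(np.array_split(x, n_blocks), np.array_split(y, n_blocks)):
        slope, intercept = np.polyfit(x_blk, y_blk, deg=1)
        preds.append(slope * x + intercept)
    y_hat = np.mean(preds, axis=0)
    return float(np.mean((y_hat - y) ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=10_000)
y = x ** 2 + rng.normal(scale=0.01, size=x.size)

# Sorted rows: every block covers a disjoint range of x.
order = np.argsort(x)
mse_sorted = blockwise_mse(x[order], y[order], n_blocks=10)

# Shuffled rows: every block resembles the whole dataset.
perm = rng.permutation(x.size)
mse_shuffled = blockwise_mse(x[perm], y[perm], n_blocks=10)

print(f"sorted: {mse_sorted:.4f}  shuffled: {mse_shuffled:.4f}")
```

In this setup the sorted arrangement gives a noticeably larger error than the shuffled one, even though both ensembles see exactly the same rows.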

Parameters:
estimator : Estimator

The base regressor. A clone of it is fitted on each block or partition of the input.

Attributes:
estimators_ : list of regressors

The fitted sub-estimators: clones of estimator, one fitted on each partition / block of the inputs.

Examples

>>> import dask_ml.datasets
>>> import dask_ml.ensemble
>>> import sklearn.linear_model
>>> X, y = dask_ml.datasets.make_regression(n_samples=100_000,
...                                         chunks=10_000)
>>> subestimator = sklearn.linear_model.LinearRegression()
>>> reg = dask_ml.ensemble.BlockwiseVotingRegressor(
...     subestimator,
... )
>>> reg.fit(X, y)

Methods

get_params([deep]) Get parameters for this estimator.
score(X, y[, sample_weight]) Return the coefficient of determination (R^2) of the prediction.
set_params(**params) Set the parameters of this estimator.
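fit Fit a clone of estimator on each block or partition of the inputs.
predict Predict using the ensemble of fitted sub-estimators.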
__init__(estimator)

Initialize self. See help(type(self)) for accurate signature.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the coefficient of determination (R^2) of the prediction on the given test data.

The best possible score is 1.0. The score can be negative, because a model can be arbitrarily worse than one that always predicts the mean of y.

This matches the scikit-learn implementation, with the following differences:

  • dask_ml.metrics.r2_score() is used rather than sklearn.metrics.r2_score().
  • The 'uniform_average' method is used for multioutput results rather than 'variance_weighted'.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True target values for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

R^2 of self.predict(X) with respect to y.
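For reference, a regressor score in the scikit-learn convention is the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below assumes score follows that convention and shows what 'uniform_average' means for multioutput targets: R^2 is computed per output column and the columns are averaged with equal weight. The function `r2_uniform_average` is a hypothetical illustration, ignores sample_weight, and is not the dask_ml code.

```python
import numpy as np

def r2_uniform_average(y_true, y_pred):
    """R^2 per output column, averaged with equal weight across columns."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.ndim == 1:  # treat a single output as one column
        y_true = y_true[:, None]
        y_pred = y_pred[:, None]
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)               # residual sum of squares
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)  # total sum of squares
    return float(np.mean(1.0 - ss_res / ss_tot))

y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Perfect predictions score 1.0.
perfect = r2_uniform_average(y_true, y_true)

# Always predicting the column means scores 0.0.
baseline = r2_uniform_average(y_true, np.tile(y_true.mean(axis=0), (3, 1)))
```

With 'variance_weighted', by contrast, each column's R^2 would be weighted by that column's variance, so high-variance outputs would dominate the result.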

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : object

Estimator instance.
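The `<component>__<parameter>` naming can be illustrated with a small pure-Python sketch. The helper `route_params` is hypothetical and only mimics the idea of splitting each name on the first double underscore; it is not scikit-learn's implementation. The parameter names used in the example assume a LinearRegression sub-estimator.

```python
def route_params(params):
    """Split flat parameter names into top-level and per-component parameters."""
    top_level, nested = {}, {}
    for key, value in params.items():
        # Split on the first "__" only, so deeper nesting stays in sub_key.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            top_level[name] = value
    return top_level, nested

top_level, nested = route_params(
    {"estimator__fit_intercept": False, "estimator__positive": True}
)
```

For this class the only component is estimator, so a call such as `set_params(estimator__fit_intercept=False)` would update the wrapped sub-estimator, assuming that sub-estimator has a fit_intercept parameter.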