Pipelines and Composite Estimators
Dask-ML estimators follow the scikit-learn API. This means that Dask-ML estimators such as dask_ml.decomposition.PCA can be placed inside a regular sklearn.pipeline.Pipeline.
See http://scikit-learn.org/dev/modules/compose.html for more on using pipelines in general.
In [1]: from sklearn.pipeline import Pipeline # regular scikit-learn pipeline
In [2]: from dask_ml.cluster import KMeans
In [3]: from dask_ml.decomposition import PCA
In [4]: estimators = [('reduce_dim', PCA()), ('cluster', KMeans())]
In [5]: pipe = Pipeline(estimators)
In [6]: pipe
Out[6]: Pipeline(steps=[('reduce_dim', PCA()), ('cluster', KMeans())])
The pipeline pipe can now be used with Dask arrays.
ColumnTransformer for Heterogeneous Data
dask_ml.compose.ColumnTransformer is a clone of the scikit-learn version that works well with Dask objects.
See http://scikit-learn.org/dev/modules/compose.html#columntransformer-for-heterogeneous-data for an introduction to ColumnTransformer.