Clustering
KMeans | Scalable KMeans for clustering
SpectralClustering | Apply parallel Spectral Clustering
The dask_ml.cluster module implements several algorithms for clustering unlabeled data.
Spectral Clustering
Spectral Clustering finds a low-dimensional embedding on the affinity matrix between samples. The embedded dataset is then clustered, typically with KMeans.
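To make the two stages concrete, here is a minimal NumPy sketch of one common formulation: build a dense RBF affinity matrix, embed the samples with the top eigenvectors of the degree-normalized affinity, then cluster the embedded rows with a small hand-written KMeans. The function names, the RBF affinity, the `gamma` value, and the farthest-point initialisation are assumptions chosen for illustration; this is not how dask-ml implements it, and it only works for datasets small enough to hold the full affinity matrix in memory.

```python
import numpy as np


def spectral_embedding(X, n_clusters, gamma=1.0):
    """Embed samples using the top eigenvectors of the normalized affinity."""
    # Dense (n_samples, n_samples) RBF affinity matrix.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-gamma * sq_dists)
    # Symmetric normalization D^{-1/2} S D^{-1/2}, where D holds the row sums.
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, -n_clusters:]
    # Row-normalize so each embedded sample lies on the unit sphere.
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def kmeans(U, n_clusters, n_iter=20):
    """Minimal Lloyd's algorithm with farthest-point initialisation."""
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dists = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):  # leave a center untouched if it is empty
                centers[k] = U[labels == k].mean(axis=0)
    return labels


# Two well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(30, 2)),
    rng.normal(5.0, 0.3, size=(30, 2)),
])
embedding = spectral_embedding(X, n_clusters=2, gamma=1.0)
labels = kmeans(embedding, n_clusters=2)
```

The dense affinity matrix and the full eigendecomposition are the two steps that become expensive as the number of samples grows.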
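Typically, spectral clustering algorithms do not scale well. Computing the n_samples × n_samples affinity matrix becomes prohibitively expensive as the number of samples grows.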
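In dask-ml, we use the Nyström method to approximate the large affinity matrix. This involves sampling n_components rows from the entire training set. The exact affinity is computed for this subset (n_components × n_components) and between this subset and the rest of the data (n_components × (n_samples - n_components)). The remaining block of the affinity matrix is never computed directly.

Let $S$ be the full n_samples × n_samples affinity matrix, with the sampled rows ordered first. It can be written in block form as

\[
S = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}
\]

Where $A$ is the n_components × n_components affinity matrix of the sampled rows, $B$ is the n_components × (n_samples - n_components) affinity matrix between the sampled rows and the rest of the dataset, and $C$ is the affinity matrix among the remaining rows. Instead of computing $C$ directly, the Nyström method approximates it with $B^T A^{-1} B$.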
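As a rough illustration of the Nyström idea, the NumPy sketch below samples n_components rows, computes the exact affinity block A among them and the block B between them and the remaining rows, and fills in the remaining block with the standard Nyström estimate B^T A^+ B, using a truncated least-squares solve in place of an explicit inverse for numerical stability. The RBF affinity, the function names, the `gamma` value, and the `rcond` cutoff are assumptions made for the example. It is not dask-ml's implementation, and it assembles the full matrix only so that the approximation can be compared with the exact affinity on a small dataset.

```python
import numpy as np


def rbf_affinity(X, Y, gamma=1.0):
    """Exact RBF affinity exp(-gamma * ||x - y||^2) between rows of X and Y."""
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def nystrom_affinity(X, n_components, gamma=1.0, seed=0):
    """Approximate the full affinity matrix from n_components sampled rows.

    Returns the approximation with rows and columns ordered as
    [sampled rows, remaining rows], together with that ordering.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    keep, rest = perm[:n_components], perm[n_components:]

    A = rbf_affinity(X[keep], X[keep], gamma)  # (n_components, n_components)
    B = rbf_affinity(X[keep], X[rest], gamma)  # (n_components, n - n_components)

    # C is never computed exactly. It is estimated as B^T A^+ B, where the
    # least-squares solve discards near-zero singular values of A.
    A_pinv_B = np.linalg.lstsq(A, B, rcond=1e-8)[0]
    C_approx = B.T @ A_pinv_B

    S_approx = np.block([[A, B], [B.T, C_approx]])
    return S_approx, np.concatenate([keep, rest])


# Compare against the exact affinity on a dataset small enough to do so.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
S_approx, order = nystrom_affinity(X, n_components=50, gamma=0.5)
S_exact = rbf_affinity(X[order], X[order], gamma=0.5)
rel_err = np.linalg.norm(S_approx - S_exact) / np.linalg.norm(S_exact)
```

Only the A and B blocks require affinity evaluations, so the cost of building the approximation grows linearly with the number of samples for a fixed n_components rather than quadratically.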
See the spectral clustering benchmark for an example showing how dask_ml.cluster.SpectralClustering scales in the number of samples.