dask_ml.cluster.SpectralClustering
- class dask_ml.cluster.SpectralClustering(n_clusters=8, eigen_solver=None, random_state=None, n_init='auto', gamma=1.0, affinity='rbf', n_neighbors=10, eigen_tol=0.0, assign_labels='kmeans', degree=3, coef0=1, kernel_params=None, n_jobs=1, n_components=100, persist_embedding=False, kmeans_params=None)
- Apply parallel Spectral Clustering
- This implementation avoids the expensive computation of the N x N affinity matrix. Instead, the Nyström method is used as an approximation.
- Parameters
- n_clusters : integer, optional
- The dimension of the projection subspace. 
- eigen_solver : None
- Ignored.
- random_state : int, RandomState instance or None, optional, default: None
- A pseudo-random number generator used to initialize the lobpcg eigenvector decomposition when eigen_solver == ‘amg’ and to initialize K-Means. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- n_init : int, optional, default: 10
- Ignored.
- gamma : float, default=1.0
- Kernel coefficient for rbf, poly, sigmoid, laplacian and chi2 kernels. Ignored for affinity='nearest_neighbors'.
- affinity : string, array-like or callable, default ‘rbf’
- It may be ‘precomputed’ or one of the kernels supported by metrics.pairwise.PAIRWISE_KERNEL_FUNCTIONS.
- Only kernels that produce similarity scores (non-negative values that increase with similarity) should be used. This property is not checked by the clustering algorithm.
- Callables should expect arguments similar to sklearn.metrics.pairwise_kernels: a required X, an optional Y, and gamma, degree, coef0, and any keywords passed in kernel_params.
- n_neighbors : integer
- Number of neighbors to use when constructing the affinity matrix using the nearest neighbors method. Ignored for affinity='rbf'.
- eigen_tol : float, optional, default: 0.0
- Stopping criterion for eigendecomposition of the Laplacian matrix when using arpack eigen_solver. 
- assign_labels : ‘kmeans’ or Estimator, default: ‘kmeans’
- The strategy to use to assign labels in the embedding space. By default creates an instance of dask_ml.cluster.KMeans and sets n_clusters to 2. For further control over the hyperparameters of the final label assignment, pass an instance of a KMeans estimator (either scikit-learn or dask-ml).
- degree : float, default=3
- Degree of the polynomial kernel. Ignored by other kernels. 
- coef0 : float, default=1
- Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels. 
- kernel_params : dictionary of string to any, optional
- Parameters (keyword arguments) and values for kernel passed as callable object. Ignored by other kernels. 
- n_jobs : int, optional (default = 1)
- The number of parallel jobs to run. If -1, then the number of jobs is set to the number of CPU cores.
- n_components : int, default 100
- Number of rows from X to use for the Nyström approximation. Larger n_components will improve the accuracy of the approximation, at the cost of a longer training time.
- persist_embedding : bool
- Whether to persist the intermediate n_samples x n_components array used for clustering. 
- kmeans_params : dictionary of string to any, optional
- Keyword arguments for the KMeans clustering used for the final clustering. 
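The calling convention described for a callable affinity (a required X, an optional Y, then gamma, degree, coef0 and any extra keywords) can be sketched as follows. This is an illustration only and not part of dask-ml: the function name laplacian_affinity and the scale keyword are hypothetical, and the function is exercised on plain NumPy arrays because the form in which the estimator hands data to the callable is not stated here.

```python
import numpy as np


def laplacian_affinity(X, Y=None, gamma=1.0, degree=3, coef0=1, **kwds):
    """Hypothetical similarity kernel: exp(-gamma * ||x - y||_1 / scale).

    The values are non-negative and grow as points get closer, which is
    the property the affinity parameter asks for. degree and coef0 are
    accepted only to match the described signature and are unused.
    """
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)
    # "scale" is a made-up extra keyword, standing in for kernel_params.
    scale = kwds.get("scale", 1.0)
    # Pairwise L1 distances via broadcasting: shape (len(X), len(Y)).
    l1 = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=-1)
    return np.exp(-gamma * l1 / scale)


X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
K = laplacian_affinity(X, gamma=0.5)
```

Under these assumptions, such a function would be passed as affinity=laplacian_affinity, with any extra keywords supplied through kernel_params.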
 
- Attributes
- assign_labels_ : Estimator
- The instance of the KMeans estimator used to assign labels 
- labels_ : dask.array.Array, size (n_samples,)
- The cluster labels assigned 
- eigenvalues_ : numpy.ndarray
- The eigenvalues from the SVD of the sampled points 
 
- Notes
- Using persist_embedding=True can be an important optimization to avoid some redundant computations. This persists the array being fed to the clustering algorithm in (distributed) memory. The array has shape n_samples x n_components.
- References
- Parallel Spectral Clustering in Distributed Systems (2010). Chen, Song, Bai, Lin, and Chang. IEEE Transactions on Pattern Analysis and Machine Intelligence. http://ieeexplore.ieee.org/document/5444877/
- Spectral Grouping Using the Nystrom Method (2004). Fowlkes, Belongie, Chung, Malik. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://people.cs.umass.edu/~mahadeva/cs791bb/reading/fowlkes-nystrom.pdf
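The trade-off behind n_components can be illustrated with a small NumPy sketch of the basic Nyström low-rank reconstruction of an RBF affinity matrix. It is not dask-ml's implementation (which also normalizes the affinities and computes an embedding); the helper names rbf_kernel and nystrom_approx, the uniform row sampling, and the pseudo-inverse cutoff are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Dense RBF affinities exp(-gamma * ||x - y||^2) between rows of X and Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom_approx(X, n_components, gamma=1.0, seed=0):
    """Approximate the full n x n affinity matrix from n_components sampled rows."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_components, replace=False)
    C = rbf_kernel(X, X[idx], gamma)  # n x m: every point against the sample
    A = C[idx]                        # m x m: the sample against itself
    # W is approximated by C A^+ C^T; the cutoff guards against tiny singular values.
    return C @ np.linalg.pinv(A, rcond=1e-10) @ C.T


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
W_exact = rbf_kernel(X, X)

W_small = nystrom_approx(X, n_components=10)
W_large = nystrom_approx(X, n_components=100)
err_small = np.linalg.norm(W_exact - W_small) / np.linalg.norm(W_exact)
err_large = np.linalg.norm(W_exact - W_large) / np.linalg.norm(W_exact)
print(f"relative error with 10 rows: {err_small:.3g}, with 100 rows: {err_large:.3g}")
```

Only the n x m block C is ever evaluated against the data, which is why the full N x N matrix does not have to be computed; the exact matrix is built here solely to measure the error.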
- Methods
- fit_predict(X[, y]): Perform clustering on X and return cluster labels.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- set_params(**params): Set the parameters of this estimator.
- fit
- __init__(n_clusters=8, eigen_solver=None, random_state=None, n_init='auto', gamma=1.0, affinity='rbf', n_neighbors=10, eigen_tol=0.0, assign_labels='kmeans', degree=3, coef0=1, kernel_params=None, n_jobs=1, n_components=100, persist_embedding=False, kmeans_params=None)