dask_ml.decomposition.TruncatedSVD
class dask_ml.decomposition.TruncatedSVD(n_components=2, algorithm='tsqr', n_iter=5, random_state=None, tol=0.0, compute=True)
Methods
fit(X[, y])              Fit truncated SVD on training data X.
fit_transform(X[, y])    Fit model to X and perform dimensionality reduction on X.
get_params([deep])       Get parameters for this estimator.
inverse_transform(X)     Transform X back to its original space.
set_params(**params)     Set the parameters of this estimator.
transform(X[, y])        Perform dimensionality reduction on X.
__init__(n_components=2, algorithm='tsqr', n_iter=5, random_state=None, tol=0.0, compute=True)
Dimensionality reduction using truncated SVD (aka LSA).
This transformer performs linear dimensionality reduction by means of truncated singular value decomposition (SVD). Contrary to PCA, this estimator does not center the data before computing the singular value decomposition.
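The difference from PCA can be illustrated with plain NumPy. This is a minimal sketch rather than the estimator's own code: the data, the variable names, and the use of `np.linalg.svd` in place of the tsqr solver are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data with a large non-zero mean in every feature.
X = rng.normal(size=(200, 5)) + 10.0
k = 2

# Truncated SVD: factor X itself, without subtracting the column means.
_, s_raw, vt_raw = np.linalg.svd(X, full_matrices=False)
components_raw = vt_raw[:k]

# PCA-style: subtract the column means first, then factor.
_, s_centered, vt_centered = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
components_centered = vt_centered[:k]

# Without centering, the leading component mostly points along the mean vector.
mean_direction = X.mean(axis=0) / np.linalg.norm(X.mean(axis=0))
alignment = abs(components_raw[0] @ mean_direction)
print(alignment)
```

If the mean carries no useful information for your task, centering the data yourself before fitting may be worth considering.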
Parameters:
- n_components : int, default = 2
Desired dimensionality of output data. Must be less than or equal to the number of features. The default value is useful for visualization.
- algorithm : {‘tsqr’, ‘randomized’}
SVD solver to use. Both use the tsqr (for “tall-and-skinny QR”) algorithm internally. ‘randomized’ uses an approximate algorithm that is faster, but not exact. See the References for more.
- n_iter : int, optional (default 5)
Number of power iterations, useful when the singular values decay slowly. Error decreases exponentially as n_iter increases. In practice, set n_iter <= 4.
- random_state : int, RandomState instance or None, optional
If int, random_state is the seed used by the random number generator. If RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by np.random.
- tol : float, optional
Ignored.
- compute : bool
Whether or not SVD results should be computed eagerly, by default True.
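The effect of power iterations on a randomized solver can be sketched in NumPy. This is a generic randomized range finder written for illustration, not the code path used by algorithm='randomized'; the helper `randomized_range_error` and the test matrix are hypothetical.

```python
import numpy as np

def randomized_range_error(A, n_vectors, n_iter, seed=0):
    """Spectral-norm error of projecting A onto a randomized basis."""
    rng = np.random.default_rng(seed)
    # Sample the column space of A with a random Gaussian test matrix.
    Q, _ = np.linalg.qr(A @ rng.normal(size=(A.shape[1], n_vectors)))
    for _ in range(n_iter):
        # One power iteration, re-orthonormalizing at each step for stability.
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    return np.linalg.norm(A - Q @ (Q.T @ A), ord=2)

# Hypothetical matrix whose singular values decay slowly (1 / sqrt(i)).
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.normal(size=(200, 100)))
V, _ = np.linalg.qr(rng.normal(size=(100, 100)))
A = (U * (1.0 / np.sqrt(np.arange(1, 101)))) @ V.T

errors = {
    n_iter: randomized_range_error(A, n_vectors=10, n_iter=n_iter)
    for n_iter in (0, 1, 2, 4)
}
for n_iter, err in errors.items():
    print(n_iter, err)
```

No rank-10 basis can do better than the 11th singular value of A, so the printed errors should approach that floor as the number of iterations grows.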
Attributes:
- components_ : array, shape (n_components, n_features)
- explained_variance_ : array, shape (n_components,)
The variance of the training samples transformed by a projection to each component.
- explained_variance_ratio_ : array, shape (n_components,)
Percentage of variance explained by each of the selected components.
- singular_values_ : array, shape (n_components,)
The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
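The relationship between these attributes can be checked with a small NumPy sketch. It uses `np.linalg.svd` as a stand-in for the fitted estimator, and it assumes that the variance ratio is normalized by the total variance of X, as scikit-learn's TruncatedSVD does; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
k = 3

_, s, Vt = np.linalg.svd(X, full_matrices=False)
components = Vt[:k]        # plays the role of components_
X_new = X @ components.T   # data projected into the k-dimensional space

# The 2-norm of each projected column equals the matching singular value.
column_norms = np.linalg.norm(X_new, axis=0)

# Variance of each projected column, and its share of the total variance
# (assumed normalization, see the note above).
explained_variance = X_new.var(axis=0)
explained_variance_ratio = explained_variance / X.var(axis=0).sum()
```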
Notes
SVD suffers from a problem called “sign indeterminacy”, which means the sign of the components_ and the output from transform depend on the algorithm and random state. To work around this, fit instances of this class to data once, then keep the instance around to do transformations.
Warning
The implementation currently does not support sparse matrices.
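Sign indeterminacy can be demonstrated in a few lines of NumPy. The sketch below is illustrative only and does not use the estimator; it flips the sign of one singular vector pair and shows that the factorization is equally valid while the transformed coordinates change sign.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
k = 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k]

# Flip the sign of the first left/right singular vector pair.
signs = np.array([-1.0, 1.0])
U_flipped = U_k * signs
Vt_flipped = Vt_k * signs[:, None]

# Both factorizations give the same rank-k approximation of X ...
approx = (U_k * s_k) @ Vt_k
approx_flipped = (U_flipped * s_k) @ Vt_flipped

# ... but the projected coordinates differ in sign for the flipped component.
X_new = X @ Vt_k.T
X_new_flipped = X @ Vt_flipped.T
```

This is why two separately fitted instances can produce outputs that are not directly comparable, even on the same data.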
References
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures. A. Benson, D. Gleich, and J. Demmel. IEEE International Conference on Big Data, 2013. http://arxiv.org/abs/1301.1071
Examples
>>> from dask_ml.decomposition import TruncatedSVD
>>> import dask.array as da
>>> X = da.random.normal(size=(1000, 20), chunks=(100, 20))
>>> svd = TruncatedSVD(n_components=5, n_iter=3, random_state=42)
>>> svd.fit(X)  # doctest: +NORMALIZE_WHITESPACE
TruncatedSVD(algorithm='tsqr', n_components=5, n_iter=3, random_state=42, tol=0.0)
>>> print(svd.explained_variance_ratio_)  # doctest: +ELLIPSIS
[0.06386323 0.06176776 0.05901293 0.0576399 0.05726607]
>>> print(svd.explained_variance_ratio_.sum())  # doctest: +ELLIPSIS
0.299...
>>> print(svd.singular_values_)  # doctest: +ELLIPSIS
array([35.92469517, 35.32922121, 34.53368856, 34.138..., 34.013...])
Note that transform returns a dask.array.Array.
>>> svd.transform(X)
dask.array<sum-agg, shape=(1000, 5), dtype=float64, chunksize=(100, 5)>
fit(X, y=None)
Fit truncated SVD on training data X.
Parameters:
- X : array-like, shape (n_samples, n_features)
Training data.
- y : Ignored
Returns:
- self : object
Returns the transformer object.
fit_transform(X, y=None)
Fit model to X and perform dimensionality reduction on X.
Parameters:
- X : array-like, shape (n_samples, n_features)
Training data.
- y : Ignored
Returns:
- X_new : array, shape (n_samples, n_components)
Reduced version of X. This will always be a dense array, of the same type as the input array. If X was a dask.array, then X_new will be a dask.array with the same chunks along the first dimension.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
Parameter names mapped to their values.
inverse_transform(X)
Transform X back to its original space.
Returns an array X_original whose transform would be X.
Parameters:
- X : array-like, shape (n_samples, n_components)
New data.
Returns:
- X_original : array, shape (n_samples, n_features)
Note that this is always a dense array.
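The statement that transforming X_original gives back X can be sketched with NumPy. The sketch assumes that transform multiplies by the transposed components and inverse_transform multiplies by the components, as scikit-learn's TruncatedSVD does; the helper functions below are hypothetical stand-ins for the estimator's methods.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
k = 3

_, _, Vt = np.linalg.svd(X, full_matrices=False)
components = Vt[:k]  # rows are orthonormal, shape (k, n_features)

def transform(data):
    # Project into the k-dimensional space (assumed behavior).
    return data @ components.T

def inverse_transform(reduced):
    # Map reduced coordinates back to feature space (assumed behavior).
    return reduced @ components

Z = rng.normal(size=(10, k))       # arbitrary reduced-space data
X_original = inverse_transform(Z)  # shape (10, 6)
round_trip = transform(X_original)
```

The round trip in the other direction, inverse_transform(transform(X)), generally returns only a rank-k approximation of X rather than X itself.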
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Parameters:
- **params : dict
Estimator parameters.
Returns:
- self : object
Estimator instance.
transform(X, y=None)
Perform dimensionality reduction on X.
Parameters:
- X : array-like, shape (n_samples, n_features)
Data to be transformed.
- y : Ignored
Returns:
- X_new : array, shape (n_samples, n_components)
Reduced version of X. This will always be a dense array, of the same type as the input array. If X was a dask.array, then X_new will be a dask.array with the same chunks along the first dimension.