dask_ml.preprocessing.QuantileTransformer


class dask_ml.preprocessing.QuantileTransformer(*, n_quantiles=1000, output_distribution='uniform', ignore_implicit_zeros=False, subsample=10000, random_state=None, copy=True)

Transforms features using quantile information.

This implementation differs from the scikit-learn implementation by using approximate quantiles. The scikit-learn docstring follows.

This method transforms the features to follow a uniform or a normal distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers: this is therefore a robust preprocessing scheme.

The transformation is applied to each feature independently. First, an estimate of the cumulative distribution function of a feature is used to map the original values to a uniform distribution. The obtained values are then mapped to the desired output distribution using the associated quantile function. Feature values of new/unseen data that fall below or above the fitted range are mapped to the bounds of the output distribution. Note that this transform is non-linear. It may distort linear correlations between variables measured at the same scale, but it renders variables measured at different scales more directly comparable.
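The two mapping steps described above can be sketched with NumPy and the standard library. This is a simplified illustration, not the code used by scikit-learn or dask-ml: the helper names `fit_quantiles`, `to_uniform`, and `to_normal` are hypothetical, tie handling is omitted, and the clipping constant `eps` is an assumption chosen only to keep the normal quantile function finite.

```python
from statistics import NormalDist

import numpy as np


def fit_quantiles(x, n_quantiles=10):
    """Estimate the CDF of one feature at evenly spaced reference levels."""
    references = np.linspace(0.0, 1.0, n_quantiles)
    # nanpercentile ignores NaNs, so missing values do not affect the fit
    quantiles = np.nanpercentile(x, references * 100)
    return references, quantiles


def to_uniform(x, references, quantiles):
    """Map values onto [0, 1] by interpolating the estimated CDF."""
    # np.interp clamps inputs outside the fitted range to the end points,
    # which sends out-of-range values to the bounds 0 and 1
    return np.interp(x, quantiles, references)


def to_normal(u, eps=1e-7):
    """Apply the standard normal quantile function to uniform values."""
    # eps is an illustrative choice: inv_cdf is undefined at exactly 0 and 1
    clipped = np.clip(u, eps, 1.0 - eps)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in clipped])


rng = np.random.RandomState(0)
x_train = rng.normal(loc=0.5, scale=0.25, size=1000)
references, quantiles = fit_quantiles(x_train)

# One value far below the fitted range, one inside it, one far above it
x_new = np.array([-10.0, 0.5, 10.0])
u = to_uniform(x_new, references, quantiles)
z = to_normal(u)
```

Here `u` holds the uniform output and `z` the normal output for the same inputs. The out-of-range inputs land on the bounds of each output distribution, while the in-range value falls strictly between them.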

For example visualizations, refer to Compare QuantileTransformer with other scalers.

See also

quantile_transform: Equivalent function without the estimator API.
PowerTransformer: Perform mapping to a normal distribution using a power transform.
StandardScaler: Perform standardization that is faster, but less robust to outliers.
RobustScaler: Perform robust standardization that removes the influence of outliers but does not put outliers and inliers on the same scale.

Notes

NaNs are treated as missing values: disregarded in fit, and maintained in transform.
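The NaN behaviour noted above can be checked with a short script. Because the note comes from the scikit-learn docstring, this sketch uses scikit-learn's estimator; that the dask-ml class behaves the same way is an assumption not verified here.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 1))
X[::10, 0] = np.nan  # mark every tenth sample as missing

qt = QuantileTransformer(n_quantiles=10, random_state=0)
Xt = qt.fit_transform(X)

# NaNs are skipped when the quantiles are estimated ...
fitted_quantiles_have_nan = bool(np.isnan(qt.quantiles_).any())
# ... and come back out as NaN in the same positions after transform
nan_positions_match = bool(np.array_equal(np.isnan(X), np.isnan(Xt)))
```

With the default uniform output, the non-missing entries of `Xt` lie in the interval [0, 1].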

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import QuantileTransformer
>>> rng = np.random.RandomState(0)
>>> X = np.sort(rng.normal(loc=0.5, scale=0.25, size=(25, 1)), axis=0)
>>> qt = QuantileTransformer(n_quantiles=10, random_state=0)
>>> qt.fit_transform(X)
array([...])
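The example above uses the scikit-learn estimator. To give a feel for the approximate quantiles mentioned at the top of this page, the following sketch compares exact percentiles from NumPy with percentiles computed on a chunked Dask array. It assumes that `dask.array.percentile` is representative of the kind of approximation involved; it does not call the dask-ml estimator itself, and the array size and chunking are arbitrary.

```python
import dask.array as da
import numpy as np

rng = np.random.RandomState(0)
col = rng.normal(loc=0.5, scale=0.25, size=100_000)

# Split the column into 10 chunks so percentiles are combined across blocks
dcol = da.from_array(col, chunks=10_000)

q = [25, 50, 75]
exact = np.percentile(col, q)              # computed on the full array
approx = da.percentile(dcol, q).compute()  # merged from per-chunk percentiles

max_abs_diff = float(np.max(np.abs(exact - approx)))
```

For a single-chunk array the two results coincide; with several chunks, `max_abs_diff` is typically small but not necessarily zero.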

Methods

`fit`(X[, y])	Compute the quantiles used for transforming.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_feature_names_out`([input_features])	Get output feature names for transformation.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`inverse_transform`(X)	Back-projection to the original space.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Feature-wise transformation of the data.

__init__(*, n_quantiles=1000, output_distribution='uniform', ignore_implicit_zeros=False, subsample=10000, random_state=None, copy=True)
