dask_ml.preprocessing.BlockTransformer
class dask_ml.preprocessing.BlockTransformer(func: Callable[..., Union[ArrayLike, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame]], *, validate: bool = False, **kw_args)
Construct a transformer from an arbitrary callable.
BlockTransformer forwards each block of X to a user-defined callable and returns the result. This is useful for stateless operations that can be performed at the cell or block level, such as taking the log of frequencies. In general, the transformer is not suitable for tasks such as standardization, which require information about a complete column.
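The difference between block-safe and block-unsafe operations can be seen without Dask at all. The following is a minimal sketch in which plain NumPy arrays stand in for blocks; the data and the `standardize` helper are hypothetical and are not part of dask-ml.

```python
import numpy as np

# Hypothetical column of counts, split into two "blocks" the way a
# chunked array would be partitioned.
counts = np.array([1.0, 10.0, 100.0, 1000.0, 5.0, 50.0])
blocks = np.array_split(counts, 2)

# Stateless, element-wise operation: applying it block by block gives
# the same values as applying it to the whole column.
log_blockwise = np.concatenate([np.log1p(b) for b in blocks])
log_global = np.log1p(counts)
print(np.allclose(log_blockwise, log_global))  # True


def standardize(a):
    """Hypothetical helper: zero mean, unit variance for one array."""
    return (a - a.mean()) / a.std()


# Stateful operation: each block would use its own mean and standard
# deviation, so the block-wise result differs from the column-wise one.
std_blockwise = np.concatenate([standardize(b) for b in blocks])
std_global = standardize(counts)
print(np.allclose(std_blockwise, std_global))  # False
```

The second comparison fails because every block sees only its own statistics, which is why column-wide tasks need an estimator that gathers those statistics in a separate fit step.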
Parameters:
- func : callable
  The callable to use for the transformation.
- validate : bool, optional, default=False
  Indicate that the input X array should be checked before calling func.
- kw_args : dict, optional
  Dictionary of additional keyword arguments to pass to func.
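Conceptually, the extra keyword arguments travel with the callable and are supplied on every block. The sketch below illustrates only that idea with NumPy; `apply_to_blocks` is a hypothetical stand-in and should not be read as dask-ml's implementation.

```python
import numpy as np


def apply_to_blocks(func, blocks, **kw_args):
    """Hypothetical stand-in: call func on each block with the same kw_args."""
    return [func(block, **kw_args) for block in blocks]


# Two blocks of a made-up one-dimensional array.
blocks = np.array_split(np.array([1.234, 5.678, 9.876, 0.049]), 2)

# decimals=1 plays the role of an entry in kw_args: it is passed to
# np.round once per block.
rounded = np.concatenate(apply_to_blocks(np.round, blocks, decimals=1))
print(rounded)
```

With the real class, the equivalent construction would be `BlockTransformer(np.round, decimals=1)`, in the same way that the documented example passes `index=False` through to `pd.util.hash_pandas_object`.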
Examples
>>> import dask.datasets
>>> import pandas as pd
>>> from dask_ml.preprocessing import BlockTransformer
>>> df = dask.datasets.timeseries()
>>> df
... # doctest: +SKIP
Dask DataFrame Structure:
                   id    name        x        y
npartitions=30
2000-01-01      int64  object  float64  float64
2000-01-02        ...     ...      ...      ...
...               ...     ...      ...      ...
2000-01-30        ...     ...      ...      ...
2000-01-31        ...     ...      ...      ...
Dask Name: make-timeseries, 30 tasks
>>> trn = BlockTransformer(pd.util.hash_pandas_object, index=False)
>>> trn.transform(df)
... # doctest: +ELLIPSIS
Dask Series Structure:
npartitions=30
2000-01-01    uint64
2000-01-02       ...
               ...
2000-01-30       ...
2000-01-31       ...
dtype: uint64
Dask Name: hash_pandas_object, 60 tasks
Methods
- fit_transform(X[, y]): Fit to data, then transform it.
- get_params([deep]): Get parameters for this estimator.
- set_params(**params): Set the parameters of this estimator.
- fit
- transform
__init__(func: Callable[..., Union[ArrayLike, pandas.core.frame.DataFrame, dask.dataframe.core.DataFrame]], *, validate: bool = False, **kw_args)
Initialize self. See help(type(self)) for accurate signature.
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
- X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
- y : ndarray of shape (n_samples,), default=None
  Target values.
- **fit_params : dict
  Additional fit parameters.
Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
  Transformed array.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
- deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Parameters:
- **params : dict
  Estimator parameters.
Returns:
- self : object
  Estimator instance.
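The `<component>__<parameter>` convention is easiest to see on a small pipeline. The sketch below uses only scikit-learn so that it runs without Dask; `FunctionTransformer` is an assumption made for illustration, standing in for BlockTransformer because it also exposes a `validate` parameter. The step names `log` and `scale` are arbitrary.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# A two-step pipeline; each step name becomes the <component> prefix.
pipe = Pipeline([
    ("log", FunctionTransformer(np.log1p)),
    ("scale", StandardScaler()),
])

# deep=False lists only the pipeline's own parameters; deep=True also
# includes the parameters of the contained estimators.
shallow = pipe.get_params(deep=False)
deep = pipe.get_params(deep=True)
print("log__validate" in shallow, "log__validate" in deep)

# set_params updates nested components in place and returns the
# estimator itself, so calls can be chained.
returned = pipe.set_params(log__validate=True, scale__with_mean=False)
print(returned is pipe)
print(pipe.named_steps["log"].validate, pipe.named_steps["scale"].with_mean)
```

The same double-underscore keys are what grid-search utilities use to address parameters of individual pipeline steps.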