dask_ml.preprocessing.LabelEncoder
- class dask_ml.preprocessing.LabelEncoder(use_categorical: bool = True)
Encode labels with value between 0 and n_classes-1.
Note
This differs from the scikit-learn version for Categorical data. When passed a categorical y, this implementation will use the categorical information for the label encoding and transformation. You will receive different answers when:
- Your categories are not monotonically increasing
- You have unobserved categories
Specify use_categorical=False to recover the scikit-learn behavior.
- Parameters
- use_categorical : bool, default True
Whether to use the categorical dtype information when y is a dask or pandas Series with a categorical dtype.
- Attributes
- classes_ : array of shape (n_class,)
Holds the label for each class.
- dtype_ : Optional CategoricalDtype
For Categorical y, the dtype is stored here.
Examples
LabelEncoder can be used to normalize labels.
>>> from dask_ml import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])
It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
When using Dask, we strongly recommend using a Categorical dask Series if possible. This avoids a (potentially expensive) scan of the values and enables a faster transform algorithm.
>>> import dask.dataframe as dd
>>> import pandas as pd
>>> data = dd.from_pandas(pd.Series(['a', 'a', 'b'], dtype='category'),
...                       npartitions=2)
>>> le.fit_transform(data)
dask.array<values, shape=(nan,), dtype=int8, chunksize=(nan,)>
>>> le.fit_transform(data).compute()
array([0, 0, 1], dtype=int8)
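The note on categorical input can be made concrete with a small sketch. The block below is plain Python and does not call dask_ml or scikit-learn; the helper names `encode_by_sorted_values` and `encode_by_categories` are hypothetical and only mimic the two rules described in the note: codes taken from the sorted unique observed values (the scikit-learn-style behavior recovered with use_categorical=False) versus codes taken from the declared category order. It assumes a categorical whose categories are not sorted and include one unobserved value.

```python
# Plain-Python illustration only; this does not call dask_ml or scikit-learn.


def encode_by_sorted_values(values):
    """Assign codes from the sorted unique observed values."""
    classes = sorted(set(values))
    index = {label: code for code, label in enumerate(classes)}
    return classes, [index[v] for v in values]


def encode_by_categories(values, categories):
    """Assign codes from each value's position in a declared category list."""
    index = {label: code for code, label in enumerate(categories)}
    return list(categories), [index[v] for v in values]


y = ["b", "a", "b"]
# Hypothetical categorical dtype: not sorted, and "c" is never observed.
categories = ["c", "b", "a"]

sorted_classes, sorted_codes = encode_by_sorted_values(y)
cat_classes, cat_codes = encode_by_categories(y, categories)

print(sorted_classes, sorted_codes)  # ['a', 'b'] [1, 0, 1]
print(cat_classes, cat_codes)        # ['c', 'b', 'a'] [1, 2, 1]
```

The two encodings disagree on both the class list and the codes, which is the situation the note warns about. When the categories are already sorted and all of them are observed, the two rules give the same result.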
Methods
- fit(y): Fit label encoder.
- fit_transform(y): Fit label encoder and return encoded labels.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- inverse_transform(y): Transform labels back to original encoding.
- set_output(*[, transform]): Set output container.
- set_params(**params): Set the parameters of this estimator.
- transform(y): Transform labels to normalized encoding.