dask_ml.preprocessing.LabelEncoder

class dask_ml.preprocessing.LabelEncoder(use_categorical=True)

Encode labels with values between 0 and n_classes-1.

Note

This differs from the scikit-learn version for Categorical data. When passed a categorical y, this implementation uses the categorical dtype information for the label encoding and transformation. The results will differ from scikit-learn's when

  1. Your categories are not monotonically increasing
  2. You have unobserved categories

Specify use_categorical=False to recover the scikit-learn behavior.
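To see why the two cases above matter, here is a small sketch using plain pandas (not dask_ml itself): a categorical dtype assigns codes in the dtype's category order, including unobserved categories, whereas the scikit-learn behavior encodes the sorted observed values. The category order and data below are hypothetical, chosen to trigger both differences at once.

```python
import pandas as pd

# The category order is deliberately not sorted, and "c" is unobserved.
dtype = pd.CategoricalDtype(categories=["b", "a", "c"])
y = pd.Series(["a", "a", "b"], dtype=dtype)

# Categorical path: codes follow the dtype's category order (b=0, a=1, c=2).
print(y.cat.codes.tolist())  # [1, 1, 0]

# scikit-learn-style path: classes are the sorted observed values (a=0, b=1).
classes = sorted(y.unique())
print([classes.index(v) for v in y])  # [0, 0, 1]
```

The same input thus encodes differently depending on whether the dtype's categories or the sorted observed values define the classes.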

Parameters:
use_categorical : bool, default True

Whether to use the categorical dtype information when y is a dask or pandas Series with a categorical dtype.

Attributes:
classes_ : array of shape (n_classes,)

Holds the label for each class.

dtype_ : Optional CategoricalDtype

For Categorical y, the dtype is stored here.

Examples

LabelEncoder can be used to normalize labels.

>>> from dask_ml import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6]) #doctest: +ELLIPSIS
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])

It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"]) #doctest: +ELLIPSIS
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']

When using Dask, we strongly recommend using a Categorical dask Series if possible. This avoids a (potentially expensive) scan of the values and enables a faster transform algorithm.

>>> import dask.dataframe as dd
>>> import pandas as pd
>>> data = dd.from_pandas(pd.Series(['a', 'a', 'b'], dtype='category'),
...                       npartitions=2)
>>> le.fit_transform(data)
dask.array<values, shape=(nan,), dtype=int8, chunksize=(nan,)>
>>> le.fit_transform(data).compute()
array([0, 0, 1], dtype=int8)

Methods

fit(y) Fit label encoder.
fit_transform(y) Fit label encoder and return encoded labels.
get_params([deep]) Get parameters for this estimator.
inverse_transform(y) Transform labels back to original encoding.
set_params(**params) Set the parameters of this estimator.
transform(y) Transform labels to normalized encoding.
__init__(use_categorical=True)

Initialize self. See help(type(self)) for accurate signature.

fit(y)

Fit label encoder.

Parameters:
y : array-like of shape (n_samples,)

Target values.

Returns:
self : returns an instance of self.

fit_transform(y)

Fit label encoder and return encoded labels.

Parameters:
y : array-like of shape (n_samples,)

Target values.

Returns:
y : array-like of shape (n_samples,)

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

inverse_transform(y)

Transform labels back to original encoding.

Parameters:
y : numpy array of shape (n_samples,)

Target values.

Returns:
y : numpy array of shape (n_samples,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self

transform(y)

Transform labels to normalized encoding.

Parameters:
y : array-like of shape (n_samples,)

Target values.

Returns:
y : array-like of shape (n_samples,)