Changelog

Version 2024.4.1

Version 2023.3.24

  • Compatibility with Python 3.10

  • Dropped support for Python 3.7

  • Compatibility with scikit-learn 1.2.0 and newer

Version 2022.5.27

  • Compatibility with scikit-learn 1.1 and newer (GH#910)

Version 2021.11.30

  • Fixed regression in meta inference for wrappers when the base estimator returned a scipy.sparse matrix (GH#889)

Version 2021.11.16

  • Meta-estimators like wrappers.ParallelPostFit now work with cuDF and CuPy objects. (GH#862)

  • Fixed incompatibility with new Dask optimizations in wrappers.ParallelPostFit (GH#878)

Version 2021.10.17

  • Added support for scikit-learn 1.0.0. scikit-learn 1.0.0 is now the minimum-supported version.

Version 1.9.0

  • LogisticRegression.predict_proba now correctly returns an (n, 2) array for binary classification (GH#760)

  • Fixed multioutput behavior to be consistent with scikit-learn (GH#820)

  • Added MAPE to regression metrics (GH#822)

  • NumPy 1.20 compatibility (GH#784)
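For readers unfamiliar with the metric, MAPE (mean absolute percentage error) is conventionally the mean of |y_true - y_pred| / |y_true|. The following is a minimal pure-Python sketch of that formula, not Dask-ML's implementation. The function name `mape` is hypothetical, and whether a library multiplies the result by 100 varies; this sketch returns a fraction.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the true value.
    # A true value of zero raises ZeroDivisionError in this sketch.
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Both predictions are off by 10% of the true value.
error = mape([100.0, 200.0], [110.0, 180.0])
```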

Version 1.8.0

  • Compatibility with scikit-learn 0.24

Version 1.7.0

Version 1.6.0

Version 1.5.0

Version 1.4.0

Version 1.3.0

Version 1.2.0

  • Changed the name of the second positional argument in model_selection.IncrementalSearchCV from param_distribution to parameters to match the name of the base class.

  • Compatibility with scikit-learn 0.22.1.

  • Added dask_ml.preprocessing.BlockTransformer, an extension of scikit-learn’s FunctionTransformer (GH#366).

  • Added dask_ml.feature_extraction.FeatureHasher which is similar to scikit-learn’s implementation.

Version 1.1.1

  • Fixed an issue with the 1.1.0 wheel (GH#575)

  • Made svd_flip work even when arrays are read-only (GH#592)

Version 1.1.0

Version 1.0.0

Version 0.13.0

Note

dask-ml 0.13.0 will be the last release to support Python 2.

Version 0.12.0

API Breaking Changes

Version 0.11.0

Note that this version of Dask-ML requires scikit-learn >= 0.20.0.

Enhancements

Bug Fixes

Version 0.10.0

Version 0.9.0

Bug Fixes

Documentation Updates

Build Changes

We’re now using Numba for performance-sensitive parts of Dask-ML. Dask-ML is now a pure-Python project, so we can provide universal wheels.

Version 0.8.0

Enhancements

  • Automatically replace default scikit-learn scorers with dask-aware versions in Incremental (GH#200)

  • Added the dask_ml.metrics.log_loss() loss function and neg_log_loss scorer (GH#318)

  • Fixed handling of array-like fit-parameters to GridSearchCV and BaseSearchCV (GH#320)
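As a reminder of what the log loss measures, here is a minimal pure-Python sketch of the binary case, with the corresponding negated score. It is not Dask-ML's implementation, which operates on dask arrays. The function names and the `eps` clipping value are assumptions made for illustration.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def neg_log_loss_score(y_true, p_pred):
    """Scorer convention: higher is better, so the loss is negated."""
    return -binary_log_loss(y_true, p_pred)


# A classifier that always predicts 0.5 has a loss of ln(2).
loss = binary_log_loss([1, 0], [0.5, 0.5])
```

The negation is what lets a loss be used where a search procedure expects a score to maximize.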

Bug Fixes

  • Fixed dtype in LabelEncoder.fit_transform() to be integer, rather than the dtype of the classes for dask arrays (GH#311)

Version 0.7.0

Enhancements

API Breaking Changes

  • Removed the basis_inds_ attribute from dask_ml.cluster.SpectralClustering as it’s no longer used (GH#152)

  • Changed dask_ml.wrappers.Incremental.fit() to clone the underlying estimator before training (GH#258). This has two consequences:

    1. The underlying estimator no longer gives access to learned attributes like coef_. We recommend using Incremental.coef_.

    2. State no longer leaks between successive fit calls. Note that Incremental.partial_fit() is still available if you want state, like learned attributes or random seeds, to be re-used. This is useful if you’re making multiple passes over the training data.

  • Changed get_params and set_params for dask_ml.wrappers.Incremental to no longer magically get / set parameters for the underlying estimator (GH#258). To specify parameters for the underlying estimator, use the double-underscore prefix convention established by scikit-learn:

    inc.set_params(estimator__alpha=10)
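The behavior described in this list (fit() trains a clone, partial_fit() keeps state, and parameters for the wrapped estimator use the estimator__ prefix) can be illustrated with a toy, pure-Python sketch. Both classes below are hypothetical stand-ins written for this example. They are not Dask-ML's Incremental, and storing the trained copy as `estimator_` is an assumption of the sketch.

```python
import copy


class RunningMean:
    """Toy estimator (hypothetical) that learns the mean of the values it has seen."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # placeholder hyperparameter, only used to demo set_params

    def partial_fit(self, values):
        # Accumulate state across calls.
        self.n_ = getattr(self, "n_", 0) + len(values)
        self.total_ = getattr(self, "total_", 0.0) + sum(values)
        self.mean_ = self.total_ / self.n_
        return self


class CloningWrapper:
    """Sketch of a wrapper whose fit() trains a fresh copy of the estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, batches):
        # Clone first: the user-supplied estimator is never mutated, and
        # nothing learned in an earlier fit() carries over.
        self.estimator_ = copy.deepcopy(self.estimator)
        for batch in batches:
            self.estimator_.partial_fit(batch)
        return self

    def partial_fit(self, batch):
        # Reuse the already-trained copy if there is one, so state persists.
        if not hasattr(self, "estimator_"):
            self.estimator_ = copy.deepcopy(self.estimator)
        self.estimator_.partial_fit(batch)
        return self

    def set_params(self, **params):
        # Route "estimator__name" to the wrapped estimator; anything else
        # is set on the wrapper itself.
        for key, value in params.items():
            prefix, sep, name = key.partition("__")
            if sep and prefix == "estimator":
                setattr(self.estimator, name, value)
            else:
                setattr(self, key, value)
        return self


inc = CloningWrapper(RunningMean())
first = inc.fit([[1.0, 2.0], [3.0]]).estimator_.mean_   # mean of 1, 2, 3
second = inc.fit([[10.0]]).estimator_.mean_             # earlier fit is forgotten
third = inc.partial_fit([20.0]).estimator_.mean_        # state from the last fit is kept
inc.set_params(estimator__alpha=10)
```

In this sketch the original `RunningMean` instance never gains a `mean_` attribute, which mirrors the first consequence listed above: learned attributes live on the trained copy, not on the estimator that was passed in.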
    

Reorganization

Dask-SearchCV is now being developed in the dask/dask-ml repository. Users who previously installed dask-searchcv should now just install dask-ml.

Bug Fixes

  • Fixed random seed generation on 32-bit platforms (GH#230)

Version 0.6.0

API Breaking Changes

Enhancements

Version 0.5.0

API Breaking Changes

Bug Fixes

  • dask_ml.preprocessing.StandardScaler now works on DataFrame inputs (GH#157).

Version 0.4.1

This release added several new estimators.

Enhancements

Added dask_ml.preprocessing.RobustScaler

Scale features using statistics that are robust to outliers. This mirrors sklearn.preprocessing.RobustScaler (GH#62).
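To show what "robust to outliers" means in practice, here is a minimal pure-Python sketch that centers a single feature on its median and divides by its interquartile range, the defaults scikit-learn documents for RobustScaler. The function name `robust_scale` is hypothetical, and this is not Dask-ML's implementation.

```python
import statistics


def robust_scale(values):
    """Center on the median and scale by the interquartile range (IQR)."""
    median = statistics.median(values)
    # "inclusive" uses linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    return [(v - median) / iqr for v in values]


# The outlier (100.0) does not affect how the other four values are scaled.
scaled = robust_scale([1.0, 2.0, 3.0, 4.0, 100.0])
```

Because the median and the quartiles ignore the magnitude of extreme values, the first four results stay within [-1, 0.5] regardless of how large the outlier is.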

Added dask_ml.preprocessing.OrdinalEncoder

Encodes categorical features as ordinal, in one ordered feature (GH#119).

Added dask_ml.wrappers.ParallelPostFit

A meta-estimator for fitting with any scikit-learn estimator, but post-processing (predict, transform, etc.) in parallel on dask arrays. See Parallel Meta-estimators for more (GH#132).
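The idea of fitting once and then post-processing block by block can be sketched without Dask. The example below uses a thread pool in place of a Dask scheduler and lists in place of dask array blocks. `ThresholdClassifier` and `predict_blockwise` are hypothetical names written for this illustration, not part of Dask-ML.

```python
from concurrent.futures import ThreadPoolExecutor


class ThresholdClassifier:
    """Toy estimator (hypothetical): predicts 1 when a value exceeds the training mean."""

    def fit(self, values):
        self.threshold_ = sum(values) / len(values)
        return self

    def predict(self, values):
        return [int(v > self.threshold_) for v in values]


def predict_blockwise(estimator, blocks, max_workers=4):
    """Apply an already-fitted estimator's predict() to each block in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input blocks.
        return list(pool.map(estimator.predict, blocks))


# Fit once on a small, in-memory dataset (mean is 2.0)...
clf = ThresholdClassifier().fit([1.0, 2.0, 3.0])
# ...then predict on a larger dataset that arrives as independent blocks.
predictions = predict_blockwise(clf, [[0.5, 2.5], [3.0], [1.0, 2.0, 9.0]])
```

This works because predict() on one block does not depend on any other block, so the blocks can be processed independently.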

Version 0.4.0

API Changes

  • Changed the arguments of the dask-glm based estimators in dask_glm.linear_model to match scikit-learn’s API (GH#94).

    • To specify lambduh, use C = 1.0 / lambduh (the default of 1.0 is unchanged)

    • The rho, over_relax, abstol and reltol arguments have been removed. Provide them in solver_kwargs instead.

    This affects the LinearRegression, LogisticRegression and PoissonRegression estimators.
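The argument changes above amount to a mechanical translation of keyword arguments, sketched below in plain Python. The helper `translate_glm_kwargs` is hypothetical and written only for this illustration. It is not provided by Dask-ML, and `max_iter` appears purely as an example of an argument that passes through unchanged.

```python
# Arguments that were removed from the estimators and now belong in solver_kwargs.
SOLVER_KEYS = ("rho", "over_relax", "abstol", "reltol")


def translate_glm_kwargs(old_kwargs):
    """Map old-style keyword arguments to the scikit-learn style described above."""
    new_kwargs = {
        key: value
        for key, value in old_kwargs.items()
        if key != "lambduh" and key not in SOLVER_KEYS
    }
    if "lambduh" in old_kwargs:
        # C is the inverse of the regularization strength.
        new_kwargs["C"] = 1.0 / old_kwargs["lambduh"]
    solver_kwargs = {key: old_kwargs[key] for key in SOLVER_KEYS if key in old_kwargs}
    if solver_kwargs:
        new_kwargs["solver_kwargs"] = solver_kwargs
    return new_kwargs


translated = translate_glm_kwargs({"lambduh": 0.5, "rho": 1.0, "max_iter": 100})
```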

Enhancements

  • Accept dask.dataframe for dask-glm based estimators (GH#84).

Version 0.3.2

Enhancements

  • Added dask_ml.decomposition.TruncatedSVD and dask_ml.decomposition.PCA (GH#78)

Version 0.3.0

Enhancements

  • Added KMeans.predict() (GH#83)
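For k-means, prediction means assigning each sample to its nearest learned cluster center. A minimal pure-Python sketch of that assignment step follows. The function name `predict_nearest_center` is hypothetical, and this is not Dask-ML's implementation, which works on dask arrays.

```python
def predict_nearest_center(samples, centers):
    """Return, for each sample, the index of the closest center."""
    labels = []
    for sample in samples:
        # Squared Euclidean distance is enough for finding the minimum.
        distances = [
            sum((s - c) ** 2 for s, c in zip(sample, center))
            for center in centers
        ]
        labels.append(distances.index(min(distances)))
    return labels


centers = [(0.0, 0.0), (10.0, 10.0)]
labels = predict_nearest_center([(1.0, 1.0), (9.0, 8.0), (4.0, 4.0)], centers)
```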

API Changes

  • Changed the fitted attributes on MinMaxScaler and StandardScaler to be concrete NumPy or pandas objects, rather than persisted dask objects (GH#75).