Changelog¶

Version 2025.1.0¶

Compatibility with dask>=2025.1.0 (GH#1008)
Compatibility with scikit-learn>=1.6.1 (GH#1008)

Version 2024.4.1¶

Fixed packaging GH#986

Version 2023.3.24¶

Compatibility with Python 3.10
Dropped support for Python 3.7
Compatibility with scikit-learn 1.2.0 and newer

Version 2022.5.27¶

Compatibility with scikit-learn 1.1 and newer (GH#910)

Version 2021.11.30¶

Fixed regression in meta inference for wrappers when the base estimator returned a scipy.sparse matrix (GH#889)

Version 2021.11.16¶

Meta-estimators like wrappers.ParallelPostFit now work with cuDF and CuPy objects. (GH#862)
Fixed incompatibility with new Dask optimizations in wrappers.ParallelPostFit (GH#878)

Version 2021.10.17¶

Added support for scikit-learn 1.0.0. scikit-learn 1.0.0 is now the minimum-supported version.

Version 1.9.0¶

LogisticRegression.predict_proba now correctly returns an (n, 2) array for binary classification (GH#760)
Fixed multioutput behavior to be consistent with scikit-learn (GH#820)
Added MAPE to regression metrics (GH#822)
NumPy 1.20 compatibility (GH#784)
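
The (n, 2) shape noted for LogisticRegression.predict_proba matches the scikit-learn convention of one column per class. A minimal plain-Python sketch of that layout follows; the helper name binary_proba is hypothetical and is not part of Dask-ML.

```python
def binary_proba(p_positive):
    """Expand P(class 1) values into rows of [P(class 0), P(class 1)]."""
    return [[1.0 - p, p] for p in p_positive]


# Three samples, two columns: shape (n, 2) with n = 3.
proba = binary_proba([0.9, 0.25, 0.5])
```

Each row sums to 1, and the second column holds the positive-class probability.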

Version 1.8.0¶

Compatibility with scikit-learn 0.24

Version 1.7.0¶

Improved documentation for working with PyTorch models, see pytorch (GH#699)
Improved documentation for working with Keras / TensorFlow models, see Keras and Tensorflow (GH#713)
Fixed handling of remote vocabularies in dask_ml.feature_extraction.text.HashingVectorizer (GH#719)
Added dask_ml.metrics.regression.mean_squared_log_error() (GH#725)
Allow user-provided categories in dask_ml.preprocessing.OneHotEncoder (GH#727)
Added dask_ml.linear_model.LogisticRegression.decision_function() (GH#728)
Added compute argument to dask_ml.decomposition.TruncatedSVD (GH#743)
Fixed sign stability in incremental PCA (GH#742)
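
Mean squared log error is conventionally the mean of squared differences between log(1 + y_true) and log(1 + y_pred). The sketch below implements that standard definition in plain Python, on the assumption that dask_ml.metrics.regression.mean_squared_log_error() follows it; msle_sketch is a hypothetical name.

```python
import math


def msle_sketch(y_true, y_pred):
    """Mean of squared differences of log1p-transformed values."""
    diffs = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(diffs) / len(diffs)


perfect = msle_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
one_log_unit = msle_sketch([math.e - 1.0], [0.0])
```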

Version 1.6.0¶

Improved documentation for RandomizedSearchCV
Improved logging in dask_ml.cluster.KMeans (GH#688)
Added support for dask.dataframe objects in dask_ml.model_selection.HyperbandSearchCV (GH#701)
Added squared=True option to dask_ml.metrics.mean_squared_error (GH#707)
Added dask_ml.feature_extraction.text.CountVectorizer (GH#705)
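
As a rough illustration of the squared option on dask_ml.metrics.mean_squared_error, the plain-Python sketch below assumes it mirrors scikit-learn: squared=True returns the mean squared error and squared=False returns its square root. The function name mse_sketch is hypothetical.

```python
import math


def mse_sketch(y_true, y_pred, squared=True):
    """Return MSE when squared is True, otherwise RMSE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse if squared else math.sqrt(mse)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]
mse = mse_sketch(y_true, y_pred)                   # squared error averaged over 4 samples
rmse = mse_sketch(y_true, y_pred, squared=False)   # square root of the value above
```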

Version 1.5.0¶

Support for Python 3.8 (GH#669)
Compatibility with Scikit-Learn 0.23.0 (GH#669)
Scikit-Learn 0.23.0 or newer is now required (GH#669)
Removed previously deprecated Partial classes. Use dask_ml.wrappers.Incremental instead (GH#674)

Version 1.4.0¶

Added dask_ml.decomposition.IncrementalPCA for out-of-core / distributed incremental PCA (GH#619)
Improved logging and monitoring in incremental model selection (GH#528)
Added dask_ml.ensemble.BlockwiseVotingClassifier and dask_ml.ensemble.BlockwiseVotingRegressor for blockwise training and ensemble prediction (GH#657)
Improved documentation for Hyper Parameter Search (GH#432)

Version 1.3.0¶

Added shuffle support to dask_ml.model_selection.train_test_split() for DataFrame input (GH#625)
Improved performance of dask_ml.model_selection.GridSearchCV by re-using cached tasks (GH#622)
Added support for DataFrame inputs to dask_ml.model_selection.GridSearchCV (GH#612)
Fixed dask_ml.linear_model.LinearRegression.score() to use r2_score rather than mse (GH#614)
Handle missing data in dask_ml.preprocessing.StandardScaler (GH#608)
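
The LinearRegression.score() fix matters because R² and mean squared error point in opposite directions: a higher R² is better, while a lower MSE is better. Below is a plain-Python sketch of the standard R² definition, 1 - SS_res / SS_tot; r2_sketch is a hypothetical name, not the library function.

```python
def r2_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
perfect = r2_sketch(y, y)            # perfect predictions
baseline = r2_sketch(y, [2.5] * 4)   # always predicting the mean
```

A perfect model scores 1.0 and a model that always predicts the mean scores 0.0.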

Version 1.2.0¶

Changed the name of the second positional argument in model_selection.IncrementalSearchCV from param_distribution to parameters to match the name of the base class.
Compatibility with scikit-learn 0.22.1.
Added dask_ml.preprocessing.BlockTransformer, an extension of scikit-learn’s FunctionTransformer (GH#366).
Added dask_ml.feature_extraction.FeatureHasher which is similar to scikit-learn’s implementation.

Version 1.1.1¶

Fixed an issue with the 1.1.0 wheel (GH#575)
Make svd_flip work even when arrays are read only (GH#592)

Version 1.1.0¶

Non-arrays (e.g. Dask Bags and DataFrames) are now allowed in dask_ml.wrappers.Incremental. This is useful for text classification pipelines (GH#570)
The index is now preserved in dask_ml.preprocessing.PolynomialFeatures for DataFrame inputs (GH#563)
dask_ml.decomposition.PCA now works with DataFrame inputs (GH#543)
dask_ml.cluster.KMeans handles inputs where some blocks are length-0 (GH#559)
Improved error reporting for mixed inputs to dask_ml.model_selection.train_test_split() (GH#552)
Removed deprecated dask_ml.joblib module. Use joblib.parallel_backend instead (GH#545)
dask_ml.preprocessing.QuantileTransformer now handles DataFrame input (GH#533)

Version 1.0.0¶

Added new meta-estimators for hyperparameter search on distributed datasets: HyperbandSearchCV and SuccessiveHalvingSearchCV
Dropped Python 2 support (GH#500)
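
For readers unfamiliar with successive halving, the toy sketch below shows the general idea: score all candidates on a small budget, keep the better half, and repeat with a larger budget. It is not the SuccessiveHalvingSearchCV implementation; the function name, the halving factor of 2, and the budget doubling are all assumptions made for illustration.

```python
def successive_halving(candidates, score, budget=1):
    """Repeatedly keep the better half of the candidates, doubling the budget."""
    survivors = list(candidates)
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: score(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // 2)]
        budget *= 2
    return survivors[0]


# Toy score: learning rates closer to 0.3 are better; the budget is unused here.
best = successive_halving([0.01, 0.1, 0.3, 1.0], lambda lr, budget: -abs(lr - 0.3))
```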

Version 0.13.0¶

Compatibility with scikit-learn 0.21.1
Cross-validation results in GridSearchCV and RandomizedSearchCV are now gathered as completed, in case a worker is lost (GH#433)
Fixed bug in dask_ml.model_selection.train_test_split() when only one of train / test size is provided (GH#502)
Consistent random state for dask_ml.model_selection.IncrementalSearchCV
Fixed various issues with 32-bit Windows builds (GH#487)
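
The train_test_split() fix concerns the case where only one of the two sizes is given. A minimal sketch of deriving the missing fraction as the complement of the supplied one; resolve_sizes is a hypothetical helper, and this sketch simply rejects the case where neither size is given rather than modelling any library default.

```python
def resolve_sizes(train_size=None, test_size=None):
    """Fill in the missing fraction so the two sizes sum to 1."""
    if train_size is None and test_size is None:
        raise ValueError("provide train_size, test_size, or both")
    if train_size is None:
        train_size = 1.0 - test_size
    elif test_size is None:
        test_size = 1.0 - train_size
    return train_size, test_size


only_test = resolve_sizes(test_size=0.25)
only_train = resolve_sizes(train_size=0.5)
```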

Note

dask-ml 0.13.0 will be the last release to support Python 2.

Version 0.12.0¶

API Breaking Changes¶

dask_ml.model_selection.IncrementalSearchCV now returns Dask objects for post-fit methods like .predict, etc (GH#423).

Version 0.11.0¶

Note that this version of Dask-ML requires scikit-learn >= 0.20.0.

Enhancements¶

Added dask_ml.model_selection.IncrementalSearchCV, a meta-estimator for hyperparameter optimization on larger-than-memory datasets (GH#356). See Incremental Hyperparameter Optimization for more.
Added dask_ml.preprocessing.PolynomialFeatures, a drop-in replacement for the scikit-learn version (GH#347).
Added auto-rechunking to Dask Arrays with more than one block along the features in dask_ml.wrappers.ParallelPostFit (GH#376)
Added support for Dask DataFrame inputs to dask_ml.cluster.KMeans (GH#390)
Added a compute keyword to dask_ml.wrappers.ParallelPostFit.score() to support lazily evaluating a model’s score (GH#402)

Bug Fixes¶

Changed dask_ml.wrappers.ParallelPostFit to automatically rechunk input arrays to methods like predict when they have more than one block along the features (GH#376).
Bug in dask_ml.impute.SimpleImputer with Dask DataFrames filling the count of the most frequent item, rather than the item itself (GH#385).
Bug in dask_ml.model_selection.ShuffleSplit returning the same split when the random_state was set (GH#380).
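
The SimpleImputer bug above comes down to the difference between the most frequent item and its count. The plain-Python sketch below, which uses collections.Counter rather than Dask DataFrames, shows both values side by side; the sample data is made up.

```python
from collections import Counter

values = ["a", "b", "a", None, "a", "b"]

# most_common(1) returns a list with one (item, count) pair.
item, count = Counter(v for v in values if v is not None).most_common(1)[0]

# Correct imputation fills with the item; the bug filled with the count.
filled = [item if v is None else v for v in values]
```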

Version 0.10.0¶

Enhancements¶

Added support for dask.dataframe.DataFrame to dask_ml.model_selection.train_test_split() (GH#351)

Version 0.9.0¶

Enhancements¶

Added dask_ml.model_selection.ShuffleSplit (GH#340)

Bug Fixes¶

Fixed handling of errors in the predict and score steps of dask_ml.model_selection.GridSearchCV and dask_ml.model_selection.RandomizedSearchCV (GH#339)
Compatibility with Dask 0.18 for dask_ml.preprocessing.LabelEncoder (you’ll also notice improved performance) (GH#336).

Documentation Updates¶

Added a Dask-ML Roadmap. Please open an issue if you’d like something to be included on the roadmap. (GH#322)
Added many Examples to the documentation and the dask examples binder.

Build Changes¶

We’re now using Numba for performance-sensitive parts of Dask-ML. Dask-ML is now a pure-python project, so we can provide universal wheels.

Version 0.8.0¶

Enhancements¶

Automatically replace default scikit-learn scorers with dask-aware versions in Incremental (GH#200)
Added the dask_ml.metrics.log_loss() loss function and neg_log_loss scorer (GH#318)
Fixed handling of array-like fit-parameters to GridSearchCV and BaseSearchCV (GH#320)

Bug Fixes¶

Fixed dtype in LabelEncoder.fit_transform() to be integer, rather than the dtype of the classes for dask arrays (GH#311)

Version 0.7.0¶

Enhancements¶

Added sample_weight support for dask_ml.metrics.accuracy_score(). (GH#217)
Improved performance of training on dask_ml.cluster.SpectralClustering (GH#152)
Added dask_ml.preprocessing.LabelEncoder. (GH#226)
Fixed issue in model_selection meta-estimators not respecting the default Dask scheduler (GH#260)

API Breaking Changes¶

Removed the basis_inds_ attribute from dask_ml.cluster.SpectralClustering as it’s no longer used (GH#152)
Changed dask_ml.wrappers.Incremental.fit() to clone the underlying estimator before training (GH#258). This induces a few changes:
1. The underlying estimator no longer gives access to learned attributes like coef_. We recommend using Incremental.coef_.
2. State no longer leaks between successive fit calls. Note that Incremental.partial_fit() is still available if you want state, like learned attributes or random seeds, to be re-used. This is useful if you’re making multiple passes over the training data.
Changed get_params and set_params for dask_ml.wrappers.Incremental to no longer magically get / set parameters for the underlying estimator (GH#258). To specify parameters for the underlying estimator, use the double-underscore prefix convention established by scikit-learn:
```
inc.set_params(estimator__alpha=10)
```
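
To make the double-underscore convention concrete, here is a toy sketch of how a wrapper can route a key such as estimator__alpha to the wrapped object. The Inner and Wrapper classes are hypothetical stand-ins, not the Incremental implementation.

```python
class Inner:
    """Stand-in for an underlying estimator with one hyperparameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


class Wrapper:
    """Stand-in for a meta-estimator that wraps another estimator."""

    def __init__(self, estimator, shuffle=False):
        self.estimator = estimator
        self.shuffle = shuffle

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub = key.partition("__")
            if sep:
                # "estimator__alpha" sets alpha on self.estimator.
                setattr(getattr(self, name), sub, value)
            else:
                # Plain keys set attributes on the wrapper itself.
                setattr(self, name, value)
        return self


inc = Wrapper(Inner()).set_params(estimator__alpha=10, shuffle=True)
```

Keys without the prefix apply to the wrapper, so the two namespaces never collide.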

Reorganization¶

Dask-SearchCV is now being developed in the dask/dask-ml repository. Users who previously installed dask-searchcv should now just install dask-ml.

Bug Fixes¶

Fixed random seed generation on 32-bit platforms (GH#230)

Version 0.6.0¶

API Breaking Changes¶

Removed the get keyword from the incremental learner fit methods. (GH#187)
Deprecated the various Partial* estimators in favor of the dask_ml.wrappers.Incremental meta-estimator (GH#190)

Enhancements¶

Added a new meta-estimator dask_ml.wrappers.Incremental for wrapping any estimator with a partial_fit method. See Incremental Meta-estimator for more. (GH#190)
Added an R2-score metric dask_ml.metrics.r2_score().

Version 0.5.0¶

API Breaking Changes¶

The n_samples_seen_ attribute on dask_ml.preprocessing.StandardScaler is now consistently numpy.nan (GH#157).
Changed the algorithm for dask_ml.datasets.make_blobs(), dask_ml.datasets.make_regression() and dask_ml.datasets.make_classification() to reduce the single-machine peak memory usage (GH#67)

Enhancements¶

Added dask_ml.model_selection.train_test_split() and dask_ml.model_selection.ShuffleSplit (GH#172)
Added dask_ml.metrics.classification_score(), dask_ml.metrics.mean_absolute_error(), and dask_ml.metrics.mean_squared_error().

Bug Fixes¶

dask_ml.preprocessing.StandardScaler now works on DataFrame inputs (GH#157).

Version 0.4.1¶

This release added several new estimators.

Enhancements¶

Added `dask_ml.preprocessing.RobustScaler`¶

Scale features using statistics that are robust to outliers. This mirrors sklearn.preprocessing.RobustScaler (GH#62).

Added `dask_ml.preprocessing.OrdinalEncoder`¶

Encodes categorical features as ordinal, in one ordered feature (GH#119).

Added `dask_ml.wrappers.ParallelPostFit`¶

A meta-estimator for fitting with any scikit-learn estimator, but post-processing (predict, transform, etc.) in parallel on dask arrays. See Parallel Meta-estimators for more (GH#132).

Version 0.4.0¶

API Changes¶

Changed the arguments of the dask-glm based estimators in dask_glm.linear_model to match scikit-learn’s API (GH#94).
- To specify lambduh, use C = 1.0 / lambduh (the default of 1.0 is unchanged)
- The rho, over_relax, abstol and reltol arguments have been removed. Provide them in solver_kwargs instead.
This affects the LinearRegression, LogisticRegression and PoissonRegression estimators.
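
A small sketch of the migration described above: convert an old lambduh value to C, and collect the removed arguments into a solver_kwargs dictionary. The helper name and all numeric values are hypothetical; only the argument names come from the entry above.

```python
def lambduh_to_C(lambduh):
    """Convert a regularization strength to its inverse, C = 1.0 / lambduh."""
    return 1.0 / lambduh


C = lambduh_to_C(0.5)

# Hypothetical values; these keys were previously separate arguments.
solver_kwargs = {"rho": 1.0, "over_relax": 1.0, "abstol": 1e-4, "reltol": 1e-2}
```

Because C is the inverse of lambduh, a larger C means weaker regularization.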

Enhancements¶

Accept dask.dataframe for dask-glm based estimators (GH#84).

Version 0.3.2¶

Enhancements¶

Added dask_ml.preprocessing.TruncatedSVD() and dask_ml.preprocessing.PCA() (GH#78)

Version 0.3.0¶

Enhancements¶

Added KMeans.predict() (GH#83)

API Changes¶

Changed the fitted attributes on MinMaxScaler and StandardScaler to be concrete NumPy or pandas objects, rather than persisted dask objects (GH#75).
